A sonar image classification method combining SLIC superpixels and graph attention networks

By combining SLIC superpixels and graph attention networks for sonar image classification, the problems of low resolution and severe noise in sonar images are solved, achieving high-precision sonar image classification and improving the underwater target recognition effect.

CN116468995B · Active · Publication Date: 2026-06-23 · RES & DEV INST OF NORTHWESTERN POLYTECHNICAL UNIV IN SHENZHEN +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
RES & DEV INST OF NORTHWESTERN POLYTECHNICAL UNIV IN SHENZHEN
Filing Date
2022-07-21
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

Sonar images suffer from problems such as low resolution, severe noise, and blurred target edges in underwater target classification and recognition, resulting in poor classification and recognition performance.

Method used

A combined SLIC superpixel and graph attention network approach is adopted to extract pixel features and spatial location features of sonar images by combining sonar image pre-segmentation, SLIC superpixel clustering and graph attention network, construct graph structure data and perform classification.

Benefits of technology

It improves the recognition accuracy and performance of sonar images, effectively utilizes the spatial relationship between the sound shadow zone and the target area, and achieves high-precision sonar image classification.

✦ Generated by Eureka AI based on patent content.

Figure CN116468995B_ABST

Abstract

This application discloses a sonar image classification method combining SLIC superpixels and a graph attention network. The method includes the following steps: correcting the sonar images according to the respective imaging principles of two-dimensional forward-looking sonar and side-scan sonar and the prior information available at imaging time; pre-segmenting the corrected sonar images with an improved DeepLabV3+ network; constructing graph-structured data with the SLIC superpixel algorithm; constructing a sonar image classification model based on GAT (graph attention network) and feeding the constructed sonar graph data into the network to train and test the model; and verifying the importance of pixel features and spatial position features. Through the SLIC superpixel method and the graph attention network, the method makes full use of the spatial relationship between the acoustic shadow area and the target area, and achieves higher sonar image classification accuracy by combining pixel features with spatial geometric features.

Description

Technical Field

[0001] This invention belongs to the field of underwater classification and recognition, specifically relating to a sonar image classification method that combines SLIC superpixels and graph attention networks.

Background Technology

[0002] Sonar is a crucial underwater precision detection method, primarily used for underwater target detection and identification, marine mapping, underwater acoustic communication, and offshore operations. Underwater target classification and identification is a particularly critical technology, applicable to mine detection and clearance, underwater salvage and search and rescue, autonomous obstacle avoidance for unmanned platforms, and detection of subsea pipelines and cracks. Underwater target classification, analyzed and identified from an image processing perspective, offers greater intuitiveness; therefore, this invention focuses on underwater target classification and identification based on sonar images.

[0003] Sonar echo imaging is significantly affected by the marine environment and seabed topography; the lower the environmental noise and the flatter the seabed, the higher the image quality. The resulting sonar image consists of three main parts: the target region, the sound shadow region, and the reverberant background. The target region is the echo generated by strong underwater reflectors; the sound shadow region is the area where sound waves cannot reach due to obstruction by the target; and the background is seabed noise and reverberation. Compared to optical images, sonar images suffer from low resolution, severe noise, and blurred target edges, leading to poor classification and recognition performance. To address these issues, graph neural networks based on deep learning combine pixel features and spatial location features to extract richer features, making them highly suitable for sonar image classification and recognition. Graph attention networks, in particular, introduce an attention mechanism, aggregating neighbor information to learn neighborhood and spatial features. Furthermore, the model exhibits greater flexibility and robustness to specific inputs, making it highly effective as the primary model for sonar image classification in improving classification accuracy.

[0004] The shadow regions in sonar images contain information such as the shape and height of the sonar target. Therefore, joint feature extraction of the target region and the shadow region can yield more effective target information. The Simple Linear Iterative Clustering (SLIC) algorithm, by clustering pixels, not only achieves information aggregation and removal of redundant information, but also more accurately determines the location information of bright and shadow regions of the sonar target. Furthermore, during graph convolution operations, superpixels can acquire more global knowledge, expanding the receptive field of the convolution operation, thus achieving better efficiency and performance in sonar image recognition.

[0005] Based on the above considerations, this invention focuses on sonar image classification and proposes a sonar image classification method that combines SLIC superpixels and graph attention networks. According to the imaging principle of sonar, the sonar image is pre-segmented into shadow and bright areas. Then, the SLIC superpixel clustering algorithm is used to convert the segmentation results into graph structure data. Sonar features are extracted from multiple perspectives, including pixel features and spatial geometric features. Finally, a graph attention network is used to achieve sonar image classification and recognition.

Summary of the Invention

[0006] To address the aforementioned technical problems, this invention discloses a sonar image classification method that combines SLIC superpixels and graph attention networks.

[0007] The present invention aims to provide a sonar image classification method that combines SLIC superpixels and graph attention networks. The steps of the method are as follows:

[0008] S1: Based on the imaging principles of two-dimensional forward-looking sonar and side-scan sonar, as well as the prior information during imaging, different preprocessing methods are used to achieve autonomous correction and compensation of sonar images.

[0009] S2: Perform image pre-segmentation on the corrected sonar image based on the improved DeepLabV3+ network to achieve synchronous segmentation of the sonar target highlight area and sound shadow area.

[0010] S3: Use the SLIC superpixel algorithm to construct graph structure data, and combine pixel features and spatial location features to form the final graph attributes.

[0011] S4: Construct a sonar image classification model based on GAT (Graph Attention Network), and feed the constructed sonar graph structure data into the network to complete the training and testing of the model.

[0012] S5: Set up ablation experiments to verify the importance of pixel features and spatial location features, the effectiveness of sonar image pre-segmentation, and the importance of sonar target shadow region information.

[0013] Furthermore, step S1 includes the following steps:

[0014] S11: Forward-looking sonar image reconstruction technology and enhancement algorithm:

[0015] Furthermore, step S11 includes the following steps:

[0016] S111: Forward-looking sonar image reconstruction technology:

[0017] Forward-looking sonar images can be represented in two ways: one is in polar coordinates, which is the original data format and is presented as a pie chart with (r, θ) as the coordinate axes; the other is in the conventional image coordinate system (x, y) obtained after coordinate transformation. The conversion formula between the two coordinate systems is as follows:

[0018]

[0019] Where φ and R represent the horizontal opening angle and slant range of the forward-looking sonar, respectively, and W and H represent the horizontal and vertical dimensions of the image, respectively.
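
The mapping from the polar fan to image coordinates can be illustrated with a minimal sketch. It does not reproduce the patent's own conversion formula; it assumes a common fan geometry in which the sonar sits at the bottom center of the image, θ = 0 points straight up, and the maximum slant range R spans the image height H. The function names are hypothetical.

```python
import math

def polar_to_image(r, theta, R, W, H):
    """Map a polar sample (r, theta) to image coordinates (x, y).

    Assumed geometry (not taken from the patent): the sonar sits at the
    bottom centre of the image, theta = 0 points straight up, and the
    maximum slant range R spans the full image height H.
    """
    scale = H / R                      # pixels per unit of range
    x = W / 2.0 + r * scale * math.sin(theta)
    y = H - r * scale * math.cos(theta)
    return x, y

def fan_width(phi, H):
    """Image width needed to contain a fan of opening angle phi (radians, phi <= pi)."""
    return 2.0 * H * math.sin(phi / 2.0)
```

Under these assumptions the sample at r = 0 lands on the bottom-center pixel and the sample at r = R, θ = 0 lands on the top-center pixel.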

[0020] S112: Forward-looking sonar image enhancement:

[0021] To address the issue of large-area high-frequency noise characteristics, the following preprocessing steps are specifically employed for forward-looking sonar images:

[0022] (1) Coordinate transformation: transform the sonar image in the sector polar coordinate system to the two-dimensional conventional coordinate system.

[0023] (2) Median filtering is performed to suppress noise while better preserving the abrupt gray-value changes between the target and the shadow area.

[0024] (3) Histogram equalization is performed, which helps with image display and intuitive interpretation of sonar images.

[0025] (4) Pseudo-color processing is used to convert grayscale images into color images to improve the recognizability of forward-looking sonar image content.
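
Steps (2) to (4) above can be sketched in plain Python on a small image stored as a 2-D list of integer gray values. This is an illustration only: the 3×3 window, the clamped borders, the global equalization and the blue-to-red ramp are assumptions, not parameters stated in the patent, and all function names are hypothetical.

```python
def median_filter(img, k=3):
    """k x k median filter on a 2-D list of ints; borders are clamped."""
    h, w, r = len(img), len(img[0]), k // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = sorted(
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in range(-r, r + 1)
                for dx in range(-r, r + 1)
            )
            out[y][x] = window[len(window) // 2]
    return out

def equalize_histogram(img, levels=256):
    """Global histogram equalization for integer values in [0, levels - 1]."""
    hist = [0] * levels
    for row in img:
        for v in row:
            hist[v] += 1
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    n = cdf[-1]
    if n == cdf_min:                   # constant image: nothing to stretch
        return [row[:] for row in img]
    return [
        [round((cdf[v] - cdf_min) / (n - cdf_min) * (levels - 1)) for v in row]
        for row in img
    ]

def pseudo_color(img, levels=256):
    """Map gray values to (R, G, B) with a simple blue-to-red ramp (assumed colormap)."""
    top = levels - 1
    return [[(v * 255 // top, 0, 255 - v * 255 // top) for v in row] for row in img]
```

In this sketch an isolated bright speckle inside a uniform region is removed by the median filter, while a step edge wider than the window would be kept.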

[0026] S12: Side-scan sonar image grayscale correction algorithm and resolution correction (geometric correction) algorithm:

[0027] Furthermore, step S12 includes the following steps:

[0028] S121: Gray-scale correction of side-scan sonar images:

[0029] Before performing grayscale correction, the position of the seabed line in the sonar image needs to be obtained. The position of the seabed line in the image is related to the height of the towfish; therefore, the position of the seabed line can be obtained from the pre-acquired height information using the following conversion:

[0030] $line_{orig} = N_s - (altitude \cdot N_s / range)$ (2)

[0031] In the formula, altitude represents the altitude information, range represents the operating range of the sonar, and $N_s$ represents the number of sampling points of the sound intensity data (ping(n)) acquired on a single side. Next, grayscale correction is performed on all pixels within the region width. First, the average grayscale value of each ping section is calculated along the image height direction:

[0032]

[0033] In the formula, $N_{min}$ corresponds to the maximum height. The grayscale mean along the width of the sonar image is then calculated, and finally a sequence of grayscale correction factors for all pixels is obtained.

[0034]
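
The seabed line conversion in equation (2) can be written as a short function. This sketch assumes that altitude and range are expressed in the same length unit; the function and parameter names are hypothetical.

```python
def seabed_line_position(altitude, sonar_range, n_samples):
    """Equation (2): line_orig = N_s - altitude * N_s / range.

    altitude and sonar_range are assumed to share one length unit;
    n_samples is N_s, the number of samples per ping on a single side.
    """
    if sonar_range <= 0:
        raise ValueError("sonar_range must be positive")
    return n_samples - altitude * n_samples / sonar_range
```

For example, with 1000 samples per side, a 100 m range and a towfish altitude of 10 m, the function returns sample index 900.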

[0035] S122: Side-scan sonar image resolution correction (geometric correction):

[0036] Based on the geometric relationship between slant range, horizontal distance, and depth, the pixel position correspondence between the sonar image composed of slant range points and the sonar image composed of horizontal distance points is as follows:

[0037] Port side resolution correction factor:

[0038] Starboard resolution correction factor:

Here, Res represents the image resolution, width represents the image width, PlantRange represents the horizontal distance, SlantRange represents the slant distance, and TowfishAlt represents the towfish height.
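
The geometric relationship behind this correction can be sketched as follows. The sketch assumes a flat seabed, so that slant range, towfish height and horizontal distance form a right triangle; it does not reproduce the port and starboard correction factors themselves, and the function names are hypothetical.

```python
import math

def horizontal_distance(slant_range, towfish_alt):
    """Flat-seabed horizontal distance for one slant-range sample."""
    if slant_range < towfish_alt:
        return 0.0                     # sample lies in the water column
    return math.sqrt(slant_range ** 2 - towfish_alt ** 2)

def slant_to_ground_columns(n_cols, res, towfish_alt):
    """For each slant-range column, the matching horizontal-distance column index.

    res is the range covered by one pixel (same unit as towfish_alt).
    """
    return [int(horizontal_distance(c * res, towfish_alt) / res) for c in range(n_cols)]
```

Columns closer than the towfish height map to index 0, which is why the near-nadir part of an uncorrected side-scan image appears compressed.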

[0039] Furthermore, step S2 includes the following steps:

[0040] S21: Construct a sonar image pre-segmentation model based on an improved DeepLabV3+ network:

[0041] A sonar image pre-segmentation model based on an improved DeepLabV3+ network is constructed to pre-segment the corrected image into bright and shadow regions. Since this step only requires pre-segmentation of the target bright and shadow regions in the sonar image, the network's feature extraction capability does not need to be very strong, but the algorithm's real-time performance is critical. Based on these considerations, the backbone of the original DeepLabV3+ network model, which suffers from slow training speed, is replaced by the MobileNetV2 network, which has fewer parameters, as the backbone feature extraction network.

[0042] After completing the backbone network feature extraction, the obtained preliminary effective features are further enhanced. In the Encoder stage, multiple dilated convolutions with different dilation rates are used for parallel feature extraction, resulting in the following output feature y for the preliminary effective feature x at position i:

[0043]

[0044] In the formula, r represents the dilation rate, w represents the convolution kernel, and kernel-size represents the kernel size. In the Decoder stage, a 1x1 convolution is used to adjust the number of channels. The adjustment result is stacked with the feature results obtained in the Encoder stage, and finally, two depthwise separable convolutions are used to obtain the final feature extraction result.
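
The dilated convolution used in the Encoder stage can be illustrated in one dimension. The sketch below assumes the standard form y[i] = Σ_k x[i + r·k]·w[k] with k running over the kernel size, evaluated only at positions where the kernel fits. For brevity the parallel branches share one kernel, whereas a real network would learn separate weights per branch; the function names are hypothetical.

```python
def dilated_conv1d(x, w, r):
    """y[i] = sum_k x[i + r * k] * w[k], computed only where the kernel fits."""
    span = r * (len(w) - 1)
    return [
        sum(x[i + r * k] * w[k] for k in range(len(w)))
        for i in range(len(x) - span)
    ]

def parallel_dilated_features(x, w, rates):
    """Apply one kernel at several dilation rates, as parallel branches."""
    return {r: dilated_conv1d(x, w, r) for r in rates}
```

A larger dilation rate r widens the receptive field without adding kernel weights, which is the reason for running several rates in parallel.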

[0045] After modifying the network structure, the loss function used during training is also improved. To address the poor model training performance caused by the imbalance of sonar image samples, the cross-entropy loss function is replaced by an improved variant, the Focal loss function:

[0046] $FL(p_t) = -\alpha_t (1 - p_t)^{\lambda} \log(p_t)$ (10)

[0047] In the formula, $p_t$ represents the predicted probability; in the multi-class classification task involved in this invention, it is the probability output by SoftMax. $\alpha_t$ represents the weighting factor for each class, and $\lambda$ represents the adjustment factor. To evaluate the quality of semantic segmentation results, the Dice loss function is introduced.
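
Equation (10) can be evaluated for a single sample as in the sketch below. The default values of `alpha_t` and `lam` are illustrative assumptions, not values given in the patent, and the helper names are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable SoftMax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def focal_loss(p_t, alpha_t=1.0, lam=2.0):
    """FL(p_t) = -alpha_t * (1 - p_t)**lam * log(p_t) for one sample.

    p_t is the SoftMax probability of the true class; alpha_t is the class
    weighting factor and lam the adjustment factor (defaults are assumed).
    """
    if not 0.0 < p_t <= 1.0:
        raise ValueError("p_t must be in (0, 1]")
    return -alpha_t * (1.0 - p_t) ** lam * math.log(p_t)
```

With `lam = 0` and `alpha_t = 1` the expression reduces to the ordinary cross-entropy term, and for `lam > 0` well-classified samples (p_t close to 1) contribute much less than hard ones.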

[0048] S22: Create a sonar image segmentation dataset and complete the training of the pre-segmentation model:

[0049] In the pre-segmentation step, only the bright and shadow regions of the target in the sonar image are pre-segmented, without distinguishing image categories. Therefore, only two types need to be labeled when annotating the sonar image: bright areas and shadows. Through experiments and online data collection, and using data augmentation techniques, a total of 695 sonar images were obtained. These images were then divided into datasets: 488 images for training, 71 images for validation, and 136 images for testing. After organizing all the datasets, the bright and shadow areas were labeled separately, completing the preparation of all datasets. The model was then trained: after the corresponding parameters were modified, the labeled dataset was fed into the network, completing the training of the sonar image pre-segmentation model.

[0050] S23: Use the trained model to perform real-time pre-segmentation of the sonar image to be segmented:

[0051] To address the issue of poor subsequent recognition results due to blurred target edges in actual sonar images, and to facilitate effective extraction of sound shadow information during the recognition process, the sonar image is pre-segmented into target bright areas and sound shadow areas. After training the pre-segmentation model, the weight path in the testing program is modified to the weight file with the best training results. The actual collected sonar images to be segmented are then fed into the program for real-time pre-segmentation of target bright areas and shadows. In the pre-segmentation results, the image contains only three different pixel values, representing: target echo area, sound shadow area, and reverberant background.

[0052] Furthermore, step S3 includes the following steps:

[0053] S31: Perform superpixel segmentation based on the SLIC algorithm on the preprocessed and pre-segmented sonar image:

[0054] After the above image preprocessing and pre-segmentation, problems such as blurred sonar target edges are improved, and accurate extraction of target shadow information is achieved. Next, the SLIC algorithm is used to convert pixel data into hundreds of superpixel blocks, and finally the superpixels are converted into graph structure data.

[0055] In the SLIC algorithm, each pixel is described by a 5-dimensional vector $V = [l, a, b, x, y]^T$, where $[l, a, b]^T$ are the pixel color feature coordinates in the CIE-LAB color space and $[x, y]^T$ are the pixel spatial feature coordinates. The specific steps for applying the SLIC algorithm to preprocessed and pre-segmented sonar images are as follows:

[0056] (1) Initialize cluster centers:

[0057] The pre-segmented sonar image is divided into multiple superpixel blocks of uniform area. The number of superpixels to be generated is pre-set to M, and the initial cluster center of each superpixel is defined as $C_i = [l_i, a_i, b_i, x_i, y_i]^T$ (i = 1, ..., M), with the centers uniformly distributed within the image. Assume the total number of pixels in the original image is N and all superpixels have the same size. Then the number of pixels (superpixel area) contained in each superpixel is N / M, and the distance between adjacent superpixel cluster centers is approximately $S = \sqrt{N/M}$.

[0058] (2) Reselect the cluster center location:

[0059] The initial cluster center is defined as $C_i = [l_i, a_i, b_i, x_i, y_i]^T$ (i = 1, ..., M). Because this initial placement is crude, a center point may fall on a contour boundary with a large gradient, which degrades the subsequent clustering. Therefore, the optimal cluster center is reselected within the n×n local neighborhood of the initial point (generally n = 3).

[0060] (3) Initialize the pixels and assign an initial class label to each pixel:

[0061] After obtaining the cluster center locations of the entire sonar image, labels need to be assigned to each pixel within the neighborhood of each superpixel cluster center. In the SLIC algorithm, the superpixel area is approximately S×S, and its search range is limited to the vicinity of the cluster center; therefore, the search range is set to 2S×2S. Because SLIC restricts the size of the search area, it significantly reduces the number of distance calculations compared to the traditional k-means clustering algorithm.

[0062] (4) Distance similarity measurement:

[0063] After normalizing both the pixel feature distance and the spatial location feature distance, they are combined into a single metric, and the maximum feature distances $N_s$ and $N_c$ within each cluster are taken. The distance metric D between pixels and cluster centers is expressed as follows:

[0064]

[0065] In the formula

[0066] (5) Iterative optimization of clustering results:

[0067] After defining the distance similarity measurement rules between pixels and cluster centers, a local search is performed within the 2S×2S neighborhood of each superpixel center. The distance from each pixel within this neighborhood to the superpixel center is calculated according to the measurement rules. If the distance is less than the distance from the pixel to its current cluster center, the pixel is reassigned to this superpixel; otherwise its label is left unchanged. After all pixels have been calculated and compared, the distance and label arrays are updated, and the coordinates of the region center points are recalculated to update the superpixel cluster center positions.
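
Steps (1), (3), (4) and (5) above can be sketched for a single-channel image as follows. This is a simplified illustration, not the patented implementation: it uses one gray value per pixel instead of CIE-LAB color, skips the gradient-based reselection of step (2), and assumes the common SLIC form D = sqrt((d_c / N_c)² + (d_s / N_s)²) with N_s = S and a hypothetical compactness constant `m` standing in for N_c.

```python
import math

def slic_gray(img, M, m=10.0, n_iter=5):
    """Tiny SLIC-style clustering of a 2-D list of gray values.

    Returns (labels, centers); each center is [gray, x, y].
    m is an assumed compactness weight used as the colour normaliser.
    """
    h, w = len(img), len(img[0])
    S = max(1, int(math.sqrt(h * w / M)))          # grid step, S ~ sqrt(N / M)

    # (1) cluster centres on a regular grid
    centers = [[float(img[y][x]), float(x), float(y)]
               for y in range(S // 2, h, S)
               for x in range(S // 2, w, S)]

    labels = [[-1] * w for _ in range(h)]
    for _ in range(n_iter):
        dist = [[math.inf] * w for _ in range(h)]
        # (3)-(4) label pixels inside the 2S x 2S window of each centre
        for k, (g, cx, cy) in enumerate(centers):
            for y in range(max(0, int(cy) - S), min(h, int(cy) + S + 1)):
                for x in range(max(0, int(cx) - S), min(w, int(cx) + S + 1)):
                    d_c = img[y][x] - g
                    d_s = math.hypot(x - cx, y - cy)
                    D = math.sqrt((d_c / m) ** 2 + (d_s / S) ** 2)
                    if D < dist[y][x]:
                        dist[y][x], labels[y][x] = D, k
        # (5) move each centre to the mean of its member pixels
        sums = [[0.0, 0.0, 0.0, 0] for _ in centers]
        for y in range(h):
            for x in range(w):
                s = sums[labels[y][x]]
                s[0] += img[y][x]
                s[1] += x
                s[2] += y
                s[3] += 1
        centers = [[s[0] / s[3], s[1] / s[3], s[2] / s[3]] if s[3] else c
                   for s, c in zip(sums, centers)]
    return labels, centers
```

Because each center only visits its own 2S×2S window, the number of distance evaluations grows with the number of pixels rather than with pixels times clusters, which is the saving over plain k-means mentioned in step (3).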

[0068] S32: SLIC Superpixel Clustering Result Storage and File Generation:

[0069] After completing SLIC superpixel clustering, the results need to be stored in a standard format for subsequent graph structure data generation. The stored content includes four main parts: the label of each image, the sequence numbers of all superpixels in each image, their pixel values, and the coordinates of their centers. The image labels are divided into four types: drowning victim, mine, aircraft, and shipwreck. After collecting all datasets, they are divided into training, validation, and test sets. Then, the SLIC clustering results generated for each image in each dataset are written to their respective files according to these four parts.

[0070] S33: Construct graph-structured data based on sonar images using the superpixel segmentation results:

[0071] Furthermore, step S33 includes the following steps:

[0072] S331: Graph structure data representation under sonar images:

[0073] A graph is a non-Euclidean data structure, denoted as G = {V, E}, where $V = \{v_1, v_2, \ldots, v_M\}$ represents the set of nodes and $E = \{e_1, e_2, \ldots, e_P\}$ represents the set of edges. Each graph is labeled with the target category represented by the sonar image; each node in the graph is defined as the center of a superpixel obtained by SLIC clustering; and each edge is defined as the connection relationship between two superpixels.

[0074] Nodes and edges alone are not enough to fully construct the graph structure data. An adjacency matrix is needed to define the relationships between all nodes. An adjacency matrix is a two-dimensional array that reflects the degree of association between any two nodes. For an undirected graph, the adjacency matrix is defined as $W \in \mathbb{R}^{N \times N}$, where $W_{i,j} = w_{i,j}$ indicates the weight of the edge from node $v_i$ to $v_j$. The weight can be defined flexibly; in this invention, it is defined as the coefficient calculated jointly from the pixel feature distance and the spatial location feature distance between superpixels. $W_{i,j} = 0$ means there is no edge between nodes $v_i$ and $v_j$.

[0075] S332: Definition of Graph Structure Attributes in Sonar Images:

[0076] Compared to deep learning methods in Euclidean space, graph neural networks based on non-Euclidean space introduce spatial location features when constructing graph structure data. This allows for highly effective extraction and utilization of shadow region information in sonar images, which contains important information such as the height and shape of the sonar target. This invention utilizes graph structure data to simultaneously consider the pixel and location features of both the bright and shadow regions of the sonar target, thereby achieving higher image recognition performance.

[0077] Furthermore, step S332 includes the following steps:

[0078] S3321: Definition of node attributes in a Graph structure:

[0079] In addition to their own numbering, nodes in the graph can also carry many other types of attributes. In this invention, nodes are defined as superpixels, and node attributes are defined as two types of features: positional features and pixel features. Positional information refers to the center coordinates of each superpixel, and pixel information refers to the average pixel value of each superpixel. The nodes and their attributes are specifically represented as follows:

[0080]

[0081] S3322: Calculation of the adjacency matrix in a graph structure:

[0082] The adjacency matrix stores all pairwise connections between nodes. In this invention, the connection between each pair of nodes considers both positional and pixel features; that is, the pixel difference and the positional difference between the two nodes are added together to form the matrix entry. Since the position coordinates lie in [0, 200) and the pixel values lie in [0, 1], the position coordinates must be normalized first. The specific calculation is as follows:

[0083]
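
One way to compute such an adjacency matrix is sketched below. The text only states that the pixel difference and the normalized positional difference are added; the use of an absolute gray-value difference and a Euclidean distance between centers is an assumption, and the function name and `coord_max` parameter are hypothetical.

```python
import math

def adjacency_matrix(pixel_vals, centers, coord_max=200.0):
    """W[i][j] = |p_i - p_j| + distance between normalised superpixel centres.

    pixel_vals: mean gray value of each superpixel, in [0, 1].
    centers: (x, y) centre of each superpixel, in [0, coord_max).
    """
    n = len(pixel_vals)
    norm = [(x / coord_max, y / coord_max) for x, y in centers]
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            pixel_diff = abs(pixel_vals[i] - pixel_vals[j])
            pos_diff = math.hypot(norm[i][0] - norm[j][0], norm[i][1] - norm[j][1])
            W[i][j] = W[j][i] = pixel_diff + pos_diff
    return W
```

Dividing the coordinates by 200 brings both terms to a comparable scale, so neither the pixel term nor the position term dominates the sum.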

[0084] S3323: Definition of edge attributes in a Graph structure:

[0085] In addition to the edge index indicating connectivity, the edges in the graph can also contain many other types of attributes. In this invention, the attribute of an edge is defined as the superpixel distance calculated by combining pixel features and spatial location features. The edges and their attributes are specifically represented as follows:

[0086]

[0087] The attribute definitions for the entire graph structure data are summarized as follows:

[0088]

[0089] S333: Convert to DGL data:

[0090] The graph attention network developed in this invention is based on the DGL framework. Therefore, after constructing the graph structure data, it needs to be converted into standard DGLGraph data under the DGL framework. In the DGL framework, a node is represented by an integer called the node ID; an edge is represented by a pair of integers, $e_i(u, v)$, where u and v are the node IDs of the start and end points of the edge and i is the edge ID. Both nodes and edges can carry features with custom names, accessed via the `ndata` and `edata` attributes, respectively. Therefore, in this invention, the superpixel number `sp-order` obtained after SLIC clustering of each sonar image is used as the node ID, and the number pairs of connected superpixels are used as the edge ID pairs. The pixel feature $f(x_i, y_i)$ and the location feature $(x_i, y_i)$ of each node are written into the `ndata` attribute, and the edge weight feature $W_{i,j}$ is written into the `edata` attribute, completing the construction of the DGLGraph data.
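
The conversion can be sketched in two parts. `build_graph_arrays` uses only plain Python and collects node IDs, node features and weighted edges; `to_dgl` wraps them in a DGLGraph and needs the `dgl` and `torch` packages. The feature key `"feat"`, the function names and the assumption that `sp_order` is the contiguous range 0..M-1 are illustrative choices, not details given in the patent.

```python
def build_graph_arrays(sp_order, pixel_vals, centers, W):
    """Collect edge lists, node features and edge weights from a SLIC result.

    sp_order is assumed to be the contiguous node IDs 0..M-1.
    """
    src, dst, weights = [], [], []
    for i in sp_order:
        for j in sp_order:
            if i != j and W[i][j] != 0:        # W[i][j] == 0 means "no edge"
                src.append(i)
                dst.append(j)
                weights.append(W[i][j])
    # node feature = [pixel feature, x, y]
    feats = [[pixel_vals[i], centers[i][0], centers[i][1]] for i in sp_order]
    return src, dst, feats, weights

def to_dgl(src, dst, feats, weights):
    """Wrap the arrays in a DGLGraph (requires the dgl and torch packages)."""
    import dgl
    import torch
    g = dgl.graph((torch.tensor(src), torch.tensor(dst)), num_nodes=len(feats))
    g.ndata["feat"] = torch.tensor(feats, dtype=torch.float32)
    g.edata["feat"] = torch.tensor(weights, dtype=torch.float32).unsqueeze(1)
    return g
```

Each undirected connection appears twice in the edge list, once per direction, because DGL graphs store directed edges.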

[0091] Furthermore, step S4 includes the following steps:

[0092] S41: Construction of a graph attention network based on sonar data:

[0093] After converting the results of SLIC superpixel clustering into graph-structured data, they are fed into a graph attention network for model training and testing. The graph attention network model employs an attention module to embed nodes in the graph. By calculating the attention coefficients between the current node and its neighbors, it aggregates neighbor information and adaptively assigns weights to different neighbors, thereby learning neighborhood and spatial features. The most crucial aspect of this network is the construction of the graph attention layer.

[0094] The input to the graph attention layer is the set of features of all nodes, represented as $h = \{h_1, h_2, \ldots, h_M\}$, $h_i \in \mathbb{R}^F$, where M is the number of nodes, $h_i$ is the feature vector of the i-th node, and F is the number of features of each node. After the input passes through the attention layer, the output is a new set of node features. The shared linear transformation applied to every node between input and output is defined as a parameterized weight matrix. Because the graph attention layer incorporates an attention mechanism that assigns different coefficient weights to the current node and its neighboring nodes, the input and output of the entire graph attention layer can be represented as follows:

[0095]

[0096] In the formula, $\alpha_{ij}$ represents the attention coefficient (weight) between node i and node j, Q is the weight matrix learned through backpropagation, and σ is the nonlinear activation function. Determining the attention coefficient α is the core of the graph attention network. First, a self-attention mechanism is applied at each node to calculate the attention coefficient $e_{ij}$, which represents the importance of the features of node j to node i:

[0097]

[0098] Next, a LeakyReLU function with a negative slope of 0.2 is used for non-linear processing, and SoftMax is introduced to normalize all attention coefficients. At the same time, masked attention is performed: attention coefficients are calculated only for the nodes within a certain neighborhood of a node. Therefore, the coefficients of the complete attention mechanism are expressed as follows:

[0099]

[0100] In the formula, T represents transpose, and || represents concatenation operation.

[0101] In the above expression, two unknowns remain: which nodes make up the selected neighborhood, and the number of nodes k it contains. In this invention, the number of nodes k in the neighborhood equals the value of k in the KNN algorithm, and the specific k nearest neighbors are determined by the edge connectivity obtained from the KNN algorithm. During execution, a suitable value of k is selected and the k-nearest-neighbor algorithm is run to update the adjacency matrix and edge weights. After obtaining the adjacency matrix and edge connectivity, the weakly connected edges of each node are deleted, leaving the k edges with the strongest connectivity, which yields the updated adjacency matrix W and edge weights $W_{i,j}$. Before SoftMax normalization, multiplying matrix W with the attention coefficient matrix fixes the value of k and the specific edges that the k neighbors correspond to, giving the final expression for the attention mechanism coefficients:

[0102]
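
The pruning step described above can be sketched as follows. The sketch assumes that the entries of W are distances, so that the "strongest" connections are the k smallest entries in each row; if the weights were similarities instead, the sort order would be reversed. The function name is hypothetical.

```python
def knn_prune(W, k):
    """Keep, for every node, only the edges to its k nearest neighbours.

    W[i][j] is treated as a distance (smaller = closer); removed edges
    are set to 0, which the adjacency matrix uses to mean "no edge".
    """
    n = len(W)
    pruned = [[0.0] * n for _ in range(n)]
    for i in range(n):
        neighbours = sorted((j for j in range(n) if j != i), key=lambda j: W[i][j])
        for j in neighbours[:k]:
            pruned[i][j] = W[i][j]
    return pruned
```

Note that the result need not be symmetric: node i may keep node j as a neighbor while node j does not keep node i.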

[0103] After the transformations and calculations above, the attention cross-correlation coefficients between different nodes are obtained. These coefficients are then substituted into the calculation formula between the input and output of the graph attention layer to obtain the final output features of each node. This completes the construction of the graph attention layer. Finally, an appropriate number of attention layers is selected to fully construct the network model.
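
A single-head graph attention layer of this kind can be sketched in plain Python. The sketch assumes the standard GAT form, e_ij = LeakyReLU(aᵀ[Q h_i || Q h_j]), SoftMax over each node's neighborhood, and output σ(Σ_j α_ij Q h_j); the attention vector `a`, the default `tanh` activation and the function names are assumptions made for illustration.

```python
import math

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def matvec(Q, h):
    return [sum(q * v for q, v in zip(row, h)) for row in Q]

def gat_layer(H, Q, a, neighbours, sigma=math.tanh):
    """One single-head graph attention layer.

    H: node features (M x F); Q: weight matrix (F' x F);
    a: attention vector of length 2 * F';
    neighbours[i]: non-empty list of node IDs that node i attends to.
    """
    Z = [matvec(Q, h) for h in H]                      # Q h_i for every node
    out, alphas = [], []
    for i, nbrs in enumerate(neighbours):
        # e_ij = LeakyReLU(a^T [Q h_i || Q h_j]), only for j in the neighbourhood
        e = [leaky_relu(sum(w * v for w, v in zip(a, Z[i] + Z[j]))) for j in nbrs]
        m = max(e)
        exp_e = [math.exp(v - m) for v in e]
        total = sum(exp_e)
        alpha = [v / total for v in exp_e]             # SoftMax over neighbours
        agg = [sum(al * Z[j][d] for al, j in zip(alpha, nbrs)) for d in range(len(Q))]
        out.append([sigma(v) for v in agg])
        alphas.append(alpha)
    return out, alphas
```

Restricting the inner loop to `neighbours[i]` plays the role of the masked attention: nodes outside the neighborhood never receive a coefficient.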

[0104] S42: Dataset collection, preparation, and construction:

[0105] After converting the constructed graph structure data into standard DGLGraph data under the DGL framework, the dataset needs to be divided into training, validation, and test sets. During the DGLGraph structure conversion process, the data is directly stored in the divided sets. There are four types of graph structure data: drowning victims, mines, aircraft, and shipwrecks. The training set contains 488 graph structures, including 98 drowning victims, 119 mine targets, 87 aircraft, and 184 shipwrecks. The validation set contains 71 graph structures, including 17 drowning victims, 18 mine targets, 13 aircraft, and 23 shipwrecks. The test set contains 136 graph structures, including 13 drowning victims, 35 mine targets, 23 aircraft, and 64 shipwrecks.

[0106] S43: Experimental setup and model training:

[0107] Parameter settings: This includes setting the number of graph attention layers, hidden units, output feature vector units, residual states, readout states, the number of independent attention mechanisms in multi-head attention, the number of random dropouts of input features, the total dropout value, the state of the batch-norm layer, and the state of self-loops. It also includes setting the network model, dataset, number of classes, random seed, number of epochs, batch size, initial learning rate, learning rate decay coefficient, the number of times the learning rate can tolerate no performance improvement, the lower bound of the learning rate, the weight decay coefficient, the epoch interval, and the maximum execution time.

[0108] Model Training: After modifying and setting all parameters, the graph structure data, created based on the SLIC superpixel clustering algorithm, is fed into the network for model training. First, the graph structure data is loaded, then the network model and forward propagation process are defined. GAT convolutions (graph attention layers) are used for feature extraction and message passing within the graph structure. Next, the loss function and optimizer are defined. The error between the predicted value and the ground truth label is calculated based on the defined loss function, and the optimizer updates the model parameters used in forward propagation to reduce this error. At the end of the iterations, the model parameters corresponding to the minimum loss value are loaded into the forward propagation to identify the category of the sonar image to be classified.

[0109] S44: Model Testing and Result Analysis

[0110] After the GAT model based on sonar images is trained, the weight parameters, which have stabilized after training, are selected to classify sonar image categories on the test set. Finally, the recognition accuracy on the test set is used to evaluate the model. Specifically, in this step, the network model needs to be switched to test mode, Batch Normalization and Dropout are turned off, and the model is tested using evaluate-network.

[0111] Furthermore, step S5 includes the following steps:

[0112] S51: Verify the effectiveness of image pre-segmentation and sonar target shadow region information:

[0113] To verify the effectiveness of image pre-segmentation and sonar target shadow region information, three types of datasets were created. The first type consisted of raw sonar images without image pre-segmentation or extraction of target shadow region information. The raw images were directly converted into graph-structured data using the SLIC superpixel segmentation algorithm. This constructed graph-structured dataset was then fed into the GAT network for model training. The curve of the training loss function changing with the number of iterations was obtained. After training, sonar images were classified using data from the test set, and the recognition rate on the test set was calculated to obtain the optimal recognition performance.

[0114] The second type involves pre-segmenting the raw sonar images without extracting acoustic shadow information. Specifically, the raw sonar images are first pre-processed and pre-segmented, and then the SLIC clustering algorithm is applied to the pre-segmented images to obtain complete graph structure data, thereby completing the construction of the dataset required for the GAT model. Similarly, the model is subsequently trained and tested.

[0115] The third type involves raw sonar images that require both image pre-segmentation and extraction of sonar shadow region information. After pre-segmentation, bright and shadow areas of the target are labeled with different colors. Then, the SLIC clustering algorithm is used to store target edge information and the correlation between bright and shadow areas based on pixel features and spatial location features, forming a graph-structured dataset with richer attributes. Finally, this dataset is also fed into the network for model training and testing. The training curves and test results obtained from the above three types of datasets are compared pairwise to verify the effectiveness of image pre-segmentation and sonar target shadow region information extraction.

[0116] S52: Test the impact of the relative weight γ between pixel features and spatial location features on recognition performance:

[0117] The weight factor γ takes values in [0, 1]: a larger γ indicates a greater proportion of spatial location features, while a smaller γ indicates a greater proportion of pixel features. To find the most suitable weight allocation between spatial location features and pixel features, that is, the optimal relative weight γ, this invention sets γ to each of the values [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1] and tests the influence of the relative weight γ between pixel features and spatial location features on the recognition effect.
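The role of γ can be illustrated with a short sketch. It assumes, purely for illustration, that the two normalized distances are blended as a convex combination; the patent does not state the exact combination rule here, and the function name and the sample distances are hypothetical.

```python
def combined_distance(d_pixel, d_spatial, gamma):
    """Blend a pixel-feature distance and a spatial distance with weight gamma.

    Assumption: a convex combination, so gamma = 0 uses only pixel features
    and gamma = 1 uses only spatial location features.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    return (1.0 - gamma) * d_pixel + gamma * d_spatial


# The eleven gamma values tested in the text: 0, 0.1, ..., 1.
gammas = [round(0.1 * i, 1) for i in range(11)]

# Hypothetical normalized distances between two superpixels.
sweep = {g: combined_distance(0.2, 0.8, g) for g in gammas}
```

Each entry of `sweep` would correspond to one training run in the ablation, with only γ changed between runs.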

[0118] S53: Test the impact of different attribute calculation methods on recognition performance:

[0119] Different attribute calculation methods produce different ranges of attribute values. Although normalization can be performed afterwards, the edge weights calculated for the same pixel values and spatial locations differ between methods, and even after normalization the distributions of values remain different. For example, with the Sigmoid function the output varies strongly for inputs within ±5 and shows almost no change outside this range. With the negative exponent of e, the output varies significantly for inputs in the [0,1] range and changes slowly outside it. Therefore, to find the most suitable attribute calculation method, this invention selects four functions for calculating attributes: the Sigmoid function, the symmetrical Sigmoid function, the negative exponent of e, and direct addition. The recognition rate under each function is calculated, and the recognition results of the methods are compared to select the optimal attribute calculation method. The four attribute calculation methods are shown in the following formulas:

[0120]
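The saturation behavior described in S53 can be checked numerically. The sketch below uses textbook forms of the four function families named in the text; these forms, in particular the symmetric Sigmoid written as 2·sigmoid(x) − 1, are assumptions and may differ from the patent's own expressions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def symmetric_sigmoid(x):
    # Assumed form: 2 * sigmoid(x) - 1, which is odd-symmetric about 0.
    return 2.0 * sigmoid(x) - 1.0


def neg_exp(x):
    # Negative exponent of e.
    return math.exp(-x)


def direct_sum(d_pixel, d_spatial):
    # Direct addition of the two distances.
    return d_pixel + d_spatial


# Sigmoid changes a lot inside [-5, 5] and barely at all beyond it.
inside_change = sigmoid(5.0) - sigmoid(-5.0)
outside_change = sigmoid(10.0) - sigmoid(5.0)

# exp(-x) drops quickly on [0, 1] and only slowly for large x.
early_drop = neg_exp(0.0) - neg_exp(1.0)
late_drop = neg_exp(5.0) - neg_exp(6.0)
```

Comparing `inside_change` with `outside_change`, and `early_drop` with `late_drop`, shows why the same input distances lead to differently distributed edge weights under each function.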

[0121] S54: Test the impact of the number of nodes k contained in the neighborhood of a node on recognition performance:

[0122] Different values of k directly affect the calculation of the attention coefficients in graph attention networks. If k is too small, a large amount of information is lost and too little neighborhood information is aggregated, making it difficult to exploit the neighborhood-based performance of graph attention networks. If k exceeds a certain threshold, it may introduce too much noise, leading to a decrease in model performance. Therefore, it is necessary to find an optimal value of k to achieve the best model performance. In this invention, k is set to each of [5, 10, 20, 30, 50, 100], and the training effect and recognition accuracy of the model under the different k values are compared, thereby completing the test of the impact of the number of nodes k contained in the neighborhood of a node on the recognition effect.
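How k limits the neighborhood of each node can be sketched as a k-nearest-neighbor selection over superpixel centers. This sketch assumes neighbors are ranked by plain Euclidean distance between centers; the patent may rank them by a combined pixel and spatial distance instead, and the function name and coordinates are hypothetical.

```python
import math


def k_nearest_neighbors(centers, k):
    """Return, for each node, the indices of its k nearest other nodes."""
    neighbors = []
    for i, (xi, yi) in enumerate(centers):
        dists = [
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(centers)
            if j != i
        ]
        dists.sort()  # nearest first; ties broken by node index
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


# Four hypothetical superpixel centers on a line.
centers = [(0, 0), (1, 0), (5, 0), (6, 0)]
nbrs = k_nearest_neighbors(centers, k=2)
```

With a small k each node only aggregates from its closest superpixels, while a large k pulls in distant background superpixels as well.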

[0123] This invention presents a sonar image classification method combining SLIC superpixels and graph attention networks. By fully utilizing the sound shadow region and the spatial relationship between the target area and the shadow area through SLIC superpixels and graph attention networks, a higher-precision sonar image classification and recognition method is achieved by combining pixel features and spatial geometric features. Specifically, firstly, based on the sonar imaging principle and prior information during imaging, the sonar image undergoes autonomous correction and compensation. Then, the corrected sonar image is pre-segmented using an improved DeepLabV3+ model to segment the bright and shadow areas of the sonar target. After this series of preprocessing steps, the image information is converted into graph-structured data using SLIC superpixel clustering, and pixel features and spatial location features are considered together to form corresponding graph attributes. Finally, the graph attention network is used to classify the sonar image using the formed graph-structured data. Because the graph network considers both sound shadow region information and the spatial location features of the sonar target, the recognition performance of the sonar image is significantly improved.

[0124] This method targets 2D forward-looking sonar and side-scan sonar. It fully utilizes the SLIC superpixel method and graph attention network to leverage sonar target acoustic shadow region information and the spatial relationship between target and shadow regions, combining pixel features and spatial geometric features to achieve high-precision sonar image classification and recognition. First, based on the imaging principles of 2D sonar images and prior information during imaging, different preprocessing methods are used to achieve autonomous correction and compensation of the sonar images. Then, the corrected sonar images undergo image pre-segmentation based on an improved DeepLabV3+ model to segment the bright and shadow regions of the sonar target, thereby improving issues such as blurred target edges and achieving accurate extraction of target shadow information. After preprocessing and pre-segmentation, the SLIC superpixel clustering method is used to convert image information into graph-structured data, and pixel features and spatial geometric features are considered together to form corresponding graph attributes. Finally, the graph attention network is used to aggregate neighborhood information and adaptively assign weights to different neighbors, achieving the goal of learning neighborhood and spatial features, ultimately realizing graph-based sonar image classification. The graph neural network proposed in this invention, based on non-Euclidean space, simultaneously considers the acoustic shadow information of sonar targets as well as the spatial positional correlation between bright areas and shadows, thereby significantly improving the recognition performance of sonar images. Furthermore, apart from the different preprocessing processes, subsequent clustering and recognition can be applied to multiple image sonars simultaneously, thus effectively improving the versatility of the model.

Attached Figure Description

[0125] Figure 1 is a flowchart of the present invention.

[0126] Figure 2 is a schematic diagram showing the result of preprocessing and pre-segmenting the original sonar image of the present invention.

[0127] Figure 3 is a schematic diagram of the superpixel clustering results generated by the SLIC algorithm for four types of sonar targets (drowning victims, mines, aircraft, and sunken ships) according to the present invention.

[0128] Figure 4 is a schematic diagram of the DGLGraph data structure generated from a sonar image of this invention.

[0129] Figure 5 shows the loss functions of the training and validation sets and the curves of recognition rate versus the number of iterations during the training process of this invention.

[0130] Figure 6 shows the curve of test-set recognition accuracy versus the number of iterations and a schematic diagram of the optimal recognition result of this invention.

[0131] Figure 7 is a schematic diagram showing the comparison results of the effectiveness of image pre-segmentation and sonar target shadow region information in the present invention.

[0132] Figure 8 is a schematic diagram comparing the recognition performance of the present invention under different values of the relative weight between pixel features and spatial location features.

[0133] Figure 9 is a schematic diagram comparing the recognition effects under different attribute calculation methods of the present invention.

[0134] Figure 10 is a schematic diagram comparing the recognition performance of the present invention under different values of the number of nodes k contained in the neighborhood of a node.

Detailed Implementation

[0135] The embodiments of the present invention will now be described in detail with reference to the accompanying drawings.

[0136] Referring to Figure 1, which is a flowchart of the sonar image classification method combining SLIC superpixels and graph attention networks provided by this invention, the method includes the following steps:

[0137] S1: Based on the imaging principles of two-dimensional forward-looking sonar and side-scan sonar, as well as the prior information during imaging, different preprocessing methods are used to achieve autonomous correction and compensation of sonar images.

[0138] S2: Perform image pre-segmentation on the corrected sonar image based on the improved DeepLabV3+ network to achieve synchronous segmentation of the sonar target highlight area and sound shadow area.

[0139] S3: Use the SLIC superpixel algorithm to construct graph structure data, and combine pixel features and spatial location features to form the final graph attributes.

[0140] S4: Construct a sonar image classification model based on GAT (Graph Attention Network), and feed the constructed sonar graph structure data into the network to complete the training and testing of the model.

[0141] S5: Set up ablation experiments to verify the importance of pixel features and spatial location features, the effectiveness of sonar image pre-segmentation, and the importance of sonar target shadow region information.

[0142] Furthermore, step S1 includes the following steps:

[0143] S11: Forward-looking sonar image reconstruction technology and enhancement algorithm:

[0144] Forward-looking sonar image preprocessing mainly includes image reconstruction and image enhancement. Image reconstruction involves reorganizing the storage structure of the original sonar data based on the forward-looking sonar imaging principle. Image enhancement first uses median filtering to improve the degraded image, removing some meaningless noise while preserving necessary target and shadow information to the maximum extent. Secondly, histogram equalization and pseudo-color addition are used for enhancement, making the pixel grayscale of the target area more distinct. Finally, target and shadow pre-segmentation based on the labeled file is performed, laying the foundation for subsequent feature extraction and segmentation matching.

[0145] Furthermore, step S11 includes the following steps:

[0146] S111: Forward-looking sonar image reconstruction technology:

[0147] The reconstruction of forward-looking sonar images is essentially a coordinate transformation process. Forward-looking sonar images have two representations: one is in polar coordinates, which is the original data format and is presented as a sector (fan-shaped) image with (r, θ) as the coordinate axes; the other is in the conventional image coordinate system (x, y) obtained after coordinate transformation. The conversion formula between the two coordinate systems is as follows:

[0148]

[0149] Where φ and R represent the horizontal opening angle and slant range of the forward-looking sonar, respectively, and W and H represent the horizontal and vertical dimensions of the image, respectively.
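A minimal sketch of one common sector-to-raster mapping follows. It assumes the sonar sits at the bottom-center of the image, θ is measured from the center line within ±φ/2, and the image height H spans the full slant range R. This layout and the function name are assumptions and may differ from the patent's conversion formula.

```python
import math


def polar_to_image(r, theta, R, H, W):
    """Map a polar sample (r, theta) to image coordinates (x, y).

    Assumed layout: origin of the sector at the bottom-center pixel,
    y increasing downward, theta in radians measured from the center line.
    """
    scale = H / R  # pixels per unit of slant range
    x = W / 2.0 + r * math.sin(theta) * scale
    y = H - r * math.cos(theta) * scale
    return x, y


# Hypothetical geometry: 10 m range on a 100 x 100 image.
origin = polar_to_image(0.0, 0.0, R=10.0, H=100, W=100)
far_center = polar_to_image(10.0, 0.0, R=10.0, H=100, W=100)
far_right = polar_to_image(10.0, math.pi / 6, R=10.0, H=100, W=100)
```

In practice the mapping is usually applied in reverse (for every output pixel, look up the polar sample) so that the reconstructed image has no holes.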

[0150] S112: Forward-looking sonar image enhancement:

[0151] The overall quality of the raw forward-looking sonar images is poor. This is mainly due to two reasons: the complexity of the underwater environment and the lack of gain adjustment for the sonar equipment. These two factors result in large areas of high-frequency noise in the raw images, making them appear as nearly black images. To address this issue of large-area high-frequency noise, the following preprocessing steps are employed for the forward-looking sonar images:

[0152] (1) Coordinate transformation: transform the sonar image in the sector polar coordinate system to the two-dimensional conventional coordinate system.

[0153] (2) Median filtering: suppress noise while better preserving the abrupt gray-value changes between the target and the shadow area.

[0154] (3) Histogram equalization: improve image display and make the sonar image easier to interpret visually.

[0155] (4) Pseudo-color processing is used to convert grayscale images into color images to improve the recognizability of forward-looking sonar image content.
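Steps (2) and (3) above can be sketched on a tiny grayscale image stored as nested lists. This is an illustrative pure-Python version with a fixed 3×3 window and untouched borders, both of which are assumptions; the pseudo-color step is omitted, and the function names and sample images are hypothetical.

```python
from statistics import median


def median_filter_3x3(img):
    """3x3 median filter; border pixels are left unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = int(median(window))
    return out


def equalize_histogram(img, levels=256):
    """Map gray levels through the normalized cumulative histogram."""
    hist = [0] * levels
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:  # constant image: nothing to equalize
        return [row[:] for row in img]
    return [
        [round((cdf[v] - cdf_min) / (total - cdf_min) * (levels - 1)) for v in row]
        for row in img
    ]


# A flat background with one bright speckle and one mild outlier.
noisy = [
    [10, 10, 10, 10],
    [10, 255, 10, 10],
    [10, 10, 12, 10],
    [10, 10, 10, 10],
]
filtered = median_filter_3x3(noisy)

# A low-contrast image using only two of four gray levels.
stretched = equalize_histogram([[1, 1], [2, 2]], levels=4)
```

The median filter removes the isolated speckle without blurring, and equalization spreads the two occupied gray levels across the full range.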

[0156] S12: Side-scan sonar image grayscale correction algorithm and resolution correction (geometric correction) algorithm:

[0157] Preprocessing of side-scan sonar images mainly includes grayscale correction and geometric correction. Grayscale correction (gain compensation) aims to compensate for regions with low grayscale mean (i.e., distant regions) and suppress regions with high grayscale mean (i.e., near regions). Resolution correction (geometric correction) addresses the issue that individual pixels at distant locations cover a large physical size while individual pixels at near locations cover a small physical size, resulting in equidistant pixels after correction. Using prior information such as height and angle from side-scan sonar imaging to autonomously perform gain compensation and resolution correction on the image benefits the subsequent segmentation and matching of targets of interest.

[0158] Furthermore, step S12 includes the following steps:

[0159] S121: Gray-scale correction of side-scan sonar images:

[0160] Before performing grayscale correction, the position of the seabed line in the sonar image needs to be obtained. The position of the seabed line in the image is related to the height of the towfish; therefore, the position of the seabed line can be obtained from the pre-acquired height information using the following conversion:

[0161] $line_{orig} = N_s - (altitude \cdot N_s / range)$ (2)

[0162] In the formula, altitude represents the altitude information, range represents the operating range of the sonar, and $N_s$ represents the number of sampling points of the sound intensity data of one ping (ping(n)) acquired from a single side. Next, grayscale correction is performed on all pixels within the region width. First, the average grayscale value of each ping section is calculated along the image height direction:

[0163]

[0164] In the formula, $N_{min}$ denotes the position along the width of the sonar image corresponding to the maximum height. The grayscale mean is then calculated, and finally a sequence of grayscale correction factors for all pixels is obtained:

[0165]
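As a small numerical check of Equation (2) from this step, the seabed-line position can be computed directly from the altitude, the operating range, and the number of samples per ping. The function name and the numeric values are hypothetical.

```python
def seabed_line_position(altitude, sonar_range, n_samples):
    """Equation (2): line_orig = N_s - (altitude * N_s / range)."""
    return n_samples - (altitude * n_samples / sonar_range)


# Hypothetical survey: towfish 10 m above the seabed, 100 m operating
# range, 1000 samples per single-side ping.
pos = seabed_line_position(altitude=10.0, sonar_range=100.0, n_samples=1000)
```

A higher towfish altitude moves the seabed line further from sample index N_s, which widens the water-column region that grayscale correction has to skip.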

[0166] S122: Side-scan sonar image resolution correction (geometric correction):

[0167] Based on the geometric relationship between slant range, horizontal distance, and depth, the pixel position correspondence between the sonar image composed of slant range points and the sonar image composed of horizontal distance points is as follows:

[0168] Port side resolution correction factor:

[0169] Starboard resolution correction factor:

Where Res represents the image resolution, width represents the image width, PlantRange represents the horizontal distance, SlantRange represents the slant distance, and TowfishAlt represents the towfish height. In this invention, width = 2000, TowfishAlt = a, and Res = 1. Since the converted x1 is a non-integer, pixel correction based on bilinear interpolation is required after geometric transformation.

[0170]
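The geometric relationship behind slant-range correction is the right triangle formed by slant range, horizontal distance, and towfish altitude. The sketch below shows that relationship together with interpolation at a non-integer pixel position. It simplifies the bilinear interpolation mentioned in the text to 1-D linear interpolation along one row, and the function names and sample values are hypothetical.

```python
import math


def slant_to_horizontal(slant_range, towfish_alt):
    """Horizontal distance from the right triangle (slant, altitude, ground)."""
    if slant_range < towfish_alt:
        raise ValueError("slant range cannot be shorter than the towfish altitude")
    return math.sqrt(slant_range ** 2 - towfish_alt ** 2)


def sample_linear(row, x):
    """Linearly interpolate a 1-D row of gray values at non-integer x."""
    x0 = int(math.floor(x))
    x1 = min(x0 + 1, len(row) - 1)
    t = x - x0
    return (1.0 - t) * row[x0] + t * row[x1]


ground = slant_to_horizontal(5.0, 3.0)
gray = sample_linear([0, 10, 20], 1.5)
```

Because the corrected position generally falls between source pixels, each output pixel takes a distance-weighted blend of its nearest source pixels.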

[0171] Furthermore, step S2 includes the following steps:

[0172] S21: Construct a sonar image pre-segmentation model based on an improved DeepLabV3+ network:

[0173] A sonar image pre-segmentation model based on an improved DeepLabV3+ network is constructed. The corrected image is pre-segmented into bright and shadow regions. The backbone feature extraction network is replaced with the lighter MobilenetV2. Feature extraction is enhanced in the Encoder and Decoder stages. The loss function is modified to a combination of Focal loss and Dice loss, as detailed below:

[0174] Since this step only requires pre-segmentation of the target bright area and sound shadow area in the sonar image, the feature extraction capability of the network does not need to be too strong, but the real-time requirements of the algorithm are relatively high. Based on the above considerations, the slow-training Xception series was replaced by MobilenetV2 with fewer parameters as the backbone extraction network in the original DeepLabV3+ network model. The new activation function ReLU6 was used after all convolution calculations.

[0175] y=ReLU6(x)=min(max(x,0),6) (8)
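Equation (8) translates directly into code. A minimal scalar sketch (deep-learning frameworks apply the same rule element-wise to tensors):

```python
def relu6(x):
    """ReLU6(x) = min(max(x, 0), 6): ReLU clipped at 6."""
    return min(max(x, 0.0), 6.0)


values = [relu6(v) for v in (-2.0, 3.5, 9.0)]
```

Negative inputs map to 0, inputs above 6 are clipped to 6, and values in between pass through unchanged.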

[0176] After completing the backbone network feature extraction, the obtained preliminary effective features are further enhanced. In the Encoder stage, multiple dilated convolutions with different dilation rates are used for parallel feature extraction, resulting in the following output feature y for the preliminary effective feature x at position i:

[0177]

[0178] In the formula, r represents the dilation rate, w represents the convolution kernel, and kernel-size represents the kernel size. After feature extraction using dilated convolution at each dilation rate, the obtained feature results are merged and compressed using a 1x1 convolution to obtain the final feature extraction result. In the Decoder stage, the number of channels is adjusted using a 1x1 convolution, and the adjusted result is stacked with the feature results obtained in the Encoder stage. Finally, two depthwise separable convolutions are used to obtain the final feature extraction result.
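The effect of the dilation rate r can be seen in a 1-D sketch of dilated (atrous) convolution. It uses the widely used form y[i] = Σ_k x[i + r·k]·w[k], which is assumed here rather than taken from the patent; the function name and the sample signal are hypothetical.

```python
def dilated_conv1d(x, w, rate):
    """y[i] = sum_k x[i + rate * k] * w[k], for positions where all taps fit."""
    span = rate * (len(w) - 1)  # distance covered by the dilated kernel
    return [
        sum(x[i + rate * k] * w[k] for k in range(len(w)))
        for i in range(len(x) - span)
    ]


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]  # simple difference kernel

out_r1 = dilated_conv1d(signal, kernel, rate=1)
out_r2 = dilated_conv1d(signal, kernel, rate=2)
```

With the same three weights, rate 2 compares samples four positions apart instead of two, so the receptive field grows without adding parameters.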

[0179] After modifying the network structure, the loss function used during training was improved. Sonar image quality is significantly affected by the underwater environment; sonar images acquired in different waters or with different equipment vary in quality, leading to inconsistent difficulty levels for classification tasks. Therefore, to address the poor model training performance caused by imbalanced sonar image samples, an improvement was made to the cross-entropy loss function, proposing the Focal loss function:

[0180] $FL(p_t) = -\alpha_t (1 - p_t)^{\lambda} \log(p_t)$ (10)

[0181] In the formula, $p_t$ represents the predicted probability. In the multi-class classification task involved in this invention, the predicted probability is the probability output by SoftMax; $\alpha_t$ represents the weighting factor for each class, and λ represents the adjustment factor. To evaluate the quality of semantic segmentation results, the Dice loss function is also introduced.
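The two loss terms can be sketched for scalar and flat-list inputs. The focal loss follows Equation (10). The Dice loss uses its common soft form, which is assumed here because the patent does not give its expression in this passage. The default values `alpha_t=0.25` and `lam=2.0`, and the smoothing term `eps`, are hypothetical choices.

```python
import math


def focal_loss(p_t, alpha_t=0.25, lam=2.0):
    """Equation (10): FL(p_t) = -alpha_t * (1 - p_t)**lam * log(p_t)."""
    return -alpha_t * (1.0 - p_t) ** lam * math.log(p_t)


def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss: 1 - 2|P ∩ T| / (|P| + |T|), with smoothing eps."""
    inter = sum(p * t for p, t in zip(pred, target))
    denom = sum(pred) + sum(target)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)


easy = focal_loss(0.9)   # well-classified sample
hard = focal_loss(0.5)   # uncertain sample
perfect = dice_loss([1, 0, 1], [1, 0, 1])
disjoint = dice_loss([1, 1], [0, 0])
```

The factor (1 − p_t)^λ shrinks the contribution of easy samples, which is how the focal loss counteracts sample imbalance. A combined loss would typically sum the two terms.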

[0182] S22: Create a sonar image segmentation dataset and complete the training of the pre-segmentation model:

[0183] In the pre-segmentation step, only the bright and shadow regions of the target in the sonar image are pre-segmented, without distinguishing image categories. Therefore, when annotating the sonar images, only two types need to be labeled: bright areas and shadows. Through experiments and online data collection, plus data augmentation techniques, a total of 695 sonar images were obtained. The obtained images were then split into a dataset: 488 images for training, 71 images for validation, and 136 images for testing. After processing all the datasets, they were labeled separately according to bright and shadow areas, thus completing the preparation of all image datasets.

[0184] Next, the model is trained by changing the class parameter in the network to 2, the backbone model to MobilenetV2, the pre-training weight path to deeplab-mobilenetv2.pth, the learning rate to 5e-5, and the image size to 640x640. The labeled dataset is then fed into the network to complete the training of the sonar image pre-segmentation model.

[0185] S23: Use the trained model to perform real-time pre-segmentation of the sonar image to be segmented:

[0186] To address the issue of poor subsequent recognition results due to blurred target edges in actual sonar images, and to facilitate effective extraction of sound shadow information during the recognition process, the sonar image is pre-segmented into target bright areas and sound shadow areas. After training the pre-segmentation model, the weight path in the testing program is modified to the weight file with the best training results. The actual collected sonar images to be segmented are then fed into the program for real-time pre-segmentation of target bright areas and shadows. In the pre-segmentation results, the image contains only three different pixel values, representing: target echo area, sound shadow area, and reverberant background.

[0187] Referring to Figure 2, images of four sonar targets (drowning victims, mines, aircraft, and shipwrecks) are shown after preprocessing and pre-segmentation. Preprocessing correction and pre-segmentation into three regions significantly improve the blurry edges of sonar targets and effectively remove severe noise from the sonar images. The results also demonstrate that pre-segmentation effectively extracts the shadow information of the sonar targets, with very clear shadow edges, which is highly beneficial for subsequent feature extraction of the shadow region.

[0188] Furthermore, step S3 includes the following steps:

[0189] S31: Perform superpixel segmentation based on the SLIC algorithm on the preprocessed and pre-segmented sonar image:

[0190] After the above image preprocessing and pre-segmentation, the problems of blurred sonar target edges are improved, and the accurate extraction of target shadow information is achieved. However, the image data needs to be converted into graph structure data that can be recognized by graph networks. The most intuitive way is to treat each pixel in the image as a node in the graph structure, and the Euclidean distance between each pixel as the edge connecting the nodes. However, the sonar information in this method is very redundant, and the computational load in subsequent convolution is huge. Therefore, before converting the image into graph structure data, this invention uses the SLIC algorithm to convert the pixel data into hundreds of superpixel blocks, thereby greatly reducing the complexity of the task.

[0191] In the SLIC algorithm, each pixel is represented by a 5-dimensional vector $V = [l, a, b, x, y]^T$, where $[l, a, b]^T$ represents the pixel color feature coordinates in the CIE-LAB color space and $[x, y]^T$ represents the pixel spatial feature coordinates. The specific steps for applying the SLIC algorithm to preprocessed and pre-segmented sonar images are as follows:

[0192] (1) Initialize cluster centers:

[0193] The pre-segmented sonar image is divided into multiple superpixel blocks of uniform area. The number of superpixels to be generated is pre-set to M, and the initial cluster centers of the superpixels are defined as $C_i = [l_i, a_i, b_i, x_i, y_i]^T$ (i = 1, ..., M), uniformly distributed within the image. Assume the total number of pixels in the original image is N and all superpixels have the same size. Then the number of pixels (superpixel area) contained in each superpixel is N / M, and the distance between adjacent superpixel cluster centers is approximately...

[0194] In this invention, the sonar image size is 200×200, N=40000, and M=200.
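For the stated values N = 40000 and M = 200, the superpixel area and grid interval can be worked out as follows. The grid interval uses the standard SLIC definition S = sqrt(N / M), which is assumed to be the one intended here; the function name is hypothetical.

```python
import math


def slic_grid_step(num_pixels, num_superpixels):
    """Standard SLIC grid interval S = sqrt(N / M)."""
    return math.sqrt(num_pixels / num_superpixels)


N = 200 * 200      # pixels in a 200 x 200 sonar image
M = 200            # requested number of superpixels

area = N / M       # pixels per superpixel
step = slic_grid_step(N, M)
```

Each superpixel therefore covers about 200 pixels, and neighboring cluster centers start roughly 14 pixels apart.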

[0195] (2) Reselect the cluster center location:

[0196] The initial cluster centers are defined as $C_i = [l_i, a_i, b_i, x_i, y_i]^T$ (i = 1, ..., M). However, with this initial definition a center point may fall on a contour boundary with a large gradient, which would affect the subsequent clustering. Therefore, it is necessary to reselect the optimal cluster center within an n×n local neighborhood of the initial point (generally n is 3).

[0197] The specific optimization method is as follows: within a 3×3 area centered on the initial cluster center, calculate the gradient value of all pixels within this area, and select the pixel with the smallest gradient. Then move the cluster center to that point.
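The relocation step can be sketched on a grayscale image stored as nested lists. The gradient is taken as the squared central difference in x and y, which matches the usual SLIC formulation but is an assumption here; the function names and the sample image are hypothetical.

```python
def gradient_sq(img, y, x):
    """Squared central-difference gradient at an interior pixel."""
    gx = img[y][x + 1] - img[y][x - 1]
    gy = img[y + 1][x] - img[y - 1][x]
    return gx * gx + gy * gy


def relocate_center(img, cy, cx):
    """Move a cluster center to the lowest-gradient pixel in its 3x3 window."""
    h, w = len(img), len(img[0])
    best, best_grad = (cy, cx), None
    # Clamp the window so that central differences stay inside the image.
    for y in range(max(1, cy - 1), min(h - 2, cy + 1) + 1):
        for x in range(max(1, cx - 1), min(w - 2, cx + 1) + 1):
            g = gradient_sq(img, y, x)
            if best_grad is None or g < best_grad:
                best_grad, best = g, (y, x)
    return best


# Vertical edge between columns 2 and 3; the initial center sits next to it.
img = [[0, 0, 0, 100, 100] for _ in range(5)]
new_center = relocate_center(img, 2, 2)
```

The center starts beside the edge, where the gradient is large, and moves to a flat pixel one column to the left.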

[0198] (3) Initialize the pixels and assign an initial class label to each pixel:

[0199] After obtaining the cluster center locations of the entire sonar image, labels need to be assigned to each pixel within the neighborhood of each superpixel cluster center. In the SLIC algorithm, the superpixel area is approximately S×S, and its search range is limited to the vicinity of the cluster center; therefore, the search range is set to 2S×2S. Because SLIC restricts the size of the search area, it significantly reduces the number of distance calculations compared to the traditional k-means clustering algorithm.

[0200] The specific label assignment method is as follows: define a label array to store the index value of the superpixel to which each pixel belongs, and a distance array to store the distance from each pixel to the center of its superpixel. Since the SLIC algorithm initially divides the image according to a uniform distribution, this step updates the label array and distance array according to the initial superpixel division region.

[0201] (4) Distance similarity measurement:

[0202] After assigning labels to all pixels, the cluster center vectors need to be recalculated: each original $C_i = [l_i, a_i, b_i, x_i, y_i]^T$ (i = 1, ..., M) is updated to the mean vector $[l, a, b, x, y]^T$ of all pixels contained in the corresponding superpixel:

[0203]

[0204] In the formula, $N_i$ represents the number of pixels contained in the i-th superpixel.

[0205] Then, the pixel feature distance and the spatial location feature distance are normalized by the maximum feature distances $N_s$ and $N_c$ within each cluster and then combined into a single metric. The distance metric D between pixels and cluster centers is expressed as follows:

[0206]

[0207] In the formula, $N_c$ is fixed to a constant τ that represents the relative importance between color similarity and spatial proximity, so the above formula can be written as:

[0208]

[0209] When τ is large, spatial proximity plays a dominant role, resulting in more compact superpixels in the clustering results; when τ is small, color similarity plays a dominant role, and the superpixels formed by clustering can better preserve image edge information. In this invention, τ = 10 is used for color sonar images, and τ = 0.25 is used for grayscale sonar images.
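The role of τ can be illustrated with the distance measure from the standard SLIC formulation, D = sqrt(d_c² + (d_s / S)² · τ²), where d_c is the color distance, d_s the spatial distance, and S the grid interval. This form is an assumption and may differ in detail from the patent's expression; the function name and sample vectors are hypothetical.

```python
import math


def slic_distance(pixel, center, grid_step, tau):
    """Combined color and spatial distance between a pixel and a cluster center.

    pixel and center are [l, a, b, x, y] vectors.
    """
    l1, a1, b1, x1, y1 = pixel
    l2, a2, b2, x2, y2 = center
    d_c = math.sqrt((l1 - l2) ** 2 + (a1 - a2) ** 2 + (b1 - b2) ** 2)
    d_s = math.hypot(x1 - x2, y1 - y2)
    return math.sqrt(d_c ** 2 + (d_s / grid_step) ** 2 * tau ** 2)


center = [50.0, 0.0, 0.0, 0.0, 0.0]
same_color_far = [50.0, 0.0, 0.0, 3.0, 4.0]   # 5 pixels away, same color
other_color_here = [53.0, 0.0, 0.0, 0.0, 0.0]  # same place, different color

d_far_big_tau = slic_distance(same_color_far, center, grid_step=5.0, tau=10.0)
d_far_small_tau = slic_distance(same_color_far, center, grid_step=5.0, tau=0.25)
d_color = slic_distance(other_color_here, center, grid_step=5.0, tau=10.0)
```

With a large τ the spatially distant pixel is penalized heavily, giving compact superpixels; with a small τ the same pixel looks close, so color decides the assignment.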

[0210] (5) Iterative optimization of clustering results:

[0211] After defining the distance similarity measurement rules between pixels and cluster centers, a local search is performed within a 2S×2S neighborhood of each superpixel center. The distance from each pixel within this neighborhood to the superpixel center is calculated according to the measurement rules. If the distance is less than the distance from the pixel to its original cluster center, the pixel is reassigned to the current superpixel; otherwise, its assignment is left unchanged. After calculating and comparing all pixels, the distance and label arrays are updated, and the coordinates of the region center points are recalculated to update the superpixel cluster center positions.

[0212] Meanwhile, the error between the currently calculated distance and the result of the previous iteration is calculated to obtain the iteration error residual. The error residual is then iteratively optimized until convergence. After multiple experiments, it was verified that most clusters can achieve the ideal segmentation effect after 10 iterations. Therefore, considering both computational efficiency and superpixel segmentation effect, the number of iterations is fixed at 10.

[0213] Referring to Figure 3, the superpixel clustering results generated using the SLIC algorithm are displayed for four types of sonar targets: drowning victims, mines, aircraft, and shipwrecks. The first row shows the original sonar image; the second row shows the superpixel results generated from the original sonar image; the third row shows the superpixel results generated from the highlighted areas of the sonar target obtained through pre-segmentation using an improved DeepLabV3+ network; and the fourth row shows the superpixel results generated jointly from the highlighted and shadow areas of the sonar target obtained through pre-segmentation. The SLIC clustering results show that the original sonar image has relatively blurred target edges, resulting in messy superpixels that do not accurately represent the sonar target information. In contrast, the pre-segmented sonar image has clear edges, with a significant distinction between the target boundary and the reverberant background. In the segmentation results, the superpixels in the background area are neatly arranged, while the superpixels in the target area effectively preserve the target edge information. Furthermore, in the clustering results including the shadow area, the boundary information of the target shadow is well preserved, and the pixel values stored within the superpixels effectively distinguish between the highlighted and shadow areas, providing richer sonar target information for subsequent identification.

[0214] S32: SLIC Superpixel Clustering Result Storage and File Generation:

[0215] After completing SLIC superpixel clustering, the results need to be stored in a standard format for subsequent graph structure data generation. The stored content includes four main parts: the label of each image, the sequence number of all superpixels in each image, the pixel mean, and the center coordinates. The image labels are divided into four types: drowning person, mine, airplane, and shipwreck. Since all content needs to be written to a binary file, each label type is defined as the numbers 0, 1, 2, and 3, respectively. The pixel mean of each superpixel is defined as:

[0216]

[0217] The center coordinates of each superpixel are defined as follows:

[0218]

[0219] The sp-order of all superpixels is sorted according to the pixel mean calculation order mentioned above. This sp-order is used to construct the standard DGL graph structure data in the subsequent recognition process. After collecting all datasets, they are divided into training, validation, and test sets. Then, the SLIC clustering results generated for each image in each dataset are written into their respective files according to the order of the four features, finally generating three data files: train.pkl, val.pkl, and test.pkl.
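The stored content described in S32 (image label, superpixel order, pixel mean, and center coordinates) can be sketched as follows. The record field names, the use of sorted superpixel ids as the order, and the in-memory buffer standing in for a file such as train.pkl are all assumptions made for illustration.

```python
import io
import pickle


def superpixel_features(img, labels):
    """Return sorted superpixel ids with their mean gray value and centroid."""
    acc = {}
    for y, (img_row, lab_row) in enumerate(zip(img, labels)):
        for x, (value, sp) in enumerate(zip(img_row, lab_row)):
            s = acc.setdefault(sp, [0.0, 0.0, 0.0, 0])
            s[0] += value  # sum of gray values
            s[1] += x      # sum of x coordinates
            s[2] += y      # sum of y coordinates
            s[3] += 1      # pixel count
    order = sorted(acc)
    means = [acc[sp][0] / acc[sp][3] for sp in order]
    centers = [(acc[sp][1] / acc[sp][3], acc[sp][2] / acc[sp][3]) for sp in order]
    return order, means, centers


# Tiny image with two superpixels: a dark left half and a bright right half.
img = [[10, 10, 200, 200], [10, 10, 200, 200]]
labels = [[0, 0, 1, 1], [0, 0, 1, 1]]
order, means, centers = superpixel_features(img, labels)

# Class label 1 corresponds to "mine" in the numbering given in the text.
record = {"label": 1, "sp_order": order, "sp_mean": means, "sp_center": centers}

buffer = io.BytesIO()          # stands in for a file opened in binary mode
pickle.dump([record], buffer)
restored = pickle.loads(buffer.getvalue())
```

Keeping the mean and center lists in the same superpixel order lets the later graph-construction step match node features to node positions by index.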

[0220] S33: Construct graph-structured data based on sonar images using the superpixel segmentation results:

[0221] After obtaining the three files generated from the SLIC clustering results, these files are used to construct the graph-structured data. Furthermore, step S33 includes the following steps:

[0222] S331: Graph structure data representation under sonar images:

[0223] A graph is a non-Euclidean data structure, denoted as $G = \{V, E\}$, where $V = \{v_1, \ldots, v_M\}$ represents the set of nodes and $E = \{e_1, \ldots, e_P\}$ represents the set of edges. In this invention, each graph represents one sonar image and carries its target category label, which is one of four types: drowning person, mine, aircraft, and shipwreck. Each node in the graph is defined as the center of a superpixel obtained by SLIC clustering, and the node information is the attribute information contained in the superpixel. The edge between two nodes is defined as the connection relationship between two superpixels, and the edge information is the distance information between the two superpixels.

[0224] Defining nodes and edges alone is not enough to fully construct the graph structure data. An adjacency matrix is also needed to define the relationships between all nodes. An adjacency matrix is a two-dimensional array that reflects the degree of association between any two nodes. In an unweighted, undirected graph, the adjacency matrix is defined as $A \in \{0,1\}^{M \times M}$. In this invention, a weighted directed graph is adopted instead, and the corresponding adjacency matrix becomes $W \in \mathbb{R}^{N \times N}$, where $W_{i,j} = w_{i,j}$ indicates the weight of the edge from node $v_i$ to node $v_j$. The weight can be defined flexibly; here it is defined as the coefficient calculated jointly from the pixel feature distance and the spatial location feature distance between superpixels. $W_{i,j} = 0$ means that there is no edge between nodes $v_i$ and $v_j$.

[0225] S332: Definition of Graph Structure Attributes in Sonar Images:

[0226] Compared to deep learning methods in Euclidean space, graph neural networks based on non-Euclidean space introduce spatial location features when constructing graph-structured data. This allows for highly effective extraction and utilization of shadow region information in sonar images, which contains crucial information such as the height and shape of the sonar target. Traditional sonar classification and recognition algorithms start with pixel features, performing multi-layer feature extraction on pixels within the bright areas of the sonar target. This ignores the sonar target's shadow region information and the positional information of bright and shadow areas in the sonar image, and fails to establish a correlation between bright and shadow areas. To address these issues, this invention utilizes graph-structured data to simultaneously consider both pixel and positional features of the bright and shadow regions of the sonar target, achieving higher image recognition performance.

[0227] Furthermore, step 332 includes the following steps:

[0228] S3321: Definition of node attributes in a Graph structure:

[0229] In addition to their own numbering, nodes in the graph can also contain many other types of attributes. In this invention, nodes are defined as superpixels, and node attributes are defined as two types of features: positional features and pixel features. Positional information refers to the center coordinates of each superpixel, and pixel information refers to the average pixel value of each superpixel. Because the sonar image is pre-segmented, bright areas are marked with one color and shadows with another. Therefore, the average pixel value of superpixels containing the background area is close to 0, while the average pixel values of superpixels containing shadows and of superpixels containing bright areas are both relatively large and also differ considerably from each other. This allows the three types of sonar echoes to be distinguished in terms of pixel features. The nodes and their attributes are specifically represented as follows:

[0230]

[0231] S3322: Calculation of the adjacency matrix in a graph structure:

[0232] The adjacency matrix stores all pairwise connections between nodes. In this invention, the connection relationships between nodes consider both positional and pixel features. Since the position coordinates lie in [0, 200) and the pixel values lie in [0, 1], the position coordinates need to be normalized first. The specific calculation is as follows:

[0233]

[0234] When calculating pixel differences, since superpixel values are stored as three RGB channels, the pixel difference between superpixels containing bright areas and those containing shadows is the largest, followed by the pixel difference between bright areas and the background, and then the pixel difference between shadows and the background. Based on these differences, the boundaries between bright and shadow areas, bright and background areas, and shadow and background areas of sonar targets can be effectively distinguished among spatially close superpixels. This achieves the goal of effectively integrating sonar target shadow information into the graph structure data.

[0235] S3323: Definition of edge attributes in a Graph structure:

[0236] In addition to edge indices representing connectivity, edges in the graph can also contain many other types of attributes. In this invention, the attribute of an edge is defined as the superpixel distance calculated by combining pixel features and spatial location features. According to the definition of the adjacency matrix above, since the connectivity of distant nodes is very weak, the edge information stored for them in the matrix is almost ineffective. To eliminate redundancy, the edge indices and attributes of two very distant nodes are not stored; which nodes count as distant is decided by the K-Nearest Neighbor (KNN) algorithm. The relative boundary between far and near is determined with respect to the number of superpixels M. In this invention, M = 200, and the relative boundary values for far and near are κ = 8, 15, 30, 50, 100. Therefore, the total number of stored edges is κ × 200 = 200κ. Since the final recognition results vary under different values of κ, further analysis and verification will be conducted in subsequent ablation experiments. The specific edge weight calculation formula is as follows:

[0237]

[0238] In the formula, $(x_i, y_i)$ is the position coordinate of superpixel $v_i$; $f(x_i, y_i)$ is the average pixel value of superpixel $v_i$; $\delta_x$ is a scale parameter, representing the average distance from each node $v_i$ to its κ nearest nodes; $\delta_f$ is also a scale parameter, representing the average pixel difference between each node $v_i$ and its κ nearest neighbors; and γ is a relative importance factor, representing the relative weight between pixel features and spatial location features. The edges and their attributes are specifically represented as follows:

[0239]
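The edge weight formula itself is not reproduced in this text, so the following is only a sketch of one way the quantities defined above could be combined; it should not be read as the formula of the invention. It assumes that neighbors are selected by normalized spatial distance, that $\delta_x$ and $\delta_f$ are averaged over all retained neighbor pairs, that the combined distance is $\gamma\, d_{pos}/\delta_x + (1-\gamma)\, d_{pix}/\delta_f$ (so that a larger γ gives more weight to spatial location), and that the weight is the negative exponential of that distance. The function name `knn_edge_weights` and the scalar pixel means are likewise assumptions.

```python
import math

def knn_edge_weights(centers, pixel_means, kappa, gamma, image_size=200.0):
    """Return {(i, j): weight} for the kappa nearest neighbours of every node."""
    # Normalise coordinates from [0, image_size) to [0, 1).
    pos = [(x / image_size, y / image_size) for x, y in centers]
    n = len(pos)
    d_pos = [[math.dist(pos[i], pos[j]) for j in range(n)] for i in range(n)]
    d_pix = [[abs(pixel_means[i] - pixel_means[j]) for j in range(n)] for i in range(n)]

    # Assumption: the kappa neighbours of a node are its spatially closest nodes.
    pairs = []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: d_pos[i][j])
        pairs.extend((i, j) for j in order[:kappa])

    # Scale parameters: mean distance and mean pixel difference over kept pairs.
    delta_x = (sum(d_pos[i][j] for i, j in pairs) / len(pairs)) or 1.0
    delta_f = (sum(d_pix[i][j] for i, j in pairs) / len(pairs)) or 1.0

    weights = {}
    for i, j in pairs:
        d = gamma * d_pos[i][j] / delta_x + (1.0 - gamma) * d_pix[i][j] / delta_f
        weights[(i, j)] = math.exp(-d)  # assumed negative-exponential form
    return weights

centers = [(10, 10), (20, 10), (150, 150), (160, 150)]
pixel_means = [0.0, 0.9, 0.1, 0.8]
weights = knn_edge_weights(centers, pixel_means, kappa=2, gamma=0.7)
```

With four nodes and κ = 2 the sketch stores κ × 4 = 8 directed edges, mirroring the κ × 200 count given for M = 200 superpixels.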

[0240] The attribute definitions for the entire graph structure data are summarized as follows:

[0241]

[0242] S333: Convert to DGL data:

[0243] The graph attention network developed in this invention is based on the DGL framework. Therefore, after constructing the graph structure data, it needs to be converted into standard DGLGraph data under the DGL framework.

[0244] In the DGL framework, a node is represented by an integer, called the node ID. An edge $e_i$ is represented by a pair of integers $(u, v)$, where $u$ and $v$ are the node IDs of the start and end points of the edge, respectively, and $i$ is the edge ID. Both nodes and edges can contain features with custom names, accessed via the `ndata` and `edata` attributes, respectively. Therefore, in this invention, the superpixel number `sp-order` obtained after SLIC clustering of each sonar image is used as the node ID, and the number pairs of connected superpixels are used as the edge ID pairs. The pixel feature $f(x_i, y_i)$ and the location feature $(x_i, y_i)$ of each node are written into the `ndata` attribute, and the edge weight feature $W_{i,j}$ is written into the `edata` attribute, completing the construction of the DGLGraph data.
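The conversion described above can be sketched as follows. The first function only rearranges plain Python data into source and destination ID lists, node features, and edge features; the second shows how these might be handed to DGL through `dgl.graph`, `ndata`, and `edata`. The feature key `"feat"`, the helper names, and the layout of the record dictionary are assumptions made for illustration, and `to_dgl` requires the `dgl` and `torch` packages to be installed.

```python
def graph_arrays(record, weights):
    """Turn one stored record plus {(i, j): w} edge weights into DGL-ready lists."""
    src = [i for i, _ in weights]
    dst = [j for _, j in weights]
    # Node feature = [pixel mean, x, y]; edge feature = [weight].
    node_feat = [[mean, x, y]
                 for mean, (x, y) in zip(record["pixel_mean"], record["center"])]
    edge_feat = [[w] for w in weights.values()]
    return src, dst, node_feat, edge_feat

def to_dgl(src, dst, node_feat, edge_feat):
    """Build a DGLGraph from the lists above (needs dgl and torch)."""
    import dgl
    import torch
    g = dgl.graph((torch.tensor(src), torch.tensor(dst)), num_nodes=len(node_feat))
    g.ndata["feat"] = torch.tensor(node_feat, dtype=torch.float32)
    g.edata["feat"] = torch.tensor(edge_feat, dtype=torch.float32)
    return g

# Hypothetical record with three superpixels and three directed edges.
record = {"pixel_mean": [0.0, 0.9, 0.1],
          "center": [(10.0, 10.0), (20.0, 10.0), (150.0, 150.0)]}
weights = {(0, 1): 0.6, (1, 0): 0.6, (2, 1): 0.2}
src, dst, node_feat, edge_feat = graph_arrays(record, weights)
```

Edge IDs in the resulting graph follow the order of the `src` and `dst` lists, so the i-th edge feature row belongs to the i-th ID pair.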

[0245] Figure 4 shows the DGLGraph data generated from one of the sonar images. Each vertex is assigned an ID, and the total number of vertices equals the number of superpixels obtained from SLIC clustering. Connected vertices are linked by lines, and the edge ID pairs of the connecting edges are shown. It is evident from the figure that the graph structure does not resemble the original image structure at all, and it is difficult to tell the type of sonar target directly from the graph structure data. Therefore, subsequent identification has to rely on the attributes of nodes and edges, as well as the adjacency matrix.

[0246] Furthermore, step S4 includes the following steps:

[0247] S41: Construction of a graph attention network based on sonar data:

[0248] After converting the results of SLIC superpixel clustering into graph-structured data, it is fed into a graph attention network for model training and testing. The graph attention network model employs an attention module to embed nodes in the graph. By calculating the attention coefficients between the current node and its neighbors, it aggregates neighbor information, achieving adaptive allocation of weights for different neighbors, thereby learning neighborhood and spatial features. The implementation of the graph attention network involves stacking multiple attention layers; therefore, the construction of the graph attention layers is the most crucial aspect of this network.

[0249] The input to the graph attention layer is the set of feature vectors of all nodes, $h_1, \ldots, h_M$, where M is the number of nodes, $h_i$ is the feature vector of the i-th node, and F is the number of features of each node. In this invention, nodes are superpixels obtained from sonar images, and node features are the pixel features obtained from pixel aggregation together with the spatial location features. After the input passes through an attention layer, the output is a new set of node features. The shared linear transformation applied to every node between input and output is defined as a parameterized weight matrix. Because the graph attention layer incorporates an attention mechanism, assigning different coefficient weights to the current node and its neighboring nodes, the input and output of the entire graph attention layer can be represented as follows:

[0250]

[0251] In the formula, $\alpha_{ij}$ represents the attention coefficient (weight) between node i and node j, Q is the weight matrix obtained through backpropagation, and σ is the nonlinear activation function. The determination of the attention coefficient α is the core of the graph attention network, and the specific calculation process is as follows:

[0252] First, a self-attention mechanism is applied at each node, and the attention coefficient $e_{ij}$, which represents the importance of the features of node j to node i, is calculated:

[0253]

[0254] Then, a LeakyReLU function with a negative slope of 0.2 is used for nonlinear processing, and SoftMax is introduced to normalize all attention coefficients. At the same time, masked attention is applied, so that attention coefficients are calculated only for the other nodes within a certain neighborhood of a node. Therefore, the coefficients of the complete attention mechanism are expressed as follows:

[0255]

[0256] In the formula, T represents transpose, and || represents concatenation operation.
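The attention computation described above (shared linear transformation, concatenation, LeakyReLU with negative slope 0.2, SoftMax over the neighborhood) can be sketched for a single node as follows. The sketch follows the standard graph attention formulation $e_{ij} = \mathrm{LeakyReLU}(a^T [Q h_i \,\|\, Q h_j])$; the attention vector name `a`, the helper names, and the toy values are assumptions introduced for illustration.

```python
import math

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def matvec(Q, h):
    """Multiply weight matrix Q (list of rows) with feature vector h."""
    return [sum(q * v for q, v in zip(row, h)) for row in Q]

def attention_coefficients(h, neighbours, Q, a, i):
    """Normalised attention of node i over the nodes listed in neighbours."""
    qi = matvec(Q, h[i])
    scores = []
    for j in neighbours:
        qj = matvec(Q, h[j])
        # e_ij = LeakyReLU(a^T [Q h_i || Q h_j]); list addition is concatenation.
        scores.append(leaky_relu(sum(w * v for w, v in zip(a, qi + qj))))
    top = max(scores)                      # subtract the maximum for stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return {j: e / total for j, e in zip(neighbours, exps)}

h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # three nodes, F = 2
Q = [[1.0, 0.0], [0.0, 1.0]]               # identity transform for readability
a = [0.5, -0.5, 1.0, 0.25]                 # attention vector of length 2 * F
alpha = attention_coefficients(h, [1, 2], Q, a, i=0)
```

Restricting the SoftMax to the listed neighbors is what the masked attention amounts to: nodes outside the neighborhood never enter the normalization.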

[0257] In the final attention coefficient expression, two unknowns remain: the specific location of the selected neighborhood and the number of nodes k contained in the neighborhood. In this invention, the value of the number of nodes k in the neighborhood is consistent with the value of κ in the KNN algorithm; the specific k neighboring nodes are determined by the edge connectivity attributes obtained by the KNN algorithm. The specific process is as follows:

[0258] Choose an appropriate value for k, and execute the KNN nearest neighbor algorithm to update the adjacency matrix and edge weights:

[0259]

[0260] The specific calculation formula for the i-th row and j-th column of the matrix is:

[0261]

[0262] In the formula, the weight term represents the edge weight from node i to node j, and KNN stands for the K-nearest-neighbor algorithm.

[0263] After obtaining the adjacency matrix and edge connections, weak edges on each node need to be removed, leaving the k strongest edges. For each node, the connectivity (edge weights) between that node and the remaining nodes is sorted from strongest to weakest, and the top k strongest edges are retained. For simplicity and to enhance differentiation, the weight $W_{i,j}$ of each of the top k strongest edges is set to 1, and the weight of the remaining edges is set to negative infinity, thus obtaining the updated adjacency matrix $W$ and edge weights $W_{i,j}$. Before SoftMax normalization is performed, the matrix $W$ is multiplied with the attention coefficient matrix, which fixes the value of k and the specific edges that k refers to, thus giving the final expression for the attention mechanism coefficients:

[0264]
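The effect of keeping only the k strongest edges before SoftMax can be sketched as follows. The text describes weights of 1 and negative infinity that are combined with the attention coefficient matrix; the sketch below implements the same selection by replacing the scores of dropped edges with negative infinity, which is a common way to write such masking and is an assumption about the implementation rather than a statement of it. The helper names are hypothetical.

```python
import math

def top_k_mask(W, k):
    """mask[i][j] is True for the k largest weights in row i (self excluded)."""
    n = len(W)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        ranked = sorted((j for j in range(n) if j != i),
                        key=lambda j: W[i][j], reverse=True)
        for j in ranked[:k]:
            mask[i][j] = True
    return mask

def masked_softmax(scores, keep):
    """SoftMax over the kept entries; dropped entries get coefficient 0."""
    masked = [s if flag else -math.inf for s, flag in zip(scores, keep)]
    top = max(masked)
    exps = [math.exp(s - top) for s in masked]   # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]

W = [[0.0, 0.9, 0.2, 0.5],
     [0.9, 0.0, 0.4, 0.1],
     [0.2, 0.4, 0.0, 0.8],
     [0.5, 0.1, 0.8, 0.0]]
mask = top_k_mask(W, k=2)
alpha_0 = masked_softmax([0.0, 1.0, 3.0, 1.0], mask[0])  # raw scores of node 0
```

Node 2 has the largest raw score for node 0 in this toy example, yet it receives no attention because its edge weight is not among the two strongest.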

[0265] After calculation, the attention coefficients between different nodes are obtained. These coefficients are then substituted into the calculation formula between the input and output of the convolutional layer to obtain the final output features of each node. Furthermore, to make the self-attention learning process more stable, a multi-head attention mechanism is employed. Specifically, Ω independent attention mechanisms are used to perform the above input-output transformation, and then the features obtained from each transformation are concatenated to obtain the final output features as shown below:

[0266]

[0267] In the formula, $\alpha_{ij}^{\omega}$ represents the normalized attention coefficient under the ω-th attention mechanism, $Q^{\omega}$ represents the corresponding weight matrix, and || represents the concatenation operation. After constructing the graph attention layer, a suitable number of attention layers is selected to complete the network model. The entire network dynamically generates attention coefficients between nodes and their neighbors using the graph attention module. Then, based on the correlation between nodes, the attention coefficients are multiplied by the updated edge weight coefficients, giving the model greater flexibility for specific input samples. Finally, through iterative training, the final weight coefficients are obtained, thereby achieving graph structure classification based on sonar images.
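The multi-head concatenation described above can be sketched for one node as follows: each head aggregates the transformed neighbor features with its own attention coefficients and weight matrix, and the head outputs are concatenated. The sketch assumes ELU as the nonlinear activation σ (the text does not name it) and uses hand-picked attention coefficients; all names and values are illustrative.

```python
import math

def elu(x):
    return x if x > 0 else math.exp(x) - 1.0

def head_output(h, alpha_i, Q):
    """sigma(sum_j alpha_ij * Q h_j) for one node under one attention head."""
    out = [0.0] * len(Q)
    for j, a_ij in alpha_i.items():
        for r, row in enumerate(Q):
            out[r] += a_ij * sum(q * v for q, v in zip(row, h[j]))
    return [elu(v) for v in out]

def multi_head_output(h, alphas, Qs):
    """Concatenate the outputs of all heads for one node."""
    features = []
    for alpha_i, Q in zip(alphas, Qs):
        features.extend(head_output(h, alpha_i, Q))
    return features

h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Attention of node 0 over neighbours 1 and 2 under two heads (toy values).
alphas = [{1: 0.5, 2: 0.5}, {1: 0.25, 2: 0.75}]
Qs = [[[1.0, 0.0], [0.0, 1.0]],
      [[2.0, 0.0], [0.0, -1.0]]]
out = multi_head_output(h, alphas, Qs)
```

With Ω heads of output width F' each, the concatenated feature has width Ω × F'; here two heads of width 2 give 4 values.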

[0268] S42: Dataset collection, preparation, and construction:

[0269] After converting the constructed graph structure data into standard DGLGraph data under the DGL framework, the dataset needs to be divided into training, validation, and test sets. During the DGLGraph structure conversion process, the data is directly stored in the divided sets. There are four types of graph structure data: drowning victims, mines, aircraft, and shipwrecks. The training set contains 488 graph structures, including 98 drowning victims, 119 mine targets, 87 aircraft, and 184 shipwrecks; the validation set contains 71 graph structures, including 17 drowning victims, 18 mine targets, 13 aircraft, and 23 shipwrecks; the test set contains 136 graph structures, including 13 drowning victims, 35 mine targets, 23 aircraft, and 64 shipwrecks. Details are as follows:

[0270]

[0271] S43: Experimental setup and model training:

[0272] Experimental environment configuration: All network models in this invention use Python as the programming language, PyTorch as the deep learning framework, and DGL as the graph neural network framework. The models were trained and tested on an Ubuntu system with a Silver 4110 CPU (2.10 GHz), 64 GB of RAM, an NVIDIA GeForce RTX 3080 GPU, and the CUDA 11.4 GPU acceleration library.

[0273] Network parameter settings: The number of graph attention layers L is set to 4, the number of hidden units (hidden-dim) is 19, the number of output feature vector units (out-dim) is 152, residual is set to true (i.e., using the residual connections within the layer), readout is set to mean (i.e., obtaining the feature representation of the entire graph by aggregating node features on average), the number of independent attention mechanisms (n-heads) in multi-head attention is set to 8, random dropout of input features (in-feat dropout) is 0, the overall dropout is also 0, batch-norm layer is set to true, and self-loop is set to true.

[0274] Model training parameter settings: Modify the network model to GAT, set the dataset to the prepared sonar dataset, set the number of classes to 4, set the random seed to 41, set the epochs to 250, set the batch size to 4, set the initial learning rate to 0.001, set the learning rate decay coefficient to 0.5, set the tolerance for no performance improvement to 10, set the lower bound of the learning rate to 1e-8, set the weight decay coefficient to 0, set the epoch time interval to 5, and set the maximum execution time to 12.
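For reference, the network and training settings listed above can be collected in two dictionaries, as sketched below. The key names are hypothetical and only the values come from the text; the units of the maximum execution time are not stated and are left unspecified. The helper `lr_after_plateaus` assumes that the decay coefficient is applied once each time the tolerance of 10 epochs without improvement is exhausted, which is an interpretation rather than something the text states.

```python
net_params = {
    "L": 4,                  # number of graph attention layers
    "hidden_dim": 19,
    "out_dim": 152,
    "residual": True,
    "readout": "mean",
    "n_heads": 8,
    "in_feat_dropout": 0.0,
    "dropout": 0.0,
    "batch_norm": True,
    "self_loop": True,
}

train_params = {
    "model": "GAT",
    "n_classes": 4,
    "seed": 41,
    "epochs": 250,
    "batch_size": 4,
    "init_lr": 1e-3,
    "lr_reduce_factor": 0.5,
    "lr_schedule_patience": 10,
    "min_lr": 1e-8,
    "weight_decay": 0.0,
    "epoch_interval": 5,
    "max_time": 12,          # units not stated in the text
}

def lr_after_plateaus(params, n_plateaus):
    """Learning rate after n_plateaus decays, clipped at the lower bound."""
    lr = params["init_lr"] * params["lr_reduce_factor"] ** n_plateaus
    return max(lr, params["min_lr"])
```

Note that the listed values satisfy hidden-dim × n-heads = 19 × 8 = 152 = out-dim, which is what concatenating eight heads of width 19 would produce.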

[0275] Model Training: After modifying and setting all parameters, the graph structure data created based on the SLIC superpixel clustering algorithm is fed into the network for model training. The specific training steps for the GAT model used for sonar image classification are as follows:

[0276] (1) Load graph structure data. Load the DGLGraph data file containing the training set, validation set and test set, read the information of nodes, edges, graph and labels and the adjacency matrix representing the degree of node association, and update the edge weights and adjacency matrix according to the k-nearest neighbor algorithm.

[0277] (2) Define the network model and the forward propagation process. The network model is defined as a Graph Attention Network (GAT), and the forward propagation function is defined in the network model. The parameters in the model are initialized using a Glorot uniform distribution. During the forward propagation process, GAT convolutions (graph attention layers) are used to extract features and pass messages in the graph structure, and finally the residuals are iteratively optimized.

[0278] (3) Define the loss function and optimizer. Since this invention is based on the GAT model for multi-class classification, the nn.CrossEntropyLoss cross-entropy loss function, which is encapsulated in PyTorch, is used as the loss function of the model. The optimizer is Adam, which has very efficient computation. The learning rate required for model training is defined in the optimizer.

[0279] (4) Calculate the error. Calculate the error between the predicted value and the true label value based on the defined loss function.

[0280] (5) Parameter update. The model parameters in the forward propagation are updated using the optimizer.step() function to reduce the error between the predicted values and the true labels. When the error is less than a certain threshold, the model parameters corresponding to the minimum loss value are recorded.

[0281] (6) Category prediction. The model parameters corresponding to the minimum loss value recorded at the end of the iteration are replaced in the forward propagation to identify the category of the sonar image to be classified.
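Steps (2) to (6) above can be sketched with PyTorch as follows. To keep the example self-contained, a single linear layer on random 152-dimensional vectors stands in for the GAT model and its graph readout, so this illustrates only the training procedure (Glorot initialization, `nn.CrossEntropyLoss`, Adam, `optimizer.step()`, keeping the lowest-loss parameters), not the network itself. The use of `ReduceLROnPlateau` for the learning-rate decay, the random data, and the reduced epoch count are assumptions.

```python
import copy
import math
import torch
from torch import nn

torch.manual_seed(41)

# Stand-in data: 32 samples already reduced to 152-dimensional readout vectors.
features = torch.randn(32, 152)
labels = torch.randint(0, 4, (32,))

model = nn.Linear(152, 4)                 # placeholder for the GAT classifier
nn.init.xavier_uniform_(model.weight)     # Glorot uniform initialisation
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=0.0)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=10, min_lr=1e-8)

best_loss, best_state = float("inf"), None
for epoch in range(20):                   # the text uses 250 epochs
    model.train()
    for start in range(0, len(features), 4):          # batch size 4
        x, y = features[start:start + 4], labels[start:start + 4]
        optimizer.zero_grad()
        loss = criterion(model(x), y)     # step (4): error against true labels
        loss.backward()
        optimizer.step()                  # step (5): parameter update
    model.eval()
    with torch.no_grad():
        epoch_loss = criterion(model(features), labels).item()
    scheduler.step(epoch_loss)
    if epoch_loss < best_loss:            # record the lowest-loss parameters
        best_loss = epoch_loss
        best_state = copy.deepcopy(model.state_dict())

# Step (6): restore the recorded parameters and predict categories.
model.load_state_dict(best_state)
model.eval()
with torch.no_grad():
    predictions = model(features).argmax(dim=1)
```

In the setting described in the text, the loss used for model selection would be computed on the validation set rather than on the training data as in this toy loop.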

[0282] Figure 5 shows the loss and recognition accuracy curves of the training and validation sets as they change with the number of iterations during training. The first chart in the first row shows the training loss as a function of epochs. The curve is relatively smooth, with the loss eventually dropping to around 0.1 and stabilizing at around 100 iterations, indicating good training performance. The second chart in the first row shows the training recognition accuracy as a function of iterations. The curve shows relatively small fluctuations and also stabilizes at around 100 iterations. The final recognition rate reaches approximately 0.96, which, although still short of 1, indicates good training performance for the network model under this mode. The second row shows the loss and recognition accuracy of the validation set as a function of iterations. Both curves show significant fluctuations. Although a convergence trend is clearly visible, the final loss value and recognition rate are not very good.

[0283] S44: Model Testing and Result Analysis

[0284] After the GAT model based on sonar images is trained, the weight parameters, which have stabilized after training, are selected to classify sonar image categories on the test set. Finally, the recognition accuracy on the test set is used to evaluate the model. Specifically, in this step, the network model needs to be switched to test mode, Batch Normalization and Dropout are turned off, and the model is tested using evaluate-network.

[0285] Figure 6 shows the curve of test-set recognition accuracy against the number of iterations, together with a schematic diagram of the best recognition result. The recognition curve of the test set fluctuates considerably during convergence, but after about 70 iterations the entire curve becomes stable, and the recognition accuracy eventually settles at around 0.9. The recognition results of all iterations were then analyzed statistically, and a schematic diagram of the best recognition result was obtained. The results show that the best recognition accuracy is 91.2% and the average test accuracy is 89.5%; that is, with only a few hundred sonar images in the dataset, the test recognition rate reaches around 90%.

[0286] Furthermore, step S5 includes the following steps:

[0287] S51: Verify the effectiveness of image pre-segmentation and sonar target shadow region information:

[0288] To verify the effectiveness of image pre-segmentation and sonar target shadow region information, three types of datasets were created. The first type consisted of raw sonar images without image pre-segmentation or extraction of target shadow region information. The raw images were directly converted into graph-structured data using the SLIC superpixel segmentation algorithm. This constructed graph-structured dataset was then fed into the GAT network for model training. The curve of the training loss function changing with the number of iterations was obtained. After training, sonar images were classified using data from the test set, and the recognition rate on the test set was calculated to obtain the optimal recognition performance.

[0289] The second category involves pre-segmenting the raw sonar images without extracting acoustic shadowing information. Specifically, the raw sonar images are first pre-processed and pre-segmented, and then the SLIC clustering algorithm is applied to the pre-segmented images to obtain complete graph structure data, thereby completing the construction of the dataset required for the GAT model. Similarly, the model is subsequently trained and tested.

[0290] The third type involves raw sonar images that require both image pre-segmentation and extraction of sonar shadow region information. After pre-segmentation, bright and shadow areas of the target are labeled with different colors. Then, the SLIC clustering algorithm is used to store target edge information and the correlation between bright and shadow areas based on pixel features and spatial location features, forming a graph-structured dataset with richer attributes. Finally, this dataset is also fed into the network for model training and testing. The training curves and test results obtained from the above three types of datasets are compared pairwise to verify the effectiveness of image pre-segmentation and sonar target shadow region information extraction.

[0291] Figure 7 illustrates the comparative results used to validate the effectiveness of image pre-segmentation and sonar target shadow region information. First, from the perspective of model training, the convergence trends of the loss functions for the three datasets are roughly similar. The loss curves in all three cases are relatively smooth and stabilize at around 100 iterations. However, the loss values reached after stabilization differ: the loss based on the original sonar images stabilizes around 0.7, the loss based on image pre-segmentation stabilizes around 0.4, and the loss based on combined image pre-segmentation and sonar target shadow region information stabilizes around 0.1. Therefore, from the perspective of training effect, the correctness and effectiveness of combining image pre-segmentation with sonar target shadow region information in the proposed algorithm are verified. Second, from the perspective of model recognition accuracy, the average recognition rate without any processing is 67.8%, with a best result of 70.6%; the average recognition rate with only image pre-segmentation is 73.4%, with a best result of 77.9%; and the average recognition rate with both image pre-segmentation and sonar target shadow region information is 89.5%, with a best result of 91.2%. Comparing the best results, the recognition accuracy improves by 20.6 percentage points (from 70.6% to 91.2%) when image pre-segmentation and sonar target shadow region information are used together, which strongly supports the effectiveness of combining the two.
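The improvements quoted above are differences in percentage points between the three datasets. A short check of the arithmetic, using only the figures given in the text (the dictionary keys and the helper name `gain` are hypothetical):

```python
# Recognition rates in percent for the three dataset variants.
best = {"raw": 70.6, "pre_segmented": 77.9, "pre_segmented_with_shadow": 91.2}
average = {"raw": 67.8, "pre_segmented": 73.4, "pre_segmented_with_shadow": 89.5}

def gain(results, new, old):
    """Difference in percentage points, rounded to one decimal place."""
    return round(results[new] - results[old], 1)

gain_total = gain(best, "pre_segmented_with_shadow", "raw")
gain_pre_segmentation = gain(best, "pre_segmented", "raw")
gain_shadow = gain(best, "pre_segmented_with_shadow", "pre_segmented")
```

Of the 20.6-point gain in the best results, 7.3 points come from pre-segmentation alone and a further 13.3 points from adding the shadow region information.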

[0292] S52: Test the impact of the relative weight γ between pixel features and spatial location features on recognition performance:

[0293] The weight factor γ takes values in [0,1]; a larger weight gives a greater proportion to spatial location features, while a smaller weight gives a greater proportion to pixel features. To explore the most suitable weight allocation between spatial location features and pixel features, that is, the optimal value of the relative weight γ, this invention sets γ to the values [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1] to test the influence of the relative weight γ between pixel features and spatial location features on the recognition effect.

[0294] Figure 8 compares recognition performance under different values of the relative weight γ between pixel features and spatial location features. The magnitude of γ determines how strongly pixel features and spatial location features each influence the sonar classification performance. First, in the loss curves during model training, the convergence trend is roughly the same for all values, stabilizing at around the 100th iteration. The loss after stabilization is lowest when γ is 0.6 or 0.7, while the training performance is worst when γ is 0. This provides initial evidence that relying solely on pixel features is insufficient to achieve good recognition results and that the model performs poorly in that case. Second, in the recognition results during model testing, the recognition accuracy gradually increases with increasing γ and reaches its optimum at γ = 0.7; further increases in γ lead to a deterioration in recognition performance. Therefore, the model's performance first increases and then decreases with increasing γ, peaking at 0.7. This indicates that in sonar image classification tasks, spatial location features dominate the model training, while pixel features play an auxiliary role in recognition.

[0295] S53: Test the impact of different attribute calculation methods on recognition performance:

[0296] Different attribute calculation methods result in different ranges of calculated attribute values. Although normalization can be performed subsequently, the edge weights calculated for the same pixel value and spatial location are inconsistent across methods, and even after normalization the distribution of values remains different. For example, with the Sigmoid function the values vary greatly for inputs within ±5, while outside this range the calculated weights show almost no change. With the negative exponential of e, the values vary significantly for inputs within [0,1], while changes outside this range are slow. Therefore, to find the most suitable attribute calculation method, four functions were selected for calculating attributes: the Sigmoid function, the symmetrical Sigmoid function, the negative exponential of e, and direct addition. The recognition rate under each function was calculated based on the function expression, and the recognition results obtained under the various methods were compared to select the optimal attribute calculation method. The four attribute calculation methods are shown in the following formulas:

[0297]
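The four formulas are not reproduced in this text, so the following sketch only illustrates the saturation behavior described above using standard definitions. The form chosen for the symmetrical Sigmoid (2·sigmoid(x) − 1, which equals tanh(x/2)) and the γ-weighted form of direct addition are assumptions, as are the function names and the quantity each function is applied to.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def symmetric_sigmoid(x):
    # Assumed form: rescaled to (-1, 1); identical to tanh(x / 2).
    return 2.0 * sigmoid(x) - 1.0

def negative_exponential(x):
    return math.exp(-x)

def direct_addition(d_pos, d_pix, gamma):
    # Assumed form: gamma-weighted sum of the two feature distances.
    return gamma * d_pos + (1.0 - gamma) * d_pix

# Sigmoid: large change inside [-5, 5], almost none beyond it.
inside_sigmoid = sigmoid(5.0) - sigmoid(-5.0)
outside_sigmoid = sigmoid(10.0) - sigmoid(5.0)

# Negative exponential: large change on [0, 1], slow change far from it.
inside_exp = negative_exponential(0.0) - negative_exponential(1.0)
outside_exp = negative_exponential(5.0) - negative_exponential(6.0)
```

The comparison shows why the choice of function matters: two superpixel pairs whose distances differ by the same amount can receive nearly identical or clearly different weights depending on where the inputs fall on each curve.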

[0298] Figure 9 compares recognition performance under different attribute calculation methods. First, from the perspective of model training loss, the training results under all calculation methods are roughly similar, with the best performance achieved by the Sigmoid function. Second, from the perspective of model recognition accuracy, the attribute calculation method using the Sigmoid function again yields the best recognition performance. In terms of best recognition accuracy, the Sigmoid form improves upon the worst-performing negative exponential form by 5.9 percentage points.

[0299] S54: Test the impact of different values of the number of nodes k contained in a node's neighborhood on recognition performance:

[0300] Different values of k directly affect the calculation of the attention mechanism coefficients in graph attention networks. If the value of k is too small, a large amount of information is lost and too little neighborhood information is aggregated, making it difficult to exploit the strength of graph attention networks, which rely on neighborhood information. If the value of k exceeds a certain threshold, it may introduce too much noise, leading to a decrease in model performance. Therefore, it is necessary to find an optimal value of k to achieve the best model performance. In this invention, k is set to [5, 10, 20, 30, 50, 100], and the training effect and recognition accuracy of the model under different k values are compared, thereby completing the test of the impact of the number of nodes k contained in a node's neighborhood on the recognition effect.

[0301] Figure 10 compares recognition performance under different values of the number of nodes k within a node's neighborhood. It can be seen that the parameter k has a significant impact on the model's recognition performance. As the value of k increases, the model's performance gradually decreases. When k is 8, the model achieves optimal performance, which is consistent with our expectations. When the neighborhood is small, increasing k lets the model extract more useful information from the neighborhood, thus achieving better recognition results. However, once k exceeds a certain threshold, the useful information in the neighborhood becomes saturated, and additional neighborhood information introduces unnecessary noise, leading to a decrease in the model's recognition performance.

[0302] The preferred embodiments and principles of the present invention have been described in detail above. For those skilled in the art, there may be changes in the specific implementation based on the ideas provided by the present invention, and these changes should also be considered within the scope of protection of the present invention.

Claims

1. A sonar image classification method combining SLIC superpixels and graph attention networks, characterized in that it includes the following steps:
S1: Based on the imaging principles of two-dimensional forward-looking sonar and side-scan sonar, respectively, and the prior information available during imaging, apply different preprocessing methods to achieve autonomous correction and compensation of the sonar images;
S2: Perform image pre-segmentation on the corrected sonar image based on the improved DeepLabV3+ network to achieve synchronous segmentation of the sonar target's highlight area and sound shadow area;
S3: Use the SLIC superpixel algorithm to construct graph structure data, and combine pixel features and spatial location features to form the final graph attributes;
S4: Construct a sonar image classification model based on the GAT graph attention network, and feed the constructed sonar graph structure data into the network to complete the training and testing of the model;
S5: Set up ablation experiments to verify the importance of pixel features and spatial location features, the effectiveness of sonar image pre-segmentation, and the importance of sonar target shadow region information;
Step S1 includes the following steps:
S11: Forward-looking sonar image reconstruction and enhancement algorithm: Forward-looking sonar images have two representations: one is in polar coordinates, the original data format of the acquired dataset, presented as a sector image with the polar quantities as its coordinate axes; the other is a conventional image coordinate system obtained through coordinate transformation. The transformation formula between the two coordinate systems is as follows: (1), where the symbols denote, respectively, the horizontal opening angle and slant range of the forward-looking sonar, and the horizontal and vertical dimensions of the image. The following preprocessing steps are then performed: coordinate transformation converts the sonar image from the sector polar coordinate system to a two-dimensional conventional coordinate system; median filtering suppresses noise while better preserving the abrupt change in grayscale values where the target gives way to the shadow area; histogram equalization improves image display and makes the sonar image easier to interpret; pseudo-color processing converts the grayscale image into a color image, improving the recognizability of the forward-looking sonar image content;
S12: Side-scan sonar image grayscale correction and resolution correction algorithms: Grayscale correction is performed on all pixels within the region width. First, the mean grayscale value of each cross section along the height direction of the image is computed: (3), where the symbol in the formula represents the width of the sonar image region corresponding to the maximum height. For each data record, the location of the seabed line is determined, the average grayscale value along the width direction is then calculated, and finally the grayscale correction factor sequence for all pixels is obtained: (4). Based on the geometric relationship between slant range, horizontal distance, and depth, the pixel position correspondence between the sonar image composed of slant-range points and the sonar image composed of horizontal-distance points is given by the port-side resolution correction factor: (5), and the starboard-side resolution correction factor: (6), where the symbols denote, respectively, the resolution of the image, the image width, the horizontal distance, the slant range, and the height of the towfish.
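
The geometric relationship between slant range, horizontal distance, and towfish height recited in S12 can be sketched as follows. This is a generic flat-seabed illustration using the Pythagorean relation, not the correction factors of formulas (5) and (6), which are not reproduced here; the function names, the nearest-neighbor resampling, and the sample row are assumptions made for illustration.

```python
import math

def ground_range(slant_range, towfish_height):
    """Horizontal distance for a flat seabed; None inside the water column."""
    if slant_range < towfish_height:
        return None  # echo arrived before the first seabed return
    return math.sqrt(slant_range ** 2 - towfish_height ** 2)

def slant_to_ground_row(row, resolution, towfish_height):
    """Resample one single-side row from slant-range bins to ground-range bins.

    Assumes a flat seabed and nearest-neighbor lookup; `resolution` is the
    range covered by one pixel, in the same unit as `towfish_height`.
    """
    out = [0] * len(row)
    for g in range(len(row)):
        # Slant range whose echo lands on ground-range bin g.
        slant = math.hypot(g * resolution, towfish_height)
        s = int(round(slant / resolution))
        if s < len(row):
            out[g] = row[s]
    return out

# Hypothetical row: three water-column pixels followed by seabed returns.
print(slant_to_ground_row([0, 0, 0, 10, 20, 30], 1.0, 3.0))
```

The resampling removes the water column and stretches the near-range pixels, which is the effect a slant-range resolution correction is meant to achieve.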

2. The sonar image classification method combining SLIC superpixels and graph attention network according to claim 1, characterized in that step S2 includes the following steps:
S21: Construct a sonar image pre-segmentation model based on an improved DeepLabV3+ network: the model pre-segments the corrected image into highlight areas and sound shadow areas. The backbone feature extraction network is replaced with the more lightweight MobileNetV2, feature extraction is enhanced in the Encoder and Decoder stages, and the loss function is changed to a combination of Focal loss and Dice loss.
S22: Create a sonar image segmentation dataset and complete the training of the pre-segmentation model: The pre-segmentation step segments only the target highlight areas and sound shadow areas in the sonar images, without distinguishing image categories. Therefore, only two classes, highlight areas and shadows, need to be labeled when annotating the sonar images. Through experiments and online data collection, supplemented by data augmentation, a total of 695 sonar images were obtained and divided into 488 images for training, 71 images for validation, and 136 images for testing. All images were then labeled by highlight area and shadow, completing the preparation of the image datasets. The labeled datasets were fed into the network to complete the training of the sonar image pre-segmentation model.
S23: Use the trained model to perform real-time pre-segmentation of the sonar image to be segmented: To address the poor recognition results caused by blurred target edges in real sonar images, and to make sound shadow information easier to extract during recognition, the sonar images are pre-segmented into target highlight areas and sound shadow areas. After the pre-segmentation model is trained, the weight path in the test program is set to the weight file with the best training results. The collected sonar images to be segmented are then fed into the program for real-time pre-segmentation of target highlight areas and shadows. In the pre-segmentation result, the image contains only three distinct pixel values, representing the target echo area, the sound shadow area, and the reverberation background, respectively.
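
The combined loss named in S21 can be illustrated with a minimal sketch. It assumes a binary (foreground versus background) formulation over a flat list of per-pixel probabilities and an equal weighting of the two terms; the claim does not state the weighting, the focusing parameter, or the multi-class form, so `gamma`, `smooth`, and `weight` are assumptions.

```python
import math

def focal_loss(probs, targets, gamma=2.0, eps=1e-7):
    """Mean binary focal loss; probs are predicted foreground probabilities."""
    total = 0.0
    for p, t in zip(probs, targets):
        pt = p if t == 1 else 1.0 - p      # probability given to the true class
        pt = min(max(pt, eps), 1.0 - eps)  # clamp for numerical safety
        # Easy pixels (pt near 1) are down-weighted by (1 - pt) ** gamma.
        total += -((1.0 - pt) ** gamma) * math.log(pt)
    return total / len(probs)

def dice_loss(probs, targets, smooth=1.0):
    """Soft Dice loss: 1 minus the smoothed overlap ratio."""
    inter = sum(p * t for p, t in zip(probs, targets))
    return 1.0 - (2.0 * inter + smooth) / (sum(probs) + sum(targets) + smooth)

def combined_loss(probs, targets, weight=0.5):
    """Weighted sum of both losses; the 0.5 weight is an assumption."""
    return (weight * focal_loss(probs, targets)
            + (1.0 - weight) * dice_loss(probs, targets))

print(combined_loss([0.9, 0.1, 0.8], [1, 0, 1]))
```

Focal loss concentrates on hard pixels such as blurred target edges, while Dice loss counters the imbalance between small highlight or shadow regions and the large background, which is a common motivation for pairing the two.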

3. The sonar image classification method combining SLIC superpixels and graph attention network according to claim 1, characterized in that step S3 includes the following steps:
S31: Perform superpixel segmentation on the preprocessed and pre-segmented sonar images based on the SLIC algorithm: The image data must be converted into graph structure data that a graph network can process. Before this conversion, the SLIC algorithm groups the pixel data into several hundred superpixels.
S32: Store the SLIC superpixel clustering results and generate files: After SLIC superpixel clustering is complete, the results are stored in a standard format for subsequent graph structure data generation. The stored content has four parts: the label of each image, the sequence numbers of all superpixels in each image, the pixel mean, and the center coordinates. The image labels are of four types: drowning victim, mine, aircraft, and shipwreck. After all data are collected, they are divided into training, validation, and test sets. The SLIC clustering results generated for each image in each set are then written into the corresponding files according to these four parts.
S33: Construct graph structure data from the sonar images based on the superpixel segmentation results: The graph label is defined as the target category represented by the sonar image, one of four types: drowning victim, mine, aircraft, or shipwreck. Each node in the graph is defined as the center of a superpixel obtained by SLIC clustering, and the node information is the attribute information contained in that superpixel. The edges between nodes are defined as the connections between pairs of superpixels, and the edge information is the distance between the two superpixels. The attribute definitions of the entire graph structure data are summarized as follows: (20).

4. The sonar image classification method combining SLIC superpixels and graph attention network according to claim 1, characterized in that step S4 includes the following steps:
S41: Construct a graph attention network based on sonar data: After the SLIC superpixel clustering results are converted into graph structure data, the data are fed into a graph attention network for model training and testing. An attention module embeds the nodes of the graph and aggregates neighbor information by computing attention coefficients between the current node and its neighbors, so that different neighbors are adaptively assigned different weights and neighborhood and spatial features are learned. Multiple attention layers are then stacked to form the GAT (graph attention network); after the graph attention layers are built, an appropriate number of layers is selected to complete the network model. The network dynamically generates the attention coefficients between each node and its neighbors using the graph attention module, and then multiplies the attention coefficients by the edge weight coefficients, which are updated according to the correlation between nodes, giving the model greater flexibility for predetermined input samples. Finally, iterative training yields the final weight coefficients, achieving graph structure classification based on sonar images.
S42: Dataset collection, preparation, and construction: After the constructed graph structure data are converted into standard DGLGraph data under the DGL framework, the dataset is divided into training, validation, and test sets; during the DGLGraph conversion, the data are stored directly in the divided sets. There are four types of graph structure data: drowning victims, mines, aircraft, and shipwrecks. The training set contains 488 graph structures, of which 98 are drowning victims, 119 are mines, 87 are aircraft, and 184 are shipwrecks.
The validation set contains 71 graph structures, of which 17 are drowning victims, 18 are mines, 13 are aircraft, and 23 are shipwrecks. The test set contains 136 graph structures, of which 13 are drowning victims, 35 are mines, 23 are aircraft, and 64 are shipwrecks.
S43: Experimental setup and model training: Parameter settings: set the number of graph attention layers, the number of hidden units, the number of output feature vector units, the residual state, the readout state, the number of independent attention mechanisms in multi-head attention, the dropout rate for input features, the overall dropout rate, the state of the batch-norm layer, and the state of self-loops; and set the network model, dataset, number of classes, random seed, number of epochs, batch size, initial learning rate, learning rate decay coefficient, number of epochs without improvement tolerated before the learning rate is reduced, lower bound of the learning rate, weight decay coefficient, epoch time interval, and maximum execution time. Model training: After all parameters are set, the graph structure data built with the SLIC superpixel clustering algorithm are fed into the network for model training. First, the graph structure data are loaded, and the network model and forward propagation process are defined; the GAT convolutional graph attention layer performs feature extraction and message passing on the graph structure. Next, the loss function and optimizer are defined. The error between the predicted values and the true labels is then calculated with the defined loss function, and the optimizer updates the model parameters used in forward propagation to reduce this error. At the end of the iterations, the model parameters corresponding to the minimum loss value are substituted into the forward propagation to identify the category of the sonar image to be classified.
S44: Model testing and result analysis: After the GAT model based on sonar images is trained, the weight parameters that have stabilized after training are used to classify the sonar images in the test set, and the recognition accuracy on the test set is used to evaluate the model. In this step, the network model is switched to test mode, Batch Normalization and Dropout in the network model are turned off, and the model is tested using evaluate-network.
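
The attention computation described in S41 can be illustrated with a minimal single-head sketch in the standard graph attention form: a shared linear transform, a LeakyReLU-scored pairing of each node with its neighbors, a softmax over the neighborhood, and a weighted aggregation. The multiplication by edge weight coefficients, multi-head attention, and the output nonlinearity are omitted; all names and the toy inputs are hypothetical, and every node is assumed to have at least one neighbor (for example through a self-loop).

```python
import math

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def matvec(W, h):
    return [sum(w * x for w, x in zip(row, h)) for row in W]

def gat_layer(features, neighbors, W, a):
    """One single-head graph attention layer.

    features:  list of node feature vectors
    neighbors: neighbors[i] lists the nodes that node i attends to
    W:         weight matrix (out_dim x in_dim)
    a:         attention vector of length 2 * out_dim
    """
    z = [matvec(W, h) for h in features]  # shared linear transform
    out, alphas = [], []
    for i, nbrs in enumerate(neighbors):
        # Unnormalized score: LeakyReLU(a . [z_i || z_j]).
        scores = [leaky_relu(sum(ak * v for ak, v in zip(a, z[i] + z[j])))
                  for j in nbrs]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]  # numerically stable softmax
        total = sum(exps)
        alpha = [e / total for e in exps]
        # Aggregate transformed neighbor features, weighted by attention.
        out.append([sum(al * z[j][d] for al, j in zip(alpha, nbrs))
                    for d in range(len(z[i]))])
        alphas.append(alpha)
    return out, alphas

features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = [[0, 1], [1, 0, 2], [2, 1]]  # self-loops included
W = [[1.0, 0.0], [0.0, 1.0]]
out, alphas = gat_layer(features, neighbors, W, [0.5, -0.5, 1.0, 0.0])
print(alphas)
```

Because the coefficients are normalized within each neighborhood, changing the neighborhood size k changes how the attention mass is spread across neighbors.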

5. The sonar image classification method combining SLIC superpixels and graph attention network according to claim 1, characterized in that step S5 includes the following steps:
S51: Verify the effectiveness of image pre-segmentation and sonar target shadow region information: Three types of datasets were created for this purpose. The first type consists of raw sonar images without image pre-segmentation or target shadow region extraction; the raw images are converted directly into graph structure data using the SLIC superpixel segmentation algorithm. The second type consists of raw sonar images that undergo only image pre-segmentation without shadow region extraction; that is, the raw sonar images are preprocessed and pre-segmented, and the SLIC clustering algorithm is then applied to the pre-segmented images to obtain complete graph structure data. The third type consists of raw sonar images that undergo both image pre-segmentation and shadow region extraction; after pre-segmentation, the target highlight areas and shadow regions are extracted and labeled with different colors, and the SLIC clustering algorithm is then used to capture target edge information and the correlation between highlight areas and shadows in terms of pixel features and spatial features, forming graph structure data with richer attributes. The three types of graph structure datasets are fed into the GAT network for model training, and the curves of the corresponding training loss against the number of iterations are obtained. After training, the data in the test set are used to classify sonar images, and the recognition rate on the test set is calculated to obtain the best recognition performance.
The training curves and test results obtained from the above three types of datasets are compared to verify the effectiveness of image pre-segmentation and sonar target shadow area information.
S52: Test the impact of the relative weight γ between pixel features and spatial location features on recognition performance: γ takes each of the values [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1] to complete the test of the impact of the relative weight γ between pixel features and spatial location features on recognition performance;
S53: Test the impact of different attribute calculation methods on recognition performance: To find the most suitable attribute calculation method, four functions for calculating attributes are selected: the Sigmoid function, the Sigmoid function with symmetry, the negative exponential of e, and direct addition. The recognition rate under each function is calculated from the function expression, and the recognition results under the four methods are compared to select the optimal attribute calculation method. The four attribute calculation methods are shown in the following formulas: (28)
S54: Test the impact of the number of nodes k contained in a node's neighborhood on recognition performance: k takes each of the values [5, 10, 20, 30, 50, 100], and the training behavior and recognition accuracy of the model under the different k values are compared, thereby completing the test of the impact of the number of nodes k contained in a node's neighborhood on recognition performance.
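
The γ sweep of S52 can be sketched as below. Formula (28) is not reproduced in this text, so the sketch does not claim to match it: it assumes a convex combination in which γ weights a pixel-feature difference and (1 − γ) weights a spatial distance, and it shows only the textbook Sigmoid and negative exponential functions as example mappings. The function names and sample values are hypothetical.

```python
import math

# The eleven gamma values listed in S52: 0, 0.1, ..., 1.
GAMMAS = [round(0.1 * i, 1) for i in range(11)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def neg_exp(x):
    return math.exp(-x)

def combined_difference(pixel_diff, spatial_dist, gamma):
    """Assumed convex combination: gamma weights the pixel term,
    (1 - gamma) weights the spatial term."""
    return gamma * pixel_diff + (1.0 - gamma) * spatial_dist

# Hypothetical normalized differences between two superpixels.
for gamma in GAMMAS:
    d = combined_difference(0.2, 0.6, gamma)
    print(gamma, round(sigmoid(d), 3), round(neg_exp(d), 3))
```

At γ = 0 only the spatial term contributes and at γ = 1 only the pixel term does, so the two ends of the sweep correspond to ablating one feature type entirely.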