A ConvNeXt model SAR ship classification method based on multi-level feature collaborative interaction
This method applies a ConvNeXt model with multi-level feature collaborative interaction to SAR ship classification and combines traditional handcrafted features with a multi-level branch structure. It addresses the insufficient feature mining and fusion of existing models and achieves efficient and accurate ship classification.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- HARBIN ENG UNIV
- Filing Date
- 2025-04-01
- Publication Date
- 2026-06-23
Figure CN120388301B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of target classification technology in synthetic aperture radar (SAR), and in particular to a ConvNeXt model SAR ship classification method based on multi-level feature collaborative interaction.
Background Technology
[0002] Synthetic Aperture Radar (SAR) is an advanced remote sensing technology that achieves precise imaging of the land or ocean surface by emitting high-frequency electromagnetic waves and receiving their reflected signals. Compared to traditional optical imaging technologies, SAR possesses all-weather, all-time high-resolution imaging capabilities, playing a crucial role in marine environmental monitoring and resource management. Unaffected by weather and lighting conditions, it can continuously track and monitor marine targets, providing key technical support for the construction of intelligent marine management systems. Furthermore, SAR's advantages in intelligent ship information identification and classification help improve the efficiency of maritime traffic management and promote the sustainable development of the blue economy.
[0003] In recent years, with the rapid development of deep learning technology, its application in SAR ship target classification has become a research hotspot, but it also faces many challenges. First, existing deep learning-based SAR ship classification models mainly rely on high-level features automatically extracted by neural networks, often neglecting traditional handcrafted features rich in expert experience, making it difficult to fully optimize model training performance. Second, mainstream convolutional neural networks (CNNs) typically extract features from shallow to deep layers through multiple convolutions, but their feature mining and fusion at different scales remain insufficient, limiting further improvement in classification accuracy. Therefore, how to effectively combine traditional handcrafted features and fully utilize multi-scale feature information has become an important research topic for improving SAR ship classification performance.
Summary of the Invention
[0004] The purpose of this invention is to solve the problems in the prior art and propose a SAR ship classification method based on the ConvNeXt model of multi-level feature collaborative interaction.
[0005] This invention is achieved through the following technical solution: This invention proposes a SAR ship classification method based on the ConvNeXt model of multi-level feature collaborative interaction, the method comprising the following steps:
[0006] Step 1: Process the current input SAR ship sample using the Canny method, perform traditional handcrafted feature extraction, and generate the corresponding image with edge information;
[0007] Step 2: Extract features from the generated image with edge information, selecting the ConvNeXt model as the feature extraction network; retain features at different scales during the convolution process for subsequent feature cross-fusion;
[0008] Step 3: Extract features from the current input SAR ship sample, selecting the ConvNeXt model as the feature extraction network; after a multi-level branch structure and feature cross-fusion structure, the input image is transformed into a feature vector, which combines multi-dimensional and multi-scale feature information;
[0009] Step 4: Use the classification head to perform multi-task regression on the features and assign weight coefficients to adapt to the SAR ship classification scenario, and finally obtain the classification result.
[0010] Furthermore, the dataset used in step 1 is the FUSAR-Ship dataset, which is expanded through image augmentation.
[0011] Further, in step 1, edge point judgment is performed. The specific judgment method is as follows: if the amplitude of a pixel is greater than the high threshold, the pixel is retained as an edge point; if the amplitude of a pixel is less than the low threshold, the pixel is excluded; if the amplitude of a pixel is between the two thresholds, the pixel is retained as a possible edge point; finally, all the final edge points are connected.
[0012] Furthermore, step 2 uses the ConvNeXt model to extract features from the edge information image, which consists of 4 stages. Each stage is connected by a downsampling layer, thereby preserving features at 4 different scales for subsequent feature cross-fusion structures.
[0013] Furthermore, the ConvNeXt model is composed of ConvNeXt Blocks, whose structure is depthwise separable convolution. Depthwise separable convolution is divided into two processes: channel-wise convolution and pointwise convolution.
[0014] Channel-wise convolution performs convolution operations independently on each input channel; assuming the input feature map is $X \in \mathbb{R}^{H \times W \times C_{in}}$ and the convolution kernel is $K_d \in \mathbb{R}^{k \times k \times C_{in}}$, the channel-wise convolution operation can be represented as:
[0015] $$Y_d(i, j, c) = \sum_{m=0}^{k-1} \sum_{n=0}^{k-1} X(i+m,\, j+n,\, c)\, K_d(m, n, c)$$
[0016] Here, $Y_d$ is the output after channel-wise convolution, i and j are the spatial locations of the output feature map, and c is the channel index; this process is performed independently for each channel, so each channel uses a different convolution kernel for convolution;
[0017] Pointwise convolution is a 1×1 convolution operation that fuses information along the depth direction; assuming the output feature map after pointwise convolution has a size of $H \times W \times C_{out}$ and the convolution kernel is $K_p \in \mathbb{R}^{1 \times 1 \times C_{in} \times C_{out}}$, the pointwise convolution operation can be represented as:
[0018] $$Y_p(i, j, c') = \sum_{c=1}^{C_{in}} Y_d(i, j, c)\, K_p(c, c')$$
[0019] Here, $Y_p$ is the output after pointwise convolution, c′ is the output channel index, and $C_{out}$ is the number of output channels; pointwise convolution combines all channel information at each position into new channel information.
[0020] Furthermore, step 3 uses the ConvNeXt model to extract features from the original image, which consists of four stages with multi-level branch structures. During the feature depth extraction process in each branch, feature cross-fusion is performed.
[0021] Furthermore, a multi-level branching structure is added to the ConvNeXt model in step 3 to process features at different scales, allowing high-scale and low-scale feature maps to interact effectively at each layer. Feature maps of different scales exchange information through a fusion module to ensure that high-scale feature maps maintain clarity throughout the network, while low-scale feature maps obtain sufficient contextual information at deeper layers. Assuming that at layer l the input feature map is ..., processing at different resolutions can be represented as:
[0022]
[0023] where Fuse(·) represents the feature fusion operation;
[0024] Simultaneously, during the extraction of feature maps at each scale, cross-fusion with edge information image features is performed; at a certain scale, the input feature maps are ... and ..., respectively, and the cross-fusion operation is:
[0025]
[0026] where ... denotes the features after fusion, and ...
[0027] Furthermore, in step 4, a cross-entropy loss function and label smoothing are used to prevent overfitting.
[0028] The formula for the cross-entropy loss function is as follows:
[0029] $$L = -\sum_{i=1}^{K} y_i \log(x_i)$$
[0030] Here, $x_i$ is the model output after softmax, $y_i$ indicates whether category i is the corresponding category label, and K is the number of categories. A label smoothing method is applied to the category labels, which alters the probability distribution. The specific formula is as follows:
[0031]
[0032] Where ε is a constant.
[0033] The present invention also proposes an electronic device, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps of the ConvNeXt model SAR ship classification method based on multi-level feature collaborative interaction.
[0034] The present invention also proposes a computer-readable storage medium for storing computer instructions, which, when executed by a processor, implement the steps of the ConvNeXt model SAR ship classification method based on multi-level feature collaborative interaction.
[0035] Compared with the prior art, the beneficial effects of the present invention are as follows:
[0036] This invention provides a ConvNeXt model based on multi-level feature collaborative interaction for ship classification in SAR images. This method obtains traditional hand-crafted feature images through edge detection, which are then input together with the original image into the ConvNeXt model with added multi-level branches for feature extraction. Furthermore, cross-fusion of features is performed, thereby improving the model's representational capability. This achieves efficient and accurate ship classification in SAR images.
Attached Figure Description
[0037] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on the provided drawings without creative effort.
[0038] Figure 1 is a flowchart of the ConvNeXt model SAR ship classification method based on multi-level feature collaborative interaction described in this invention.
[0039] Figure 2 is a structural framework diagram of the ConvNeXt model.
[0040] Figure 3 is a schematic diagram of the image generated by edge feature extraction in the embodiment, where (a) is the input image and (b) is the extraction result.
[0041] Figure 4 shows the classification results in the embodiment, where (a) and (b) are the classification results for different input images.
Detailed Implementation
[0042] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0043] To address the problems in existing technologies, this invention proposes a ConvNeXt model based on multi-level feature collaborative interaction for ship classification in SAR images. To make effective use of traditional handcrafted features, a Canny-based edge detection method applies multi-level filtering and edge gradient analysis to capture the edge information of objects in the image, and the resulting edge image is input into the ConvNeXt model for feature extraction. In addition, the ConvNeXt model is extended with a multi-level branching structure for in-depth extraction of feature information at different scales, so that both low-level and high-level information in the image is captured. Finally, cross-fusion with the traditional handcrafted features is performed during multi-scale feature extraction to improve the model's classification accuracy.
[0044] Specifically, with reference to Figures 1-4, this invention proposes a SAR ship classification method based on the ConvNeXt model with multi-level feature collaborative interaction. The method includes the following steps:
[0045] Step 1: Process the current input SAR ship sample using the Canny method, perform traditional handcrafted feature extraction, and generate the corresponding image with edge information;
[0046] Step 1 uses the FUSAR-Ship dataset. Considering the imbalance of samples across categories, the dataset undergoes image augmentation to expand the sample size. Next, the dataset is divided into training, validation, and test sets in a 7:2:1 ratio. Finally, the training parameters are set.
[0047] In step 1, edge point judgment is performed. The specific judgment method is as follows: if the amplitude of a pixel is greater than the high threshold, the pixel is retained as an edge point; if the amplitude of a pixel is less than the low threshold, the pixel is excluded; if the amplitude of a pixel is between the two thresholds, the pixel is retained as a possible edge point; finally, all the final edge points are connected.
[0048] Step 2: Extract features from the generated image with edge information, and select the ConvNeXt model as the feature extraction network; retain features at different scales during the convolution process for subsequent feature cross-fusion.
[0049] Step 2 uses the ConvNeXt model to extract features from the edge information image. It consists of 4 stages, with each stage connected by a downsampling layer, thus preserving features at 4 different scales for subsequent feature cross-fusion structures.
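As a rough illustration of the four preserved scales, the sketch below computes the spatial size of each stage output for a 512×512 input. It assumes the stock ConvNeXt layout (a stride-4 stem and a stride-2 downsampling layer before stages 2 to 4) and the ConvNeXt-T channel widths (96, 192, 384, 768). The text does not state which variant or strides are used, so `stage_shapes` and its defaults are hypothetical.

```python
def stage_shapes(input_size=512, stem_stride=4, widths=(96, 192, 384, 768)):
    """Return (channels, height, width) for each of the four stages.

    Assumption: stride-4 stem, then one stride-2 downsampling layer
    between consecutive stages, as in the stock ConvNeXt design.
    """
    shapes = []
    size = input_size // stem_stride
    for index, channels in enumerate(widths):
        if index > 0:
            size //= 2  # downsampling layer that connects two stages
        shapes.append((channels, size, size))
    return shapes


for stage, shape in enumerate(stage_shapes(), start=1):
    print(f"stage {stage}: {shape}")
```

Under these assumptions the edge-image branch would hand four feature maps, one per stage, to the later cross-fusion structure.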
[0050] The ConvNeXt model consists of ConvNeXt Blocks, and its structure is depthwise separable convolution. Depthwise separable convolution is divided into two processes: channel-wise convolution and point-wise convolution.
[0051] Channel-wise convolution performs convolution operations independently on each input channel; assuming the input feature map is $X \in \mathbb{R}^{H \times W \times C_{in}}$ and the convolution kernel is $K_d \in \mathbb{R}^{k \times k \times C_{in}}$, the channel-wise convolution operation can be represented as:
[0052] $$Y_d(i, j, c) = \sum_{m=0}^{k-1} \sum_{n=0}^{k-1} X(i+m,\, j+n,\, c)\, K_d(m, n, c)$$
[0053] Here, $Y_d$ is the output after channel-wise convolution, i and j are the spatial locations of the output feature map, and c is the channel index; this process is performed independently for each channel, so each channel uses a different convolution kernel for convolution;
[0054] Pointwise convolution is a 1×1 convolution operation that fuses information along the depth direction; assuming the output feature map after pointwise convolution has a size of $H \times W \times C_{out}$ and the convolution kernel is $K_p \in \mathbb{R}^{1 \times 1 \times C_{in} \times C_{out}}$, the pointwise convolution operation can be represented as:
[0055] $$Y_p(i, j, c') = \sum_{c=1}^{C_{in}} Y_d(i, j, c)\, K_p(c, c')$$
[0056] Here, $Y_p$ is the output after pointwise convolution, c′ is the output channel index, and $C_{out}$ is the number of output channels; pointwise convolution combines all channel information at each position into new channel information.
[0057] Step 3: Extract features from the current input SAR ship sample, selecting the ConvNeXt model as the feature extraction network; after a multi-level branch structure and feature cross-fusion structure, the input image is transformed into a feature vector, which combines multi-dimensional and multi-scale feature information;
[0058] Step 3 uses the ConvNeXt model to extract features from the original image. It consists of four stages with multi-level branching structures. During the feature depth extraction process in each branch, feature cross-fusion is performed.
[0059] In step 3, a multi-level branching structure is added to the ConvNeXt model to process features at different scales. High-scale feature maps and low-scale feature maps can interact effectively at each layer. Feature maps of different scales exchange information through a fusion module to ensure that high-scale feature maps maintain clarity throughout the network, while low-scale feature maps obtain sufficient contextual information in deeper layers of the network. Assuming that at layer l the input feature map is ..., processing at different resolutions can be represented as:
[0060]
[0061] where Fuse(·) represents the feature fusion operation;
[0062] Simultaneously, during the extraction of feature maps at each scale, cross-fusion with edge information image features is performed; at a certain scale, the input feature maps are ... and ..., respectively, and the cross-fusion operation is:
[0063]
[0064] where ... denotes the features after fusion, and ...
[0065] Step 4: Use the classification head to perform multi-task regression on the features and assign weight coefficients to adapt to the SAR ship classification scenario, and finally obtain the classification result.
[0066] In step 4, a cross-entropy loss function and label smoothing are used to prevent overfitting.
[0067] The formula for the cross-entropy loss function is as follows:
[0068] $$L = -\sum_{i=1}^{K} y_i \log(x_i)$$
[0069] Here, $x_i$ is the model output after softmax, $y_i$ indicates whether category i is the corresponding category label, and K is the number of categories. A label smoothing method is applied to the category labels, which alters the probability distribution. The specific formula is as follows:
[0070]
[0071] Where ε is a constant.
[0072] Example
[0073] The purpose of this invention is to solve the problem of SAR ship classification by using deep learning networks to efficiently and accurately identify SAR ship targets and output the corresponding category information. To achieve this goal, this invention provides a SAR ship classification method based on the ConvNeXt model of multi-level feature collaborative interaction. Its basic process is shown in Figure 1, and the method includes:
[0074] Step 1: Process the current input SAR ship sample using the Canny method, perform traditional handcrafted feature extraction, and generate the corresponding image with edge information.
[0075] Step 1 uses the FUSAR-Ship dataset, selecting six ship types: cargo ships, bulk carriers, container ships, fishing vessels, tugboats, and tankers. After image augmentation of the existing samples, approximately 1600 images are added to each ship type. Next, the dataset is divided: the training set comprises 70% of the total images, the validation set comprises 20% (randomly generated for both training and testing), and the test set comprises 10%. During training, the input images are fixed at 512×512 pixels. The training batch size is 8, and the number of training iterations is 500.
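The 70% / 20% / 10% division described above can be sketched as follows. This is a minimal illustration rather than the procedure actually used: the function name `split_dataset`, the fixed seed, and the file names are hypothetical, and the split is done over a flat list of samples without per-class stratification.

```python
import random


def split_dataset(samples, parts=(7, 2, 1), seed=0):
    """Shuffle samples and split them into train/val/test by integer ratio."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # reproducible shuffle
    total = sum(parts)
    n_train = len(items) * parts[0] // total
    n_val = len(items) * parts[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Hypothetical file names standing in for augmented FUSAR-Ship samples.
train, val, test = split_dataset([f"ship_{i:04d}.tiff" for i in range(1000)])
print(len(train), len(val), len(test))
```

With imbalanced categories, splitting each ship type separately and concatenating the results would keep the class proportions equal across the three sets.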
[0076] Before edge detection, the image needs to be converted to grayscale and filtered. Grayscale conversion reduces image complexity and the amount of information to be processed, which simplifies subsequent image processing. The grayscale image is obtained by multiplying the three channels (B, G, R) of the original image by channel-specific weights and summing the results. A grayscale image can represent most image features with less data.
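The weighted channel sum can be sketched in NumPy as below. The text does not give the weights, so the ITU-R BT.601 luma coefficients (0.114, 0.587, 0.299 for B, G, R) are assumed here; `bgr_to_gray` is an illustrative helper, not a function named in the source.

```python
import numpy as np

# Assumed BT.601 luma weights in B, G, R order (not specified in the text).
BGR_WEIGHTS = np.array([0.114, 0.587, 0.299])


def bgr_to_gray(image_bgr):
    """Weighted sum over the channel axis of an H x W x 3 BGR image."""
    gray = image_bgr.astype(np.float64) @ BGR_WEIGHTS
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)


pixel = np.array([[[255, 0, 0]]], dtype=np.uint8)  # pure blue in BGR order
print(bgr_to_gray(pixel))
```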
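[0077] To determine whether a pixel is a local maximum along its gradient direction and perform non-maximum suppression, a Sobel filter is first used to calculate the gradient magnitude and direction at that point. The derivatives are calculated by applying a pair of convolution kernels (in the x and y directions) to the image $I$:
[0078] $$G_x = \begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix} * I, \qquad G_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ +1 & +2 & +1 \end{bmatrix} * I$$
[0079] The following formulas are used to calculate the magnitude and direction of the gradient:
[0080] $$G = \sqrt{G_x^2 + G_y^2}, \qquad \theta = \arctan\!\left(\frac{G_y}{G_x}\right)$$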
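The gradient computation can be illustrated with a small NumPy sketch. It assumes the standard 3×3 Sobel kernels, applies them as a correlation with edge-replicated padding, and rounds the direction to the nearest of 0°, 45°, 90°, and 135°; the helper names `filter3x3` and `sobel_gradient` are hypothetical.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=np.float64)


def filter3x3(image, kernel):
    """Correlate a 2-D image with a 3x3 kernel (edge-replicated padding)."""
    height, width = image.shape
    padded = np.pad(image.astype(np.float64), 1, mode="edge")
    out = np.zeros((height, width), dtype=np.float64)
    for m in range(3):
        for n in range(3):
            out += kernel[m, n] * padded[m:m + height, n:n + width]
    return out


def sobel_gradient(gray):
    """Return gradient magnitude and direction quantized to 0/45/90/135 degrees."""
    gx = filter3x3(gray, SOBEL_X)
    gy = filter3x3(gray, SOBEL_Y)
    magnitude = np.hypot(gx, gy)
    direction = np.degrees(np.arctan2(gy, gx)) % 180.0
    quantized = (np.round(direction / 45.0) * 45.0) % 180.0
    return magnitude, quantized


# Vertical step edge: left half dark, right half bright.
gray = np.zeros((5, 6))
gray[:, 3:] = 100.0
magnitude, direction = sobel_gradient(gray)
print(magnitude[2], direction[2])
```

For the vertical step edge in the example, the response is concentrated in the two columns next to the step and the quantized direction there is 0°.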
[0081] To facilitate judgment and improve efficiency, the gradient direction θ is generally selected from 0°, 45°, 90°, and 135°. In the current task, this invention selects 0°.
[0082] Next, non-maximum suppression is performed: each point is compared with its two neighboring points along the gradient direction, and if it is a local maximum it is marked as a potential edge point. This eliminates non-edge pixels and retains only candidate edges. However, the candidates still include many false edges caused by noise and other factors, so double-threshold hysteresis is needed to reduce the number of false edges. The specific edge point determination method is as follows: if the amplitude of a pixel is greater than the high threshold, the pixel is retained as an edge point; if the amplitude of a pixel is less than the low threshold, the pixel is excluded; if the amplitude of a pixel is between the two thresholds, the pixel is retained as a potential edge point. Finally, all the final edge points are connected.
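The double-threshold rule and the final connection step can be sketched as follows. This is one common reading of the rule, assumed here: pixels above the high threshold seed the edge map, and pixels between the thresholds are kept only if they are 8-connected to a seed. The function name `hysteresis` and the threshold values in the example are hypothetical.

```python
from collections import deque

import numpy as np


def hysteresis(magnitude, low, high):
    """Keep strong pixels and any weak pixels 8-connected to them."""
    strong = magnitude > high
    candidate = magnitude >= low  # strong pixels plus possible edge points
    edges = strong.copy()
    height, width = magnitude.shape
    queue = deque(zip(*np.nonzero(strong)))
    while queue:
        i, j = queue.popleft()
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                y, x = i + di, j + dj
                inside = 0 <= y < height and 0 <= x < width
                if inside and candidate[y, x] and not edges[y, x]:
                    edges[y, x] = True  # weak pixel linked to an edge
                    queue.append((y, x))
    return edges


row = np.array([[0, 50, 120, 50, 0, 50]])
print(hysteresis(row, low=40, high=100))
```

In the example row, the two weak pixels next to the strong one are kept, while the isolated weak pixel at the end is discarded as a false edge.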
[0083] Step 2: Feature extraction is performed on the generated image with edge information, using the ConvNeXt model as the feature extraction network. Features at different scales are preserved during the convolutional process for subsequent feature fusion.
[0084] Step 2 uses the ConvNeXt model, a neural network architecture inspired by the Transformer but still employing convolutions. This model consists of ConvNeXt Blocks, the most important structure of which is depthwise separable convolution. Depthwise separable convolution mainly involves two processes: channel-wise convolution and pointwise convolution.
[0085] Channel-wise convolution performs convolution operations independently on each input channel. Assume the input feature map is $X \in \mathbb{R}^{H \times W \times C_{in}}$ and the convolution kernel is $K_d \in \mathbb{R}^{k \times k \times C_{in}}$. The channel-wise convolution operation can then be represented as:
[0086] $$Y_d(i, j, c) = \sum_{m=0}^{k-1} \sum_{n=0}^{k-1} X(i+m,\, j+n,\, c)\, K_d(m, n, c)$$
[0087] Here, $Y_d$ is the output after channel-wise convolution, i and j are the spatial locations of the output feature map, and c is the channel index. This process is performed independently for each channel, so each channel uses a different convolution kernel.
[0088] Pointwise convolution is a 1×1 convolution operation that fuses information along the depth direction. Assume the output feature map after pointwise convolution has a size of $H \times W \times C_{out}$ and the convolution kernel is $K_p \in \mathbb{R}^{1 \times 1 \times C_{in} \times C_{out}}$. The pointwise convolution operation can then be represented as:
[0089] $$Y_p(i, j, c') = \sum_{c=1}^{C_{in}} Y_d(i, j, c)\, K_p(c, c')$$
[0090] Here, $Y_p$ is the output after pointwise convolution, c′ is the output channel index, and $C_{out}$ is the number of output channels. Pointwise convolution combines all channel information at each position into new channel information.
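The two-step structure can be sketched in NumPy as below. The sketch assumes an odd kernel size with zero padding so that the spatial size is preserved, and it leaves out normalization, activation, and the residual connection of a full ConvNeXt Block. The function names and the random example shapes are illustrative.

```python
import numpy as np


def depthwise_conv(x, k_d):
    """x: (H, W, C_in); k_d: (k, k, C_in). Each channel has its own kernel."""
    k = k_d.shape[0]
    pad = k // 2  # assumes odd k; zero padding keeps H x W
    height, width, _ = x.shape
    padded = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    y_d = np.zeros(x.shape, dtype=np.float64)
    for m in range(k):
        for n in range(k):
            y_d += padded[m:m + height, n:n + width, :] * k_d[m, n, :]
    return y_d


def pointwise_conv(y_d, k_p):
    """y_d: (H, W, C_in); k_p: (C_in, C_out). A 1x1 conv is a per-pixel matmul."""
    return y_d @ k_p


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8, 3))
k_d = rng.normal(size=(7, 7, 3))   # 7x7 depthwise kernel
k_p = rng.normal(size=(3, 16))     # mixes 3 input channels into 16 outputs
y_p = pointwise_conv(depthwise_conv(x, k_d), k_p)
print(y_p.shape)
```

For these example shapes the separable form uses 7·7·3 + 3·16 = 195 weights, compared with 7·7·3·16 = 2352 for a standard convolution with the same kernel size and channel counts.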
[0091] Step 3: Feature extraction is performed on the current input SAR ship sample, using the ConvNeXt model as the feature extraction network. After a multi-level branching structure and a feature cross-fusion structure, the input image is transformed into a feature vector, which combines multi-dimensional and multi-scale feature information.
[0092] Step 3 improves upon the traditional ConvNeXt stage by adding a multi-level branching structure to process features at different scales, enabling effective interaction between high-scale and low-scale feature maps at each layer. Feature maps of different scales exchange information through a fusion module to ensure that high-scale feature maps maintain clarity throughout the network, while low-scale feature maps acquire sufficient contextual information at deeper layers. Assuming that at layer l the input feature map is ..., processing at different resolutions can be represented as:
[0093]
[0094] Fuse(·) represents a more efficient feature fusion operation, such as convolution, upsampling, weighted fusion, etc.
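One possible shape of such a fusion step between two branches is sketched below. The text only lists convolution, upsampling, and weighted fusion as options, so this form is an assumption: nearest-neighbour upsampling, 2×2 average pooling, and a fixed mixing weight `alpha` stand in for whatever Fuse(·) actually uses, and both branches are given the same channel count for simplicity.

```python
import numpy as np


def upsample2x(x):
    """Nearest-neighbour upsampling of an (H, W, C) map to (2H, 2W, C)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)


def downsample2x(x):
    """2x2 average pooling of an (H, W, C) map to (H/2, W/2, C)."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))


def fuse(high_res, low_res, alpha=0.5):
    """Exchange information between two branches by weighted fusion."""
    new_high = alpha * high_res + (1.0 - alpha) * upsample2x(low_res)
    new_low = alpha * low_res + (1.0 - alpha) * downsample2x(high_res)
    return new_high, new_low


high = np.ones((8, 8, 4))
low = np.zeros((4, 4, 4))
new_high, new_low = fuse(high, low)
print(new_high.shape, new_low.shape)
```

Each branch keeps its own resolution after the exchange, which matches the stated goal of preserving detail in the high-resolution maps while passing context to and from the low-resolution ones.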
[0095] Simultaneously, during the extraction of feature maps at each scale, cross-fusion with edge information image features is performed to further improve the expressive power of multi-resolution features. At a certain scale, the input feature maps are ... and ..., respectively, and the cross-fusion operation is:
[0096]
[0097] where ... denotes the features after fusion, and ...
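Because the cross-fusion formula is not recoverable from the text, the sketch below shows only one plausible form, assumed for illustration: the image-branch and edge-branch feature maps at the same scale are concatenated along the channel axis and projected back to C channels by a 1×1 (pointwise) mapping. The names `cross_fuse` and `w_proj` are hypothetical.

```python
import numpy as np


def cross_fuse(f_img, f_edge, w_proj):
    """f_img, f_edge: (H, W, C) maps at the same scale; w_proj: (2C, C)."""
    stacked = np.concatenate([f_img, f_edge], axis=-1)  # (H, W, 2C)
    return stacked @ w_proj                             # back to (H, W, C)


channels = 4
f_img = np.ones((2, 2, channels))
f_edge = 2.0 * np.ones((2, 2, channels))
# With two stacked identity matrices the projection reduces to f_img + f_edge.
w_proj = np.vstack([np.eye(channels), np.eye(channels)])
fused = cross_fuse(f_img, f_edge, w_proj)
print(fused.shape)
```

In a trained network `w_proj` would be learned, so element-wise addition is just the special case shown in the example.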
[0098] Step 4: Use the classification head to perform multi-task regression on the features and assign weight coefficients to adapt to the SAR ship classification scenario, and finally obtain the classification result.
[0099] Step 4 employs cross-entropy loss and label smoothing to prevent overfitting.
[0100] The cross-entropy loss function is a common loss function in classification and recognition tasks, and its formula is as follows:
[0101] $$L = -\sum_{i=1}^{K} y_i \log(x_i)$$
[0102] Here, $x_i$ is the model output after softmax, K is the number of categories, and $y_i$ indicates whether category i is the corresponding category label, as given by the following formula:
[0103] $$y_i = \begin{cases} 1, & \text{if } i \text{ is the true category} \\ 0, & \text{otherwise} \end{cases}$$
[0104] This one-hot encoding neglects the relationship between the true label and the other labels, which makes the model prone to errors when classifying SAR ship datasets with high sample similarity and significant data noise. Therefore, a label smoothing method is used, which alters the probability distribution.
[0105]
[0106] Here, ε is a small constant, which makes the probability optimization objective in the softmax loss no longer 1 and 0. This avoids overfitting to some extent and also alleviates the impact of mislabeling.
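The effect of label smoothing on the loss can be illustrated with a short sketch. The smoothing formula itself is not legible in the text, so the widely used uniform form $y_i' = (1-\varepsilon)\,y_i + \varepsilon/K$ is assumed; the six classes mirror the six ship types of the example, and the logits and ε = 0.1 are made-up values.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def smooth_labels(target, num_classes, epsilon=0.1):
    """Assumed form: (1 - eps) * one_hot + eps / K."""
    uniform = epsilon / num_classes
    return [(1.0 - epsilon) * (1.0 if i == target else 0.0) + uniform
            for i in range(num_classes)]


def cross_entropy(probs, labels):
    """L = -sum_i y_i * log(x_i), skipping zero-weight terms."""
    return -sum(y * math.log(x) for x, y in zip(probs, labels) if y > 0.0)


probs = softmax([2.0, 0.5, 0.1, -1.0, 0.0, 0.3])  # made-up logits, 6 classes
hard = cross_entropy(probs, smooth_labels(0, 6, epsilon=0.0))
soft = cross_entropy(probs, smooth_labels(0, 6, epsilon=0.1))
print(hard, soft)
```

With ε = 0 the targets reduce to the one-hot labels, while with ε > 0 a confident prediction is penalized slightly more, which is the mechanism that discourages overconfident outputs.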
[0107] This invention proposes a ConvNeXt model based on multi-level feature collaborative interaction for ship classification in SAR images. This method obtains traditional hand-crafted feature images through edge detection, which are then input together with the original image into the ConvNeXt model with added multi-level branches for feature extraction. Furthermore, cross-fusion of features is performed, thereby improving the model's representational capability. This achieves efficient and accurate ship classification in SAR images.
[0108] The present invention also proposes an electronic device, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps of the SAR ship classification method based on the ConvNeXt model of multi-level feature collaborative interaction.
[0109] The present invention also proposes a computer-readable storage medium for storing computer instructions, which, when executed by a processor, implement the steps of the ConvNeXt model SAR ship classification method based on multi-level feature collaborative interaction.
[0110] The memory in this application embodiment can be volatile memory or non-volatile memory, or it can include both volatile and non-volatile memory. The non-volatile memory can be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory. The volatile memory can be random access memory (RAM), which is used as an external cache. By way of example, but not limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDRSDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchronous linked dynamic random access memory (SLDRAM), and direct rambus RAM (DRRAM). It should be noted that the memory used in the methods described in this invention is intended to include, but is not limited to, these and any other suitable types of memory.
[0111] In the above embodiments, implementation can be achieved, in whole or in part, through software, hardware, firmware, or any combination thereof. When implemented in software, it can be implemented, in whole or in part, as a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of this application are generated. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. The computer instructions can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions can be transmitted from one website, computer, server, or data center to another via wired (e.g., coaxial cable, fiber optic, digital subscriber line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.) means. The computer-readable storage medium can be any available medium accessible to a computer or a data storage device such as a server or data center that integrates one or more available media. The available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., high-density digital video discs (DVDs)), or semiconductor media (e.g., solid-state disks (SSDs)).
[0112] In implementation, each step of the above method can be completed by integrated logic circuits in the processor's hardware or by instructions in software. The steps of the method disclosed in the embodiments of this application can be directly implemented by a hardware processor, or by a combination of hardware and software modules in the processor. The software modules can reside in random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, or other mature storage media in the art. This storage medium is located in memory, and the processor reads information from the memory and, in conjunction with its hardware, completes the steps of the above method. To avoid repetition, detailed descriptions are omitted here.
[0113] It should be noted that the processor in the embodiments of this application can be an integrated circuit chip with signal processing capabilities. During implementation, each step of the above method embodiments can be completed by the integrated logic circuitry in the processor's hardware or by instructions in software form. The processor can be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components. It can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of this application. The general-purpose processor can be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this application can be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor. The software modules can be located in random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, or other mature storage media in the art. This storage medium is located in memory, and the processor reads the information in the memory and, in conjunction with its hardware, completes the steps of the above methods.
[0114] The foregoing has provided a detailed description of the ConvNeXt model SAR ship classification method based on multi-level feature collaborative interaction proposed in this invention. Specific examples have been used to illustrate the principles and implementation methods of this invention. The descriptions of the above embodiments are only for the purpose of helping to understand the method and core ideas of this invention. At the same time, for those skilled in the art, there will be changes in the specific implementation methods and application scope based on the ideas of this invention. Therefore, the content of this specification should not be construed as a limitation of this invention.
Claims
1. A ConvNeXt model SAR ship classification method based on multi-level feature cooperative interaction, characterized in that, The method includes the following steps: Step 1: Process the current input SAR ship sample using the Canny method, perform traditional manual feature extraction, and generate the corresponding image with edge information; Step 2: Extract features from the generated image with edge information, and select the ConvNeXt model as the feature extraction network; retain features at different scales during the convolution process for subsequent feature cross-fusion. Step 3: Extract features from the current input SAR ship sample, selecting the ConvNeXt model as the feature extraction network; after a multi-level branch structure and feature cross-fusion structure, the input image is transformed into a feature vector, which combines multi-dimensional and multi-scale feature information; Step 3 uses the ConvNeXt model to extract features from the original image. It consists of four stages with multi-level branching structures. During the feature depth extraction process in each branch, feature cross-fusion is performed. 
The multi-level branch structure is added to the ConvNeXt model in Step 3 to process features at different scales, so that high-scale and low-scale feature maps can interact effectively at each layer; the feature maps at different scales exchange information through the fusion module, which ensures that the high-scale feature maps remain clear throughout the network and that the low-scale feature maps obtain sufficient context information in the deep layers of the network; the processing of the input feature maps at different resolutions is represented as: [formula missing], where [symbol missing] indicates the feature fusion operation; simultaneously, during the extraction of feature maps at each scale, cross-fusion with the edge-information image features is performed; at a given scale, the input feature maps are [symbol missing] and [symbol missing], and the cross-fusion operation is: [formula missing], where [symbol missing] denotes the features after fusion, and [expression missing]; Step 4: use the classification head to perform multi-task regression on the features, assigning weight coefficients to adapt to the SAR ship classification scenario, and finally obtain the classification result.
2. The method according to claim 1, characterized in that the dataset used in Step 1 is the FUSAR-Ship dataset, and image augmentation is applied to the dataset to expand the sample size.
3. The method according to claim 1, characterized in that, in Step 1, edge point judgment is performed as follows: if the amplitude of a pixel is greater than the high threshold, the pixel is retained as an edge point; if the amplitude of a pixel is less than the low threshold, the pixel is excluded; if the amplitude of a pixel lies between the two thresholds, the pixel is retained as a possible edge point; finally, all the final edge points are connected.
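The double-threshold rule in claim 3 can be illustrated with a minimal NumPy sketch. This is not the patented implementation: the function name `hysteresis_threshold`, the 8-neighbour connectivity, and the choice to keep a possible edge point only when it is connected to a definite edge point are assumptions, taken from the conventional Canny hysteresis step. The input is assumed to be a precomputed gradient-amplitude image.

```python
import numpy as np

def hysteresis_threshold(magnitude, low, high):
    """Apply the double-threshold rule to a gradient-amplitude image.

    Assumption: a 'possible' edge point is kept only if it is 8-connected
    (directly or through other possible points) to a definite edge point.
    """
    strong = magnitude > high                 # retained as edge points
    candidate = (magnitude >= low) & ~strong  # possible edge points
    edges = strong.copy()
    rows, cols = magnitude.shape
    stack = list(zip(*np.nonzero(strong)))    # start tracking from definite edges
    while stack:
        r, c = stack.pop()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                inside = 0 <= rr < rows and 0 <= cc < cols
                if inside and candidate[rr, cc] and not edges[rr, cc]:
                    edges[rr, cc] = True      # promote connected candidate
                    stack.append((rr, cc))
    return edges

# Hypothetical 3x5 amplitude image: one strong pixel with two connected
# candidates in the top row, and one isolated candidate in the bottom row.
mag = np.array([[0, 5, 9, 5, 0],
                [0, 0, 0, 0, 0],
                [0, 5, 0, 0, 0]], dtype=float)
edges = hysteresis_threshold(mag, low=4, high=8)
```

In this toy input the two candidates adjacent to the strong pixel are kept, while the isolated candidate in the bottom row is discarded because nothing connects it to a definite edge.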
4. The method according to claim 1, characterized in that, in Step 2, the ConvNeXt model used to extract features from the edge-information image consists of 4 stages, with the stages connected by downsampling layers, so that features at 4 different scales are preserved for the subsequent feature cross-fusion structure.
5. The method according to claim 4, characterized in that the ConvNeXt model consists of ConvNeXt Blocks whose structure is based on depthwise separable convolution. Depthwise separable convolution is divided into two processes: channel-wise convolution and pointwise convolution. Channel-wise convolution is a convolution operation performed independently on each input channel; the input feature map is [symbol missing], the convolution kernel is [symbol missing], and the channel-wise convolution operation is represented as: [formula missing], where [symbol missing] is the output after channel-wise convolution, [symbol missing] and [symbol missing] are the spatial position in the output feature map, and [symbol missing] is the channel index; this process is performed independently for each channel, so each channel is convolved with its own convolution kernel. Pointwise convolution is a 1×1 convolution operation that fuses information along the depth direction; the output feature map after pointwise convolution has a size of [size missing], the convolution kernel is [symbol missing], and the pointwise convolution operation is represented as: [formula missing], where [symbol missing] is the output after pointwise convolution, [symbol missing] is the output channel index, and [symbol missing] represents the number of output channels; pointwise convolution combines all channel information at each position into new channel information.
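The two processes described in claim 5 can be sketched in NumPy as follows. This is an illustrative sketch only: the function names, the (height, width, channel) array layout, stride 1, and the absence of padding are assumptions, and the operation is written as cross-correlation, as is usual in deep learning frameworks.

```python
import numpy as np

def channelwise_conv(x, kernels):
    """Channel-wise convolution: channel c of x is filtered only by kernels[:, :, c].

    x: (H, W, C) input feature map; kernels: (k, k, C), one kernel per channel.
    Assumptions: stride 1, no padding.
    """
    H, W, C = x.shape
    k = kernels.shape[0]
    out = np.zeros((H - k + 1, W - k + 1, C))
    for i in range(H - k + 1):
        for j in range(W - k + 1):
            patch = x[i:i + k, j:j + k, :]                    # (k, k, C) window
            out[i, j, :] = np.sum(patch * kernels, axis=(0, 1))
    return out

def pointwise_conv(y, weights):
    """Pointwise (1x1) convolution: mixes all channels at each spatial position.

    y: (H, W, C); weights: (C, C_out).
    """
    return y @ weights

# Hypothetical example: 4x4 input with 2 channels, 3x3 kernels, 3 output channels.
x = np.ones((4, 4, 2))
kernels = np.ones((3, 3, 2))
kernels[:, :, 1] *= 2.0                     # each channel has its own kernel
weights = np.array([[1.0, 0.0, 1.0],
                    [0.0, 1.0, 1.0]])
dw = channelwise_conv(x, kernels)           # shape (2, 2, 2)
pw = pointwise_conv(dw, weights)            # shape (2, 2, 3)
```

The channel-wise step never mixes channels, so all cross-channel interaction happens in the 1×1 step, which is what the claim means by fusing information along the depth direction.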
6. The method according to claim 1, characterized in that, in Step 4, a cross-entropy loss function and label smoothing are used to prevent overfitting. The cross-entropy loss function is: [formula missing], where [symbol missing] is the model output after softmax and [symbol missing] indicates whether the sample belongs to the corresponding category label. Label smoothing is applied to the category labels, which alters their probability distribution; the specific formula is: [formula missing], where [symbol missing] is a constant.
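Cross-entropy with label smoothing, as named in claim 6, can be sketched as below. The formulas of the claim are not recoverable from this text, so the sketch assumes the commonly used uniform smoothing rule, in which a one-hot label y over K classes becomes (1 − ε)·y + ε/K with ε a constant. The function names and the value ε = 0.1 are hypothetical.

```python
import numpy as np

def smooth_labels(one_hot, epsilon):
    """Uniform label smoothing (assumed form): (1 - eps) * y + eps / K."""
    num_classes = one_hot.shape[-1]
    return (1.0 - epsilon) * one_hot + epsilon / num_classes

def cross_entropy(logits, targets):
    """Mean cross-entropy between softmax(logits) and target distributions."""
    shifted = logits - logits.max(axis=-1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -(targets * log_probs).sum(axis=-1).mean()

# Hypothetical example: one sample, 4 classes, true class index 1.
one_hot = np.array([[0.0, 1.0, 0.0, 0.0]])
targets = smooth_labels(one_hot, epsilon=0.1)
logits = np.zeros((1, 4))                   # uniform prediction
loss = cross_entropy(logits, targets)
```

Because the smoothed targets still sum to one, the loss remains a proper cross-entropy; the smoothing only shifts a small amount of probability from the true class to the others, which discourages over-confident predictions.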
7. An electronic device comprising a memory and a processor, wherein the memory stores a computer program, characterized in that, when the processor executes the computer program, it implements the steps of the method according to any one of claims 1-6.
8. A computer-readable storage medium for storing computer instructions, characterized in that, when the computer instructions are executed by a processor, they implement the steps of the method according to any one of claims 1-6.