Insulator appearance defect detection method based on a convolutional neural network
By introducing convolutional neural networks with HRFPN and SSPCAB modules, the feature extraction and fusion capabilities for insulator defect detection were optimized, solving the problems of missed and false detections in complex backgrounds, and achieving high-precision and efficient insulator appearance defect detection.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- MAINTENANCE BRANCH OF STATE GRID HEBEI ELECTRIC POWER
- Filing Date
- 2026-01-30
- Publication Date
- 2026-06-19
AI Technical Summary
Existing deep learning methods suffer from false positives and false negatives in insulator defect detection, especially in complex environments where it is difficult to effectively distinguish between defects and background noise, resulting in insufficient detection accuracy.
A convolutional neural network is used, combined with a high-resolution feature pyramid network HRFPN and a self-supervised predictive convolutional attention module SSPCAB. Feature extraction and fusion are performed through the spatial channel reconstruction convolutional module SCConv, which optimizes the model's detection capability in complex backgrounds.
It significantly improves the accuracy and robustness of insulator defect detection, maintaining high detection accuracy under complex lighting, angle changes, and strong background interference, and supports real-time deployment and efficient inspection on edge devices.
Figure CN122244493A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of insulator testing technology, and in particular to a method for detecting insulator appearance defects using a convolutional neural network.
Background Technology
[0002] In power systems, insulators are critical components ensuring the safe operation of transmission lines. Defects in their appearance (such as cracks, dirt, and breakage) can lead to insulation failure and even power grid accidents. Traditional insulator defect detection relies on manual inspection, which suffers from low efficiency, high subjectivity, and high risks associated with working at heights. With the increasing prevalence of drone inspections, the vast amounts of aerial images urgently require automated inspection technology.
[0003] However, existing deep learning methods face significant challenges in insulator defect detection: first, insulator defects are small and have complex and varied shapes, making them prone to missed or false detections under complex background interference (such as towers, conductors, and the sky); second, existing models lack sufficient feature extraction and fusion capabilities, making it difficult to effectively distinguish defects from background noise, resulting in detection accuracy that fails to meet practical engineering requirements. Therefore, there is an urgent need to develop a high-precision, robust, automated detection method to improve the efficiency and safety of power grid operation and maintenance.
Summary of the Invention
[0004] This invention proposes a convolutional neural network method for detecting insulator appearance defects, which solves the problem that in the prior art, insulator defects are small and have complex and varied shapes, making them easy to miss or misdetect under complex background interference.
[0005] To solve the above-mentioned technical problems, the technical solution adopted by the present invention is as follows: A method for detecting appearance defects in insulators using a convolutional neural network includes the following steps: Model training steps: Based on a dataset containing images of insulator appearance defects, the target detection model is trained to obtain a trained model weight file; the target detection model is based on the YOLOv11 architecture, and its neck network uses a high-resolution feature pyramid network HRFPN for multi-scale feature fusion, in which a spatial channel reconstruction convolutional module SCConv is embedded; its backbone network uses a self-supervised predictive convolutional attention module SSPCAB for feature enhancement; Model deployment and detection steps: Deploy the trained model weight file to the edge device, and use the deployed model to automatically identify appearance defects in insulator monitoring videos or images.
[0006] Furthermore, the processing procedure of the self-supervised predictive convolutional attention module (SSPCAB) includes: The input feature map is padded, and the padding size P is calculated according to the formula P=K+D, where K is the convolution kernel size and D is the dilation size; The padded feature map is sliced four times, with the slices starting from the upper left corner, upper right corner, lower left corner, and lower right corner of the feature map, respectively. Each slice retains a region of size (W-A)×(H-A), where W and H are the width and height of the padded feature map, respectively, and A is calculated according to the formula A=K+2×D+1. A 1×1 convolution is performed on each of the four feature maps obtained from slicing; The four convolved feature maps are summed and the ReLU activation function is applied. The fused features are input into the SENet module, where channel attention weights are generated through global pooling, fully connected layers, and the Sigmoid function. The feature maps are then weighted and output.
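The padding and slicing arithmetic above can be checked with a short sketch. It assumes the padding P is applied on all four sides of the feature map, which the text does not state; the function name and the returned layout are hypothetical.

```python
def sspcab_geometry(height, width, kernel=1, dilation=1):
    """Return padding P, border A, padded size, kept size, and the four slices."""
    pad = kernel + dilation                 # P = K + D
    border = kernel + 2 * dilation + 1      # A = K + 2*D + 1
    padded_h = height + 2 * pad             # assumption: padding on every side
    padded_w = width + 2 * pad
    keep_h = padded_h - border              # H - A
    keep_w = padded_w - border              # W - A
    # (row_start, row_stop, col_start, col_stop) for each corner slice
    slices = {
        "top_left": (0, keep_h, 0, keep_w),
        "top_right": (0, keep_h, border, padded_w),
        "bottom_left": (border, padded_h, 0, keep_w),
        "bottom_right": (border, padded_h, border, padded_w),
    }
    return pad, border, (padded_h, padded_w), (keep_h, keep_w), slices
```

Under this assumption, K=1 makes each slice exactly as large as the unpadded input, so the 1×1 convolutions and the sum keep the input and output sizes equal.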
[0007] Furthermore, the processing procedure of the SENet module is expressed by a formula in which X is the input feature map, X' is the output feature map, Pool(·) is the global pooling operation, and W1 and W2 are the parameters of the two fully connected layers, respectively.
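A squeeze-and-excitation block of this kind is commonly written as below. The expression is a sketch consistent with the symbols just defined, not necessarily the exact formula of the application: the intermediate activation δ (taken here as ReLU), the Sigmoid σ applied after the second fully connected layer, and the channel-wise product ⊙ are assumptions.

```latex
X' = X \odot \sigma\bigl(W_2\,\delta(W_1\,\mathrm{Pool}(X))\bigr)
```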
[0008] Furthermore, the HRFPN processing procedure includes: Receive feature maps at three different scales from the backbone network; Upsample the two smaller-scale feature maps to make their resolution consistent with that of the largest-scale feature map; The three feature maps are concatenated along the channel dimension; Use 1×1 convolution to reduce the dimensionality of the concatenated feature map; Multi-scale feature representations are generated through parallel downsampling branches; The downsampled feature maps are input into the SCConv module for processing; The processed multi-scale feature maps are then fed into the detection head.
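The HRFPN steps can be traced as shape bookkeeping. The sketch below is only illustrative: the reduced channel count, the number of branches, and the downsampling factors 1, 2, 4 are assumptions, as are the function and parameter names.

```python
def hrfpn_shapes(feature_shapes, out_channels=256, num_branches=3):
    """Trace (channels, height, width) through the HRFPN steps.

    feature_shapes: three (C, H, W) tuples from the backbone.
    out_channels and the 2**i downsampling factors are illustrative assumptions.
    """
    # The largest-scale map sets the target resolution for upsampling.
    target_h = max(h for _, h, _ in feature_shapes)
    target_w = max(w for _, _, w in feature_shapes)
    # Upsampling changes only H and W, so concatenation sums the channels.
    concat_channels = sum(c for c, _, _ in feature_shapes)
    concat = (concat_channels, target_h, target_w)
    # A 1x1 convolution changes only the channel count.
    reduced = (out_channels, target_h, target_w)
    # Parallel branches downsample by 1, 2, 4, ... (assumed factors).
    branches = [
        (out_channels, target_h // 2**i, target_w // 2**i)
        for i in range(num_branches)
    ]
    return concat, reduced, branches
```

Each branch output would then pass through SCConv before reaching the detection head; that processing is not modeled here.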
[0009] Furthermore, the processing procedure of the SCConv module includes: After the input feature map is convolved with 1x1, it is divided into two groups along the channel dimension. The first group accounts for 1 / 4 of the total number of channels, and the second group accounts for 3 / 4. The first set of features are processed sequentially by the Spatial Reconstruction Unit (SRU) and the Channel Reconstruction Unit (CRU). The processed first set of features is concatenated with the second set of features; The spliced result is output after passing through a 1×1 convolutional layer.
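The routing of channels through SCConv can be sketched as follows. Channels are modeled as items of a plain list, `reconstruct` is a hypothetical stand-in for the SRU followed by the CRU, and the two 1×1 convolutions before and after the routing are omitted.

```python
def scconv_route(channels, reconstruct):
    """Route a list of channel maps the way the SCConv steps describe.

    channels: list of per-channel feature maps (output of the first 1x1 conv).
    reconstruct: callable standing in for SRU followed by CRU (hypothetical).
    """
    quarter = len(channels) // 4          # first group: 1/4 of the channels
    first, second = channels[:quarter], channels[quarter:]
    processed = reconstruct(first)        # only the first group is processed
    return processed + second             # concatenate along the channel axis
```

Only a quarter of the channels go through the reconstruction units, which is where the reduction in redundant computation described for SCConv would come from.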
[0010] Furthermore, the processing procedure of the spatial reconstruction unit (SRU) includes: The input feature map X is grouped and normalized to obtain the normalization result and the variance parameter γ; The spatial weight tensor T is generated using the variance parameter γ; Threshold T to generate binary weight tensors T1 and T2; Multiply T1 and T2 by the input feature map X respectively, and concatenate the results along the channel dimension as the output of SRU.
[0011] Furthermore, the model training steps specifically include: Dataset construction and partitioning: Insulator images are collected to form the original dataset, which is then randomly divided into training and test sets in an 8:2 ratio; Data augmentation: The training data is augmented using methods including Mosaic, max pooling, photometric distortion, and geometric distortion; Network training: The augmented training set data is input into the network and passed sequentially through feature extraction, feature fusion, and the detection head; the loss function is calculated and error backpropagation is performed to optimize the network parameters; Network validation: The test set is used to validate the model's accuracy and determine whether the model has converged; Iterative training: The training process is repeated 400 times until the model converges.
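The random 8:2 partition can be illustrated with a minimal sketch. The function name, the fixed seed, and the use of a plain list of items are assumptions made for the example.

```python
import random


def split_dataset(items, train_ratio=0.8, seed=0):
    """Randomly split items into training and test sets (8:2 by default)."""
    shuffled = list(items)                       # leave the caller's list intact
    random.Random(seed).shuffle(shuffled)        # seeded for reproducibility
    cut = round(len(shuffled) * train_ratio)     # index separating the two sets
    return shuffled[:cut], shuffled[cut:]
```

Augmentation would then be applied to the training portion only, as the steps above describe.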
[0012] Furthermore, the dataset was obtained through drone aerial photography or laboratory photography.
[0013] Furthermore, in the model deployment and detection step, the model is used to automatically identify appearance defects in insulator monitoring videos.
[0014] Furthermore, the dedicated training process in the model training step includes: Images of insulators were collected, and appearance defects in the images were labeled to form the original dataset; The dataset was randomly divided into a training set and a test set in an 8:2 ratio. The training set data is augmented using methods such as Mosaic, max pooling, photometric distortion, and geometric distortion. The expanded training set data is passed forward through the feature extraction part, feature fusion part and detection head of the target detection network in sequence. The loss function is calculated and backpropagation of error is performed to optimize the network parameters. Use a test set to test the accuracy of the network model to determine whether the convergence condition has been met. The training and validation process is repeated for 400 rounds. Once the loss function and average accuracy converge, training is stopped and the final model weight file is obtained.
[0015] The positive effects of this invention are: Feature extraction and fusion optimization: The SCConv module is embedded into the HRFPN structure. Through the separation and reconstruction of spatial and channel information (SRU+CRU), the model's ability to extract and distinguish features of small target defects (such as fine cracks) under complex background interference is significantly enhanced, which directly improves detection accuracy. Improved computational efficiency: The SCConv module optimizes the feature processing flow within the HRFPN framework, effectively reducing redundant computations. While maintaining high accuracy, it greatly improves inference speed and supports real-time deployment on edge devices. Improved accuracy and robustness: Through the synergistic effect of HRFPN-SCConv, the model can still maintain high detection accuracy under complex lighting, angle changes and strong background interference, without relying on expanding the dataset. Strong engineering applicability: The model employs an 8:2 training-test split based on UAV aerial photography and laboratory datasets, combined with data augmentation methods such as Mosaic, resulting in excellent model generalization ability and significantly improved inspection efficiency.
Attached Figure Description
[0016] Figure 1 is a schematic diagram of the self-supervised predictive convolutional attention module and the HRFPN neck network structure in a specific embodiment of the present invention; Figure 2 is a schematic diagram of the SSPCAB module structure in a specific embodiment of the present invention; Figure 3 is a schematic diagram of the structure of SCConv in a specific embodiment of the present invention.
Detailed Implementation
[0017] The technical solutions of the present invention will be clearly and completely described below with reference to the embodiments of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the scope of protection of the present invention.
[0018] Example: A method for detecting appearance defects in insulators using a convolutional neural network includes the following steps: Model training steps: Based on a dataset containing images of insulator appearance defects, the target detection model is trained to obtain a trained model weight file; the target detection model is based on the YOLOv11 architecture, and its neck network uses a high-resolution feature pyramid network HRFPN for multi-scale feature fusion, in which a spatial channel reconstruction convolutional module SCConv is embedded; its backbone network uses a self-supervised predictive convolutional attention module SSPCAB for feature enhancement; Model deployment and detection steps: Deploy the trained model weight file to the edge device, and use the deployed model to automatically identify appearance defects in insulator monitoring videos or images.
[0019] The processing procedure of the self-supervised predictive convolutional attention module (SSPCAB) includes: The input feature map is padded, with the padding size P calculated using the formula P=K+D, where K is the kernel size and D is the dilation size. The padded feature map is then sliced four times, starting from the top-left, top-right, bottom-left, and bottom-right corners of the feature map. Each slice retains a region of size (W-A)×(H-A), where W and H are the width and height of the padded feature map, respectively, and A is calculated using the formula A=K+2×D+1. The four sliced feature maps are each passed through a 1×1 convolution. The four convolved feature maps are summed and activated using the ReLU function. The fused features are then input into the SENet module, where channel attention weights are generated through global pooling, fully connected layers, and the Sigmoid function. The feature maps are then weighted by these attention weights and output.
[0020] The processing procedure of the SENet module is expressed by a formula in which X is the input feature map, X' is the output feature map, Pool(·) is the global pooling operation, and W1 and W2 are the parameters of the two fully connected layers, respectively.
[0021] The HRFPN processing procedure includes: receiving feature maps at three different scales from the backbone network; upsampling the two smaller-scale feature maps so that their resolution is consistent with that of the largest-scale feature map; concatenating the three feature maps along the channel dimension; using a 1×1 convolution to reduce the dimensionality of the concatenated feature map; generating multi-scale feature representations through parallel downsampling branches; inputting the downsampled feature maps into the SCConv module for processing; and sending the processed multi-scale feature maps to the detection head.
[0022] The processing steps of the SCConv module include: after the input feature map is convolved by 1x1, it is divided into two groups along the channel dimension, with the first group accounting for 1 / 4 of the total number of channels and the second group accounting for 3 / 4; the first group of features is processed by the Spatial Reconstruction Unit (SRU) and the Channel Reconstruction Unit (CRU) in sequence; the processed first group of features is concatenated with the second group of features; and the concatenated result is output after passing through a 1×1 convolutional layer.
[0023] The processing steps of the spatial reconstruction unit (SRU) include: grouping and normalizing the input feature map X to obtain the normalization result and variance parameter γ; generating a spatial weight tensor T using the variance parameter γ; performing thresholding on T to generate binary weight tensors T1 and T2; multiplying T1 and T2 with the input feature map X respectively, and concatenating the results along the channel dimension as the output of the SRU.
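The SRU gating can be sketched in plain Python. Several points are assumptions rather than statements of the source: γ is normalized so that it sums to one over the channels, T2 is taken as the complement of T1, each channel is stored as a flat list of pixels, and the group normalization itself is assumed to have been done already (`x_gn`). The Sigmoid and the 0.5 threshold follow the embodiment.

```python
import math


def sru_gate(x, x_gn, gamma, threshold=0.5):
    """Gate x with binary masks T1, T2 derived from normalized gamma.

    x, x_gn: lists of channels, each a flat list of pixel values
             (x_gn is the group-normalized version of x).
    gamma:   one trainable scale per channel from the normalization layer.
    """
    total = sum(gamma)
    weights = [g / total for g in gamma]              # normalized gamma (assumed form)
    out_info, out_rest = [], []
    for ch, ch_gn, w in zip(x, x_gn, weights):
        t = [1.0 / (1.0 + math.exp(-w * v)) for v in ch_gn]   # T in (0, 1)
        t1 = [1.0 if v > threshold else 0.0 for v in t]       # T1: informative positions
        t2 = [1.0 - v for v in t1]                            # T2: assumed complement
        out_info.append([a * m for a, m in zip(ch, t1)])
        out_rest.append([a * m for a, m in zip(ch, t2)])
    return out_info + out_rest                        # concatenate along channels
```

With this reading the output has twice as many channels as the input, since the two gated copies are concatenated rather than added.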
[0024] The specific model training steps include: Dataset construction and partitioning: collecting insulator images to form the original dataset, and randomly partitioning it into a training set and a test set at an 8:2 ratio; Data augmentation: augmenting the training set data using methods including Mosaic, max pooling, photometric distortion, and geometric distortion; Network training: inputting the augmented training set data into the network, sequentially performing feature extraction, feature fusion, and detection head, calculating the loss function, and performing error backpropagation to optimize the network parameters; Network validation: using the test set to validate the model's accuracy and determine whether the model has converged; Iterative training: repeating the training process 400 times until the model converges.
[0025] The dataset was obtained through drone aerial photography or laboratory photography.
[0026] In the model deployment and detection step, the model is used to automatically identify appearance defects in insulator monitoring videos.
[0027] The dedicated training process in the model training steps includes: collecting insulator images and labeling appearance defects in the images to form the original dataset; randomly dividing the dataset into training and test sets in an 8:2 ratio; augmenting the training set data using Mosaic, max pooling, photometric distortion, and geometric distortion methods; sequentially passing the augmented training set data through the feature extraction part, feature fusion part, and detection head of the target detection network for forward propagation, calculating the loss function, and performing error backpropagation to optimize network parameters; using the test set to test the accuracy of the network model to determine whether the convergence condition has been met; repeating the training and validation process for 400 rounds, and stopping training and obtaining the final model weight file after the loss function and average accuracy converge.
[0028] Specifically, the insulator appearance defect detection method using convolutional neural networks in this embodiment introduces a self-supervised predictive convolutional attention module and an HRFPN neck network structure to identify insulator appearance defects. The network structure is shown in Figure 1; the structure of the self-supervised predictive convolutional attention module, also known as the SSPCAB module, is shown in Figure 2.
[0029] The SSPCAB module has input and output tensors of the same size and can learn to predict information from its context. First, SSPCAB performs a padding operation on the input feature map or image, with the padding size P given by P=K+D, where K is the kernel size and D is the dilation size. Assuming the width of the padded feature map is W and the height is H, a total of four slicing operations are performed on the padded feature map: The first slice takes the upper left corner of the feature map as the origin and retains a feature map of size (W-A)×(H-A); The second slice takes the upper right corner of the feature map as the origin and retains a feature map of size (W-A)×(H-A); The third slice takes the lower left corner of the feature map as the origin and retains a feature map of size (W-A)×(H-A); The fourth slice takes the lower right corner of the feature map as the origin and retains a feature map of size (W-A)×(H-A); The four feature maps obtained from slicing are each passed through a 1×1 convolutional layer. The slice length A is calculated as A=K+2×D+1. The outputs of the four convolutional layers are then summed, and the ReLU activation function is applied. Finally, the fused features are input into SENet.
[0030] First, a squeeze operation is performed, using global pooling to compress the output feature map of the convolutional layer into a feature vector. Then, an excitation operation generates a weight vector using fully connected layers and a non-linear activation function. Finally, the weights are applied to each channel of the input feature map, and the reweighted channels are recombined to form the output feature map. In the formula describing the SE module, X is the input feature map, X' is the output feature map, Pool(·) is the global pooling operation, and W1 and W2 are the parameters of the two fully connected layers in the excitation operation, respectively.
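The squeeze, excitation, and reweighting steps can be sketched without any framework. The sketch assumes global average pooling and a ReLU between the two fully connected layers; the text specifies only global pooling, fully connected layers, and the Sigmoid. The function name and the nested-list layout are hypothetical.

```python
import math


def se_block(x, w1, w2):
    """Apply squeeze-and-excitation to x, a list of channels (each a 2-D list)."""
    # Squeeze: global average pooling gives one value per channel (assumed average).
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation: fully connected -> ReLU (assumed) -> fully connected -> Sigmoid.
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w1]
    scale = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Reweight: multiply every pixel of a channel by that channel's weight.
    return [[[v * s for v in row] for row in ch] for ch, s in zip(x, scale)]
```

Here `w1` has one row per hidden unit and `w2` one row per channel, so the output keeps the shape of the input.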
[0031] HRFPN Network Structure: After the backbone network generates feature maps, additional feature fusion is needed to improve detection performance. The original YOLOv11 model uses a standard Feature Pyramid Network (FPN) to aggregate multi-scale features, thereby improving the detection of objects of different sizes.
[0032] In insulator appearance inspection applications, most target objects are small. A common strategy to improve the recognition rate of small targets is to prioritize processing large-scale feature maps. For this reason, HRFPN is introduced to reconstruct the neck structure of YOLOv11, achieving more efficient feature channel integration. The final neck architecture is shown in the neck portion of Figure 1.
[0033] After extracting three feature maps of different scales from the backbone network, HRFPN first upsamples the two smaller-scale feature maps to match the resolution of the largest feature map, and then concatenates them along the channel dimension. After concatenation, a 1×1 convolutional layer is used to reduce the number of channels, and parallel downsampling branches are employed to generate multi-scale representations, enabling the network to capture features within different receptive fields. The downsampled output is processed by the SCConv module to extract channel-level information, and then fed into the detection head; the structure of SCConv is shown in Figure 3.
[0034] SCConv optimizes from the spatial dimension, minimizing parameters and computational cost without sacrificing performance. The input feature map is first processed by a 1×1 convolutional layer. The output is split into two segments along the channel dimension: the first quarter of the channels forms the first group, and the remaining three-quarters of the channels form the second group. The first group is processed by a Spatial Reconstruction Unit (SRU) and a Channel Reconstruction Unit (CRU), then concatenated with the second group and passed through another 1×1 convolutional layer. The operation flow of SRU and CRU is shown in Figure 3.
[0035] In Figure 3, "SRU weight normalization" refers to normalizing the weight parameters γ obtained when the input feature map X is group-normalized on entering the SRU. In the group normalization, γ and α are trainable parameters, η and σ² are the mean and variance of X, respectively, and ε is a constant close to zero. The parameter γ reflects the spatial pixel variance per batch and per channel. After normalization, the value of γ reflects the importance of the feature map between batches and channels. The normalized γ is multiplied by the group-normalized feature map X_gn and then mapped to the (0,1) interval using the Sigmoid function to generate the weight tensor T. A threshold is applied to T: values greater than 0.5 are set to 1 to form the spatial weight tensor T1, and the rest are set to 0 to form T2. These two tensors are each multiplied by the input feature map X, producing a feature map rich in spatial information and a feature map with suppressed spatial information. The two are concatenated along the channel dimension to form the output of the SRU.
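The quantities named in this paragraph can be written compactly as follows. This is a sketch that assumes the usual group-normalization form and a sum-to-one normalization of γ over C channels; the symbols w_i and C, and the exact way γ is normalized, are assumptions not stated in the text.

```latex
X_{gn} = \gamma\,\frac{X - \eta}{\sqrt{\sigma^{2} + \varepsilon}} + \alpha,
\qquad
w_i = \frac{\gamma_i}{\sum_{j=1}^{C} \gamma_j},
\qquad
T = \mathrm{Sigmoid}\bigl(w \otimes X_{gn}\bigr)
```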
[0036] As shown in the CRU section of Figure 3, the two inputs are independently pooled into vectors A1 and A2, which are then concatenated. The combined vector is then processed by channel-level soft attention to generate feature importance vectors β1 and β2.
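One common way to realize channel-level soft attention over two pooled vectors is a softmax across the pair, channel by channel. The sketch below assumes that form, which the text does not spell out; the function name is hypothetical.

```python
import math


def cru_soft_attention(a1, a2):
    """Return per-channel importance vectors beta1, beta2 from pooled A1, A2."""
    beta1, beta2 = [], []
    for s1, s2 in zip(a1, a2):
        m = max(s1, s2)                       # subtract the max for numerical stability
        e1, e2 = math.exp(s1 - m), math.exp(s2 - m)
        beta1.append(e1 / (e1 + e2))          # softmax over the two branches
        beta2.append(e2 / (e1 + e2))
    return beta1, beta2
```

Under this assumption β1 and β2 sum to one in every channel, so they act as relative importance weights for the two inputs.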
[0037] The algorithm in this method is based on deep learning. After the model is built in the above manner, the overall image processing process includes two parts: training and testing.
[0038] (1) Collect images of insulators with appearance defects by drone aerial photography or laboratory photography.
(2) Perform manual or semi-automatic annotation using data annotation software to form the original dataset.
(3) Randomly divide the dataset into training and test sets in an 8:2 ratio, and augment the training data using methods such as Mosaic, max pooling, photometric distortion and geometric distortion.
(4) Pass the prepared training set through the feature extraction part, feature fusion part, and detection head of the constructed object detection network in sequence. During feature extraction, the feature maps of the image are fused at different scales. Finally, calculate the loss function between the anchor boxes formed by clustering and the labeled detection boxes, and backpropagate the error to correct the network parameters.
(5) Input the test set into the network to test the accuracy of the network model and determine whether the convergence condition has been met.
[0039] (6) After completing steps 4 and 5, the network weights obtained from the first training are fed back into the deep network for repeated training and verification for 400 rounds. Training is stopped after the loss function and average accuracy converge.
[0040] (7) Extract the weight file obtained from the training and deploy it at the edge. The model can then automatically identify the appearance defects of insulators in the monitoring video.
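The repeat-until-converged loop of steps (4) to (6) can be sketched as below. The sketch treats 400 rounds as an upper bound and stops early once both curves flatten, which is one reading of the text; the window length, the tolerance, and the `run_round` callable are hypothetical.

```python
def has_converged(values, window=10, tol=1e-3):
    """True when the last `window` values vary by less than `tol`."""
    if len(values) < window:
        return False
    recent = values[-window:]
    return max(recent) - min(recent) < tol


def train(run_round, max_rounds=400):
    """Repeat training/validation rounds, stopping once loss and accuracy settle.

    run_round: hypothetical callable returning (loss, mean_accuracy) for one round.
    """
    losses, accuracies = [], []
    for round_index in range(1, max_rounds + 1):
        loss, accuracy = run_round(round_index)   # one training + validation pass
        losses.append(loss)
        accuracies.append(accuracy)
        if has_converged(losses) and has_converged(accuracies):
            break
    return round_index, losses, accuracies
```

The weight file saved after the final round is what step (7) deploys to the edge device.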
[0041] This method enhances feature extraction and fusion capabilities in complex backgrounds by replacing the original YOLOv11 FPN with HRFPN (High Resolution Feature Pyramid Network) and embedding SCConv (Spatial Channel Reconstruction Convolution) into its structure. Feature representation is optimized through the collaborative optimization of SRU (Spatial Reconstruction Unit) and CRU (Channel Reconstruction Unit), enhancing the representation of small target defects (such as fine cracks). The method also optimizes model lightweighting and computational efficiency by using the SCConv module to separate and reconstruct spatial and channel information processing flows, significantly reducing redundant computation. High-precision real-time detection is achieved. These improvements significantly enhance detection accuracy (especially for small target defects) and inference speed while maintaining high recall, meeting the deployment requirements of edge devices. Furthermore, the method enhances the model's adaptability to complex scenes, enabling it to effectively cope with varying lighting, angles, and background interference in aerial images.
[0042] The above-described embodiments are detailed and specific, illustrating preferred embodiments of the present invention. They are only used to illustrate the technical ideas and features of the present invention, with the aim of enabling those skilled in the art to understand the content of the present invention and implement it accordingly. However, they do not limit the present invention, and the patent scope of the present invention cannot be limited by this embodiment alone. That is, any equivalent changes or modifications made to the spirit disclosed in the present invention, without departing from the structure of the present invention, such as local improvements within the system and modifications or transformations between subsystems, are still within the patent scope of the present invention.
Claims
1. A method for detecting appearance defects in insulators using a convolutional neural network, characterized in that, Includes the following steps: Model training steps: Based on a dataset containing images of insulator appearance defects, the target detection model is trained to obtain a trained model weight file; the target detection model is based on the YOLOv11 architecture, and its neck network uses a high-resolution feature pyramid network HRFPN for multi-scale feature fusion, in which a spatial channel reconstruction convolutional module SCConv is embedded; its backbone network uses a self-supervised predictive convolutional attention module SSPCAB for feature enhancement; Model deployment and detection steps: Deploy the trained model weight file to the edge device, and use the deployed model to automatically identify appearance defects in insulator monitoring videos or images.
2. The insulator appearance defect detection method using a convolutional neural network according to claim 1, characterized in that, The processing procedure of the self-supervised predictive convolutional attention module (SSPCAB) includes: The input feature map is padded, and the padding size P is calculated according to the formula P=K+D, where K is the convolution kernel size and D is the dilation size; The padded feature map is sliced four times, with the slices starting from the upper left corner, upper right corner, lower left corner, and lower right corner of the feature map, respectively. Each slice retains a region of size (W-A)×(H-A), where W and H are the width and height of the padded feature map, respectively, and A is calculated according to the formula A=K+2×D+1. A 1×1 convolution is performed on each of the four feature maps obtained from slicing; The four convolved feature maps are summed and the ReLU activation function is applied. The fused features are input into the SENet module, where channel attention weights are generated through global pooling, fully connected layers, and the Sigmoid function. The feature maps are then weighted and output.
3. The insulator appearance defect detection method using a convolutional neural network according to claim 2, characterized in that, The processing procedure of the SENet module is expressed by a formula in which X is the input feature map, X' is the output feature map, Pool(·) is the global pooling operation, and W1 and W2 are the parameters of the two fully connected layers, respectively.
4. The insulator appearance defect detection method using a convolutional neural network according to claim 1, characterized in that, The HRFPN processing procedure includes: Receive feature maps at three different scales from the backbone network; Upsample two of the smaller-scale feature maps to make their resolution consistent with that of the largest-scale feature map; The three feature maps are concatenated along the channel dimension; Use 1x1 convolution to reduce the dimensionality of the concatenated feature map; Multi-scale feature representations are generated through parallel downsampling branches; The downsampled feature map is input into the SCConv module for processing; The processed multi-scale feature map is then fed into the detection head.
5. A method for detecting insulator appearance defects using a convolutional neural network according to claim 1 or 4, characterized in that, The processing procedure of the SCConv module includes: After the input feature map is convolved with 1x1, it is divided into two groups along the channel dimension. The first group accounts for 1 / 4 of the total number of channels, and the second group accounts for 3 / 4. The first set of features are processed sequentially by the Spatial Reconstruction Unit (SRU) and the Channel Reconstruction Unit (CRU). The processed first set of features is concatenated with the second set of features; The spliced result is output after passing through a 1×1 convolutional layer.
6. The insulator appearance defect detection method using a convolutional neural network according to claim 5, characterized in that, The processing procedure of the Spatial Reconstruction Unit (SRU) includes: The input feature map X is grouped and normalized to obtain the normalization result and the variance parameter γ; The spatial weight tensor T is generated using the variance parameter γ; Threshold T to generate binary weight tensors T1 and T2; Multiply T1 and T2 by the input feature map X respectively, and concatenate the results along the channel dimension as the output of SRU.
7. The insulator appearance defect detection method using a convolutional neural network according to claim 1, characterized in that, The model training steps specifically include: Dataset construction and partitioning: Insulator images are collected to form the original dataset, which is then randomly divided into training and test sets in an 8:2 ratio; Data augmentation: The training data is augmented using methods including Mosaic, max pooling, photometric distortion, and geometric distortion; Network training: The augmented training set data is input into the network and passed sequentially through feature extraction, feature fusion, and the detection head; the loss function is calculated and error backpropagation is performed to optimize the network parameters; Network validation: The test set is used to validate the model's accuracy and determine whether the model has converged; Iterative training: The training process is repeated 400 times until the model converges.
8. The insulator appearance defect detection method using a convolutional neural network according to claim 7, characterized in that, The dataset was obtained through drone aerial photography or laboratory photography.
9. The insulator appearance defect detection method using a convolutional neural network according to claim 1, characterized in that, In the model deployment and detection step, the model is used to automatically identify appearance defects in insulator monitoring videos.
10. The insulator appearance defect detection method using a convolutional neural network according to claim 1, characterized in that, The dedicated training process in the model training steps includes: Images of insulators were collected, and appearance defects in the images were labeled to form the original dataset; The dataset was randomly divided into a training set and a test set in an 8:2 ratio. The training set data is augmented using methods such as Mosaic, max pooling, photometric distortion, and geometric distortion. The expanded training set data is passed forward through the feature extraction part, feature fusion part and detection head of the target detection network in sequence. The loss function is calculated and backpropagation of error is performed to optimize the network parameters. Use a test set to test the accuracy of the network model to determine whether the convergence condition has been met. The training and validation process is repeated for 400 rounds. Once the loss function and average accuracy converge, training is stopped and the final model weight file is obtained.