Multi-defect category insulator defect detection method based on multi-angle feature enhancement

By employing a multi-angle feature enhancement detection method, combined with channel and spatial attention mechanisms, the problem of information loss in insulator defect detection is solved, achieving higher accuracy and speed in detection.

CN118469946B · Active · Publication Date: 2026-06-19 · XINJIANG UNIVERSITY

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents (China)
Current Assignee / Owner
XINJIANG UNIVERSITY
Filing Date
2024-05-14
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing insulator defect detection methods lose information during feature extraction and fusion, and their loss functions pay too little attention to anchor box quality, which limits detection accuracy and speed.

Method used

A detection method with multi-angle feature enhancement is adopted, which combines channel attention and spatial attention mechanisms. Through a multi-scale bidirectional feature fusion structure and an optimized loss function, feature extraction and fusion are enhanced, detailed information is preserved, and anchor box quality is optimized.

Benefits of technology

It improves the accuracy and speed of insulator defect detection, reduces information loss, enhances the detection capability for small targets, and optimizes the overall performance of the detector.

✦ Generated by Eureka AI based on patent content.

Figure CN118469946B_ABST

Abstract

The application discloses a multi-defect category insulator defect detection method based on multi-angle feature enhancement, which belongs to the technical field of artificial intelligence and comprises the following steps: (1) extracting image features with a designed network architecture; (2) fusing multi-layer feature maps; (3) optimizing the network with a loss function; and (4) outputting the detection result. The method considers channel features while further enhancing spatial features, and it constructs a new multi-scale bidirectional feature fusion structure that retains more context information during fusion and speeds up the fusion process.

Description

Technical Field

[0001] This invention relates to the field of artificial intelligence technology, and in particular to a method for detecting multi-defect categories of insulator defects based on multi-angle feature enhancement.

Background Technology

[0002] Insulator damage can seriously affect the safe operation of power systems, so timely and accurate identification of insulator defects is crucial. Complex image backgrounds, the diversity of insulator defect types, and low-quality datasets further complicate insulator defect detection. Moreover, faults are often densely distributed, and the small size of insulators in the images makes detection even more challenging. Early manual inspection was prone to errors and omissions, and its time-consuming nature made it unsuitable for large-scale power line inspection. Image processing-based methods are effective only for specific defect categories. With the rise of convolutional neural networks, deep learning-based insulator defect detection methods have become mainstream. To improve detection accuracy, stronger feature extraction networks and Feature Pyramid Networks (FPNs) are typically used. However, stronger feature extraction networks require more parameters, and the unidirectional fusion in FPNs can lose considerable location information. Therefore, effectively extracting insulator defect features and fully integrating contextual information are essential for accurate insulator defect identification.

[0003] Current mainstream insulator defect detection frameworks based on fused attention mechanisms have shortcomings in three main respects:

[0004] (1) Ineffective extraction of semantic features. Feature extraction of targets has always been of paramount importance in the target detection process. However, the multi-layer convolutional operations in most feature extraction networks will destroy long-range dependencies along the spatial dimension, resulting in the loss of feature information of small targets;

[0005] (2) Semantic information is lost during feature fusion. Since the size and position of the target in the image are uncertain, a mechanism is needed to handle targets of different scales and sizes. Feature pyramid is a structure for handling multi-scale target detection. It can be implemented by adding feature layers of different scales to the backbone network. However, small objects have less pixel information and are easily lost during downsampling.

[0006] (3) Loss Function Issues. The loss function for object detection is generally a multi-task loss function, including classification loss, localization loss, and confidence loss. Classification loss calculates whether the anchor box and its corresponding label are correctly classified; confidence loss calculates the network's confidence; and localization loss calculates the error between the predicted and ground truth boxes. However, existing methods generally consider the aspect ratio of the anchor box in their localization loss, neglecting the quality of the anchor box. Low-quality anchor boxes in the dataset can significantly impact the loss, while high-quality samples are difficult to optimize further. Therefore, reducing the competitiveness of high-quality anchor boxes while minimizing the harmful gradients generated by low-quality samples is crucial.

Summary of the Invention

[0007] To address the shortcomings of the existing technology, the present invention aims to provide a multi-defect category insulator defect detection method based on multi-angle feature enhancement. The feature fusion network of this model combines an attention mechanism and a feature pyramid structure, which can avoid information loss during feature fusion and retain more detailed information, thereby improving the accuracy of insulator defect detection.

[0008] The technical solution adopted by this invention to solve its technical problem is as follows:

[0009] A method for detecting insulator defects of multiple categories based on multi-angle feature enhancement is provided, comprising the following steps:

[0010] (1) Input image dataset and corresponding label dataset: First, images are stitched together using data augmentation so that the model can identify targets within a smaller range; then, a set of predefined anchor boxes is used, and during training the model adaptively adjusts the anchor box sizes. Through supervised training and evaluation, a neural network model that can accurately identify image defect categories is trained.

[0011] (2) Feature extraction: First, the feature map is obtained by the convolution operation of the CBS layer. The CBS structure includes a convolutional layer, a normalization layer and an activation function layer. Then, the feature map is extracted by downsampling, then by the CSP structure, and finally by the spatial pyramid pooling structure.

[0012] (3) Feature fusion: By upsampling and downsampling operations, feature maps at different levels are fused together, and the diversity and robustness of features are improved by using a multi-scale bidirectional feature fusion structure.

[0013] (4) Output results: The trained neural network is used to process the input image containing insulator defects. The image to be detected is divided into H×W grid cells. If the center of a target falls within a grid cell, that cell is responsible for predicting the target. The class probabilities predicted by each grid cell are multiplied by the confidence score of the predicted box to obtain the class confidence score of each box. Boxes with low class confidence scores are then filtered out, and non-maximum suppression is applied to the remaining boxes to obtain the final detection results.
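The scoring, filtering, and non-maximum suppression of step (4) can be sketched as below. This is a minimal illustration under stated assumptions, not the patented implementation: boxes are assumed to be already decoded from the grid cells into corner format (x1, y1, x2, y2), each box keeps only its most probable class, and the thresholds `score_thr=0.25` and `iou_thr=0.45` are illustrative values that the text does not specify. The names `iou` and `postprocess` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def postprocess(boxes: List[Box], box_conf: List[float], class_probs: List[List[float]],
                score_thr: float = 0.25, iou_thr: float = 0.45):
    """Class confidence = class probability x box confidence, then filtering and per-class NMS."""
    candidates = []
    for box, conf, probs in zip(boxes, box_conf, class_probs):
        cls = max(range(len(probs)), key=probs.__getitem__)  # most probable class
        score = probs[cls] * conf                            # class confidence score
        if score >= score_thr:                               # drop low-scoring boxes
            candidates.append((score, cls, box))
    candidates.sort(key=lambda c: c[0], reverse=True)
    kept = []
    for score, cls, box in candidates:
        # keep a box only if no higher-scoring kept box of the same class overlaps it too much
        if all(k_cls != cls or iou(box, k_box) < iou_thr for _, k_cls, k_box in kept):
            kept.append((score, cls, box))
    return kept
```

For example, two heavily overlapping boxes predicted for the same class collapse to the higher-scoring one, while an overlapping box of a different class survives.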

[0014] Further, in step (2), the last-layer feature map of the input image (20×20×512) is first reduced in channel dimension by a 1×1 convolution, then upsampled and stacked with the 40×40×256 feature map; the result of this residual connection is convolved by the CSP layer to retain effective information. Next, the 40×40×256 feature map is passed through a 1×1 convolution, upsampled, and concatenated with the 80×80×128 feature map, and the result of this residual connection passes through a CSP convolution to give the first output feature fusion layer. The first output feature fusion layer is downsampled and concatenated with the earlier feature map of the same size, and another CSP convolution outputs the second feature fusion layer. The second output layer is processed in the same way to obtain the third output layer.

[0015] Furthermore, in step (3), a new MAFE+X structure is constructed. MAFE+X consists of multiple CBS modules and multi-angle feature enhancement modules, which can further enhance the key information in the feature map.

[0016] The multi-angle feature enhancement module includes a channel attention module and a triple spatial attention module.

[0017] Furthermore, let $\chi \in \mathbb{R}^{C\times H\times W}$ denote the input feature map and $W_c \in \mathbb{R}^{C\times 1\times 1}$ the channel attention weights. Channel attention has two branches: average pooling and max pooling are applied to the input feature map χ for each channel, producing two one-dimensional feature maps $L_1, L_2 \in \mathbb{R}^{C\times 1\times 1}$. These feature maps then pass through a multilayer perceptron and are finally normalized by an activation function.

[0018] The mathematical expression for channel attention weights is as follows:

[0019] $$W_c(\chi) = \sigma\big(f(\zeta(\chi)) + f(\xi(\chi))\big)$$

[0020] Where ζ() and ξ() represent average pooling and max pooling operations on the input, respectively, f() represents the multilayer perceptron operation, and σ() represents the activation function.
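A minimal pure-Python sketch of the channel attention weights in [0019] follows. It assumes a two-layer MLP shared by both branches with a ReLU hidden layer, and a sigmoid for σ; the text only says "multilayer perceptron" and "activation function". Feature maps are nested lists of shape C×H×W, and the weight matrices `w1` (C/r × C) and `w2` (C × C/r) as well as all function names are hypothetical; in a trained network the weights would be learned.

```python
import math
from typing import List

Matrix = List[List[float]]
FeatureMap = List[Matrix]  # C x H x W as nested lists


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def matvec(w: Matrix, x: List[float]) -> List[float]:
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def mlp(x: List[float], w1: Matrix, w2: Matrix) -> List[float]:
    """Shared MLP f(): C -> C/r (ReLU, assumed) -> C."""
    hidden = [max(0.0, h) for h in matvec(w1, x)]
    return matvec(w2, hidden)


def channel_attention(fmap: FeatureMap, w1: Matrix, w2: Matrix) -> List[float]:
    """W_c(chi) = sigma(f(avg_pool(chi)) + f(max_pool(chi))), one weight per channel."""
    l1 = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in fmap]  # zeta: average pooling
    l2 = [max(map(max, ch)) for ch in fmap]                            # xi: max pooling
    return [sigmoid(a + m) for a, m in zip(mlp(l1, w1, w2), mlp(l2, w1, w2))]


def apply_channel_attention(fmap: FeatureMap, weights: List[float]) -> FeatureMap:
    """chi_0 = W_c(chi) * chi: scale every H x W plane by its channel weight."""
    return [[[w * v for v in row] for row in ch] for ch, w in zip(fmap, weights)]
```

Because σ maps to (0, 1), each channel is scaled down by a factor that depends on both its average and its peak activation.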

[0021] Furthermore, the spatial attention weights are denoted $W_s \in \mathbb{R}^{1\times H\times W}$. Average pooling and max pooling across the channel dimension at each spatial position yield two two-dimensional feature maps $J_1, J_2 \in \mathbb{R}^{1\times H\times W}$; after concatenation, they pass through a 7×7 convolution kernel and an activation function to give the spatial attention weights. The mathematical expression for the spatial attention weights is as follows:

[0022] $$W_s(\chi) = \sigma\big(f^{7\times 7}([J_1; J_2])\big)$$

[0023] where $f^{7\times 7}(\cdot)$ denotes a convolution with a 7×7 kernel and σ(·) is the activation function.
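The spatial attention weights of [0021] to [0023] can be sketched in the same pure-Python style. Assumptions: σ is a sigmoid, the 7×7 convolution has no bias, and "same" zero padding keeps the H×W size; `kernels` holds one hypothetical k×k weight matrix per pooled map, and the function names are illustrative. Concatenating J1 and J2 and convolving is written as one convolution with two input planes.

```python
import math
from typing import List

Matrix = List[List[float]]
FeatureMap = List[Matrix]  # C x H x W as nested lists


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def conv2d_same(planes: List[Matrix], kernels: List[Matrix]) -> Matrix:
    """Multi-input, single-output k x k convolution (cross-correlation), stride 1, zero padding."""
    k, h, w = len(kernels[0]), len(planes[0]), len(planes[0][0])
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for plane, ker in zip(planes, kernels):
        for i in range(h):
            for j in range(w):
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < h and 0 <= x < w:  # zero padding outside the map
                            out[i][j] += ker[di][dj] * plane[y][x]
    return out


def spatial_attention(fmap: FeatureMap, kernels: List[Matrix]) -> Matrix:
    """W_s(chi) = sigma(f7x7([J1; J2])), one weight per spatial position."""
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    j1 = [[sum(fmap[ch][i][j] for ch in range(c)) / c for j in range(w)]
          for i in range(h)]                                # average over channels
    j2 = [[max(fmap[ch][i][j] for ch in range(c)) for j in range(w)]
          for i in range(h)]                                # max over channels
    logits = conv2d_same([j1, j2], kernels)  # concatenated maps = two input planes
    return [[sigmoid(v) for v in row] for row in logits]
```

The large 7×7 window lets each weight depend on a neighbourhood rather than a single pixel, which is the motivation given in [0091].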

[0024] Furthermore, a given input $\chi \in \mathbb{R}^{C\times H\times W}$ is passed to the three branches of the triple spatial attention module;

[0025] The first branch keeps its input unchanged; after pooling, convolution, and the activation function, it produces the output $\chi_{I} \in \mathbb{R}^{C\times H\times W}$. In the second branch, the input χ is mirrored about the central axis to obtain a new input $\chi'_{II} \in \mathbb{R}^{C\times H\times W}$; after the same operations as in the first branch, the resulting feature map is mirrored about the central axis again to obtain the final output of the second branch, $\chi_{II} \in \mathbb{R}^{C\times H\times W}$;

[0026] In the third branch, the input χ is flipped about the central axis to obtain a new input $\chi'_{III} \in \mathbb{R}^{C\times H\times W}$; after pooling, convolution, and the activation function, the resulting feature map is flipped upside down about the central axis again to obtain the final output of the third branch, $\chi_{III} \in \mathbb{R}^{C\times H\times W}$. Finally, $\chi_{I}$, $\chi_{II}$, and $\chi_{III}$ are added to obtain the final output of the multi-angle feature enhancement module. The process can be expressed as:

[0027] $$\chi_0 = W_c(\chi) \ast \chi$$

[0028] $$\chi_{01} = W_s(\chi_0) \ast \chi_0 + W_s(\chi'_{II}) \ast \chi'_{II} + W_s(\chi'_{III}) \ast \chi'_{III}$$

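[0029] where $\chi_0$ is the output of the channel attention module of the multi-angle feature enhancement module, $\chi_{01}$ is the final output of the multi-angle feature enhancement module, and * denotes element-wise multiplication with the attention weights broadcast over the remaining dimensions.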
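The data flow of the multi-angle feature enhancement module can be sketched as follows. The channel and spatial attention are passed in as callables, so any functions returning C channel weights and H×W spatial weights can be plugged in; the same spatial attention is shared by the three branches, as in [0028]. Two assumptions are made: the second branch mirrors left-right and the third mirrors up-down, and, following the prose of [0025] and [0026], each flipped branch is flipped back before the sum so that positions line up. All names are hypothetical.

```python
from typing import Callable, List

Matrix = List[List[float]]
FeatureMap = List[Matrix]                          # C x H x W as nested lists
ChannelAtt = Callable[[FeatureMap], List[float]]   # returns C weights, like W_c
SpatialAtt = Callable[[FeatureMap], Matrix]        # returns H x W weights, like W_s


def flip_lr(fmap: FeatureMap) -> FeatureMap:
    """Mirror every channel about the vertical central axis (left-right)."""
    return [[row[::-1] for row in ch] for ch in fmap]


def flip_ud(fmap: FeatureMap) -> FeatureMap:
    """Mirror every channel about the horizontal central axis (up-down)."""
    return [ch[::-1] for ch in fmap]


def weight_pixels(fmap: FeatureMap, att: Matrix) -> FeatureMap:
    """W_s(x) * x: multiply each position by its spatial weight in every channel."""
    return [[[a * v for a, v in zip(arow, row)] for arow, row in zip(att, ch)] for ch in fmap]


def add_maps(*maps: FeatureMap) -> FeatureMap:
    """Element-wise sum of equally shaped feature maps."""
    return [[[sum(vals) for vals in zip(*rows)] for rows in zip(*chs)] for chs in zip(*maps)]


def mafe(fmap: FeatureMap, w_c: ChannelAtt, w_s: SpatialAtt) -> FeatureMap:
    x0 = [[[w * v for v in row] for row in ch] for ch, w in zip(fmap, w_c(fmap))]  # chi_0
    b1 = weight_pixels(x0, w_s(x0))               # branch I: unchanged input
    x2 = flip_lr(x0)
    b2 = flip_lr(weight_pixels(x2, w_s(x2)))      # branch II: flip, attend, flip back
    x3 = flip_ud(x0)
    b3 = flip_ud(weight_pixels(x3, w_s(x3)))      # branch III: flip, attend, flip back
    return add_maps(b1, b2, b3)                   # chi_01
```

With a position-dependent spatial attention, the three branches weight different pixels, which is how the module looks at the same map "from different angles"; with uniform attention the output is simply three times χ0.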

[0030] Furthermore, the method includes network optimization using a loss function composed of classification loss, localization loss, and confidence loss.

[0031] Furthermore, binary cross-entropy loss is used for both the classification loss and the confidence loss, while the localization loss is a new loss function based on the Complete Intersection over Union (CIoU) loss, expressed as:

[0032] $$L_{CIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{C^2} + \alpha \upsilon$$

[0033] where b and $b^{gt}$ denote the center points of the predicted box and the ground-truth box, respectively, $\rho(\cdot)$ denotes the Euclidean distance between two points, C denotes the diagonal length of the smallest enclosing region that contains both the predicted box and the ground-truth box, and α is a weight function, defined as follows:

[0034] $$\alpha = \frac{\upsilon}{(1 - IoU) + \upsilon}$$

[0035] Here, υ measures the consistency of the aspect ratios and is defined as follows:

[0036] $$\upsilon = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2$$

[0037] where $w^{gt}/h^{gt}$ is the aspect ratio of the ground-truth bounding box and $w/h$ is the aspect ratio of the predicted bounding box. The smaller υ is, the closer the aspect ratios of the ground-truth and predicted bounding boxes are.
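A scalar sketch of the CIoU localization loss of [0032] to [0037] for a single predicted/ground-truth pair is given below. Boxes are assumed to be (centre x, centre y, width, height) with positive width and height; `ciou_loss` and the small `eps` guard against division by zero are illustrative additions, and this float-only version does not model gradients.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h)


def ciou_loss(pred: Box, gt: Box, eps: float = 1e-9) -> float:
    """L = 1 - IoU + rho^2(b, b_gt) / C^2 + alpha * v for one box pair."""
    px, py, pw, ph = pred
    gx, gy, gw, gh = gt
    # corner coordinates of both boxes
    p1x, p1y, p2x, p2y = px - pw / 2, py - ph / 2, px + pw / 2, py + ph / 2
    g1x, g1y, g2x, g2y = gx - gw / 2, gy - gh / 2, gx + gw / 2, gy + gh / 2
    inter = (max(0.0, min(p2x, g2x) - max(p1x, g1x))
             * max(0.0, min(p2y, g2y) - max(p1y, g1y)))
    iou = inter / (pw * ph + gw * gh - inter + eps)
    rho2 = (px - gx) ** 2 + (py - gy) ** 2  # squared centre distance
    # squared diagonal of the smallest enclosing box
    c2 = ((max(p2x, g2x) - min(p1x, g1x)) ** 2
          + (max(p2y, g2y) - min(p1y, g1y)) ** 2 + eps)
    v = (4 / math.pi ** 2) * (math.atan(gw / gh) - math.atan(pw / ph)) ** 2  # aspect-ratio term
    alpha = v / ((1 - iou) + v + eps)       # trade-off weight
    return 1 - iou + rho2 / c2 + alpha * v
```

For two equal 2×2 boxes whose centres are 4 apart horizontally, the IoU is 0, the enclosing diagonal squared is 6² + 2² = 40, and the loss is 1 + 16/40 = 1.4; the distance term keeps a useful signal even when the boxes do not overlap.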

[0038] Furthermore, there are v1, v2, and v3 versions of the loss function, whose expressions are as follows:

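[0039] $$L_{WIoUv1} = R_{WIoU}\,L_{IoU}, \qquad R_{WIoU} = \exp\left(\frac{(x - x^{gt})^2 + (y - y^{gt})^2}{\left(W_g^2 + H_g^2\right)^{\star}}\right), \qquad L_{IoU} = 1 - IoU$$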

[0040] where (x, y) are the coordinates of the center point of the predicted box, $(x^{gt}, y^{gt})$ are the coordinates of the center point of the ground-truth box, $W_g$ and $H_g$ are the width and height of the smallest enclosing box of the predicted and ground-truth boxes, and the superscript ★ indicates that the term is detached from the computation graph;

[0041] $$L_{WIoUv2} = \left(\frac{L_{IoU}^{\star}}{\overline{L_{IoU}}}\right)^{\gamma} L_{WIoUv1}$$

[0042] where $\overline{L_{IoU}}$ is the exponential running average of $L_{IoU}$ with momentum m, and γ is a hyperparameter;

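[0043] $$L_{WIoUv3} = r\,L_{WIoUv1}, \qquad r = \frac{\beta}{\delta\,\alpha^{\beta - \delta}}, \qquad \beta = \frac{L_{IoU}^{\star}}{\overline{L_{IoU}}} \in [0, +\infty)$$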

[0044] where β is the outlier degree of the anchor box, and the hyperparameters α and δ control the mapping between the outlier degree and the gradient gain (this α is distinct from the CIoU weight α above).
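The three versions can be compared on scalar values with the sketch below. It is not the patent's implementation: plain Python floats stand in for tensors, so "detaching" a term simply means treating it as a constant. The class name `WiseIoUSketch`, the initial running mean of 1.0, and the hyperparameter values (m = 0.01, γ = 0.5, α = 1.9, δ = 3) are assumptions, since the text does not give them.

```python
import math


class WiseIoUSketch:
    """Scalar sketch of the v1/v2/v3 localization-loss factors (hypothetical class)."""

    def __init__(self, momentum: float = 0.01, gamma: float = 0.5,
                 alpha: float = 1.9, delta: float = 3.0, init_mean: float = 1.0):
        self.m = momentum          # momentum of the running mean (assumed value)
        self.gamma = gamma         # v2 exponent (assumed value)
        self.alpha = alpha         # v3 hyperparameters (assumed values)
        self.delta = delta
        self.iou_mean = init_mean  # running mean of L_IoU (initial value assumed)

    def update_mean(self, l_iou: float) -> None:
        """Exponential running average of L_IoU with momentum m (e.g. once per batch)."""
        self.iou_mean = (1.0 - self.m) * self.iou_mean + self.m * l_iou

    @staticmethod
    def v1(l_iou: float, dx: float, dy: float, wg: float, hg: float) -> float:
        """L_v1 = R_WIoU * L_IoU; dx, dy are centre offsets, wg, hg the enclosing box size."""
        r_wiou = math.exp((dx * dx + dy * dy) / (wg * wg + hg * hg))  # denominator is a constant here
        return r_wiou * l_iou

    def v2(self, l_iou: float, dx: float, dy: float, wg: float, hg: float) -> float:
        return (l_iou / self.iou_mean) ** self.gamma * self.v1(l_iou, dx, dy, wg, hg)

    def gain(self, beta: float) -> float:
        """Non-monotonic focusing coefficient r = beta / (delta * alpha^(beta - delta))."""
        return beta / (self.delta * self.alpha ** (beta - self.delta))

    def v3(self, l_iou: float, dx: float, dy: float, wg: float, hg: float) -> float:
        beta = l_iou / self.iou_mean  # outlier degree of the anchor box
        return self.gain(beta) * self.v1(l_iou, dx, dy, wg, hg)
```

With these values r equals 1 when β = δ and falls off both for very large β (low-quality outliers) and for very small β, which matches the goal stated in [0006] of limiting harmful gradients from low-quality samples while reducing the competitiveness of high-quality anchor boxes.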

[0045] Compared with the prior art, the beneficial effects of the present invention are as follows:

[0046] 1. The multi-defect category insulator defect detection method based on multi-angle feature enhancement proposed in this invention introduces a new attention mechanism that integrates channel attention and spatial attention, strengthening the extraction of feature information from regions of interest. Introducing this attention mechanism improves feature extraction, and with it the detection accuracy and speed of the model, without adding a large number of parameters.

[0047] 2. The multi-defect category insulator defect detection method based on multi-angle feature enhancement proposed in this invention proposes a new multi-scale bidirectional feature fusion structure, which can retain more target information in the high-level feature map during the fusion process, thereby achieving more accurate target detection.

[0048] 3. The multi-defect category insulator defect detection method based on multi-angle feature enhancement provided in this invention optimizes the localization loss function so that it focuses on anchor boxes of ordinary quality, improving the overall performance of the detector.

Attached Figure Description

[0049] Other features, objects, and advantages of this application will become more apparent from the following detailed description of non-limiting embodiments with reference to the accompanying drawings:

[0050] Figure 1 Network framework diagram;

[0051] Figure 2 Diagram of multi-scale bidirectional feature fusion structure and MAFE+X structure;

[0052] Figure 3 A diagram of the multi-angle feature enhancement module;

[0053] Figure 4 Insulator defect detection results against backgrounds of different complexity.

Detailed Implementation

[0054] The present application will now be described in further detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and not intended to limit it. Furthermore, it should be noted that, for ease of description, only the parts relevant to the invention are shown in the accompanying drawings.

[0055] It should be noted that, unless otherwise specified, the embodiments and features described in this application can be combined with each other. This application will now be described in detail with reference to the accompanying drawings and embodiments.

[0056] This embodiment provides a multi-defect category insulator defect detection method based on multi-angle feature enhancement. The insulator defect detection process in this embodiment utilizes a trained model to find defects present in the detected insulator image. The specific process is as follows:

[0057] (1) Input image dataset and corresponding label dataset: The given dataset contains original images and corresponding label files. First, four images are stitched together using data augmentation methods with random scaling, cropping, and arrangement to enable the model to identify targets within a smaller range. To enable the model to handle images of different sizes and channels, the stitched images are scaled to a uniform size of 640×640×3. Next, to predict targets of different scales and aspect ratios, a set of predefined anchor boxes is used. During training, the model adaptively adjusts the size of the anchor boxes according to the actual size and aspect ratio of the target. Through supervised learning task training and evaluation, we can use existing annotation information to train a neural network model that can accurately identify image defect categories.
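The geometric part of the four-image stitching can be illustrated with a simplified Mosaic-style sketch. It assumes the four images have already been randomly rescaled, places them around a random centre on one 640×640 canvas (one image per quadrant), and shifts and clips their boxes; pixel copying is omitted. The centre range (25% to 75% of the canvas), `min_size`, and the function names are assumptions. Common implementations, for example YOLOv5's, instead use a 2s×2s canvas followed by a random affine crop, which this sketch does not reproduce.

```python
import random
from typing import List, Optional, Tuple

Region = Tuple[int, int, int, int]                   # x1, y1, x2, y2
LabeledBox = Tuple[float, float, float, float, int]  # x1, y1, x2, y2, class_id


def mosaic_layout(sizes: List[Tuple[int, int]], out_size: int = 640,
                  rng: Optional[random.Random] = None):
    """Place four (w, h) images around a random centre; return (xc, yc, [(src, dst), ...])."""
    rng = rng or random.Random()
    s = out_size
    xc = int(rng.uniform(0.25 * s, 0.75 * s))  # mosaic centre (range assumed)
    yc = int(rng.uniform(0.25 * s, 0.75 * s))
    regions = []
    for idx, (w, h) in enumerate(sizes[:4]):
        if idx == 0:    # top-left quadrant: image's bottom-right corner meets the centre
            dst = (max(xc - w, 0), max(yc - h, 0), xc, yc)
            src = (w - (dst[2] - dst[0]), h - (dst[3] - dst[1]), w, h)
        elif idx == 1:  # top-right quadrant
            dst = (xc, max(yc - h, 0), min(xc + w, s), yc)
            src = (0, h - (dst[3] - dst[1]), dst[2] - dst[0], h)
        elif idx == 2:  # bottom-left quadrant
            dst = (max(xc - w, 0), yc, xc, min(yc + h, s))
            src = (w - (dst[2] - dst[0]), 0, w, dst[3] - dst[1])
        else:           # bottom-right quadrant
            dst = (xc, yc, min(xc + w, s), min(yc + h, s))
            src = (0, 0, dst[2] - dst[0], dst[3] - dst[1])
        regions.append((src, dst))
    return xc, yc, regions


def remap_boxes(boxes: List[LabeledBox], src: Region, dst: Region,
                min_size: float = 2.0) -> List[LabeledBox]:
    """Shift boxes from source-image to mosaic coordinates, clip, and drop tiny leftovers."""
    dx, dy = dst[0] - src[0], dst[1] - src[1]
    out = []
    for x1, y1, x2, y2, cls in boxes:
        nx1, ny1 = max(x1 + dx, dst[0]), max(y1 + dy, dst[1])
        nx2, ny2 = min(x2 + dx, dst[2]), min(y2 + dy, dst[3])
        if nx2 - nx1 >= min_size and ny2 - ny1 >= min_size:
            out.append((nx1, ny1, nx2, ny2, cls))
    return out
```

Because each image is cropped at the shared centre, defects near image borders appear partially and at new positions, which is what lets the model "identify targets within a smaller range".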

[0058] (2) Feature Extraction Process: The core part of the feature extraction process is to extract image features. Its role is to transform the original input image into a multi-layer feature map for subsequent target detection tasks. The feature extraction process is mainly responsible for performing a series of convolution and pooling operations on the input image to extract feature representations with rich semantic information. For a 640×640×3 input image, after several convolutional layers, normalization layers, and activation functions, a 20×20×1024 feature map is obtained. We use the final output feature map to characterize information such as the location and category of insulator defects;

[0059] (3) Feature Fusion Process: Feature fusion combines feature maps from different levels to generate feature maps with multi-scale information, thereby improving the accuracy of target detection. The process has two parts, top-down and bottom-up, in which upsampling and downsampling operations fuse feature maps from different levels into a multi-scale feature pyramid. The top-down part upsamples the coarser feature maps and fuses them with finer ones, while the bottom-up part uses convolutional layers to fuse feature maps from different levels. Finally, the feature maps from the top-down and bottom-up parts are fused to obtain the final feature maps used for insulator defect detection.

[0060] (4) Output Results: The trained neural network is used to process the input image containing insulator defects. The image to be detected is divided into H×W grid cells. If the center of a target falls within a grid cell, that cell is responsible for predicting the target. The class probabilities predicted by each grid cell are multiplied by the confidence score of the predicted box to obtain the class confidence score of each box. Boxes with low class confidence scores are then filtered out, and non-maximum suppression is applied to the remaining boxes to obtain the final detection results.

[0061] Network framework:

[0062] This embodiment designs a novel insulator defect detection framework, as shown in Figure 1. For the input image data: (1) image features are extracted using the designed network architecture; (2) feature fusion is performed on the multi-layer feature maps; (3) the network is optimized using a loss function; (4) detection results are output. The network of this invention further enhances spatial features while considering channel features. At the same time, we construct a new multi-scale bidirectional feature fusion structure, which retains more contextual information during fusion and speeds up the fusion process.

[0063] We divide the model into four modules: the input end, the backbone feature extraction network, the feature fusion network, and the output end.

[0064] The input includes Mosaic data augmentation, adaptive anchor box calculation, and adaptive image scaling, ultimately converting the image into a 640×640×3 tensor which is then input into the network.

[0065] The backbone feature extraction network is mainly used to extract features from the input image. First, it goes through the convolution operation of the CBS layer. The CBS structure includes convolutional layers, normalization layers, and activation function layers. Then, it extracts features through downsampling, then through the CSP structure, and finally through the Spatial Pyramid Pooling (SPPF) structure to obtain the feature map.

[0066] The feature fusion network is located between the backbone feature extraction network and the output end. It makes effective use of the features extracted by the backbone, using a multi-scale bidirectional feature fusion structure (as shown in Figure 2) to carry out the upsampling and downsampling processes, thereby improving the diversity and robustness of features.

[0067] In the output layer, the features extracted earlier are mainly used for prediction to complete the output of the target detection results.

[0068] Specifically, the CSP structure splits the feature maps of the input layer. One part is stacked after feature extraction, while the other part has a residual edge that crosses the stage hierarchy and is then connected and merged with the previous feature extraction results. This reduces the repetition of gradient information during training and improves the network's running speed. The main purpose of the spatial pyramid pooling structure is to solve the image distortion problem caused by image region cropping and scaling operations, and to address the issue of repetitive image feature extraction in convolutional neural networks. This operation not only improves computational efficiency but also saves network computation time. The spatial pyramid pooling structure uses three different scales of max pooling and an identity mapping path for feature extraction, increasing the receptive field of the network model during feature extraction and achieving multi-scale feature fusion. This is beneficial for extracting more important feature information. Then, the feature maps after pooling at different scales are aggregated through residual connections. This method effectively reduces the possibility of information loss caused by direct image cropping.
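The pooling part of the spatial pyramid pooling structure can be sketched on a single channel. Assuming the SPPF design used in YOLOv5 (three 5×5 max poolings applied in series, plus the identity path), chaining the poolings yields windows of 5, 9 and 13 pixels without computing the larger windows directly. The function names are hypothetical, and the concatenation and surrounding convolutions are omitted.

```python
from typing import List

Matrix = List[List[float]]


def max_pool_same(x: Matrix, k: int) -> Matrix:
    """k x k max pooling, stride 1; positions outside the map are ignored (like -inf padding)."""
    h, w, r = len(x), len(x[0]), k // 2
    return [[max(x[y][c]
                 for y in range(max(0, i - r), min(h, i + r + 1))
                 for c in range(max(0, j - r), min(w, j + r + 1)))
             for j in range(w)]
            for i in range(h)]


def sppf(x: Matrix, k: int = 5) -> List[Matrix]:
    """Identity path plus three chained k x k max poolings on one channel.

    The chained outputs cover windows of k, 2k - 1 and 3k - 2 pixels (5, 9, 13 for k = 5).
    In the network the four maps would be concatenated along channels and fused by a convolution.
    """
    p1 = max_pool_same(x, k)
    p2 = max_pool_same(p1, k)
    p3 = max_pool_same(p2, k)
    return [x, p1, p2, p3]
```

The chaining works because stride-1 max pooling composes by adding window radii: two 5×5 poolings (radius 2 each) equal one 9×9 pooling (radius 4), and three equal one 13×13 pooling.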

[0069] The multi-scale bidirectional feature fusion structure enables deep feature fusion and improves feature extraction capability. First, the last feature map of the input image (20×20×512) is reduced in channel dimension by a 1×1 convolution, then upsampled and stacked with the 40×40×256 feature map, and the residual concatenation result is convolved by a CSP layer to retain effective information. Next, the 40×40×256 feature map is passed through a 1×1 convolution and upsampled, then concatenated with the 80×80×128 feature map; the residual concatenation result passes through a CSP convolution to obtain the first output feature fusion layer. This first output feature fusion layer is downsampled and concatenated with earlier features of the same size, followed by another CSP convolution that outputs the second feature fusion layer. The second output layer is processed in the same way to obtain the third output layer. Continuous fusion, concatenation, and stacking of features at different scales improve the detection of targets at different scales.
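The order of operations above can be checked by tracing tensor shapes (H, W, C). The intermediate channel counts, the stride-2 downsampling convolutions, and the choice of which earlier maps are reused in the bottom-up path are assumptions modelled on a YOLOv5-style PANet neck; only the input and output sizes come from the text, and the MAFE+X modules are left out. All function names are hypothetical.

```python
from typing import Tuple

Shape = Tuple[int, int, int]  # (H, W, C)


def conv1x1(x: Shape, c_out: int) -> Shape:
    return (x[0], x[1], c_out)


def upsample2x(x: Shape) -> Shape:
    return (x[0] * 2, x[1] * 2, x[2])


def downsample2x(x: Shape, c_out: int) -> Shape:
    """Stride-2 convolution (assumed) halving H and W."""
    return (x[0] // 2, x[1] // 2, c_out)


def concat(a: Shape, b: Shape) -> Shape:
    """Channel-wise concatenation; spatial sizes must match."""
    if a[:2] != b[:2]:
        raise ValueError(f"spatial sizes differ: {a[:2]} vs {b[:2]}")
    return (a[0], a[1], a[2] + b[2])


def csp(x: Shape, c_out: int) -> Shape:
    """A CSP block keeps H x W and sets the output channel count."""
    return (x[0], x[1], c_out)


def neck_shapes(c3: Shape = (80, 80, 128), c4: Shape = (40, 40, 256),
                c5: Shape = (20, 20, 512)) -> Tuple[Shape, Shape, Shape]:
    # top-down path
    p5 = conv1x1(c5, 256)                                 # 20x20x256, reused bottom-up
    t4 = csp(concat(upsample2x(p5), c4), 256)             # 40x40x512 -> 40x40x256
    p4 = conv1x1(t4, 128)                                 # 40x40x128, reused bottom-up
    out1 = csp(concat(upsample2x(p4), c3), 128)           # first output: 80x80x128
    # bottom-up path
    out2 = csp(concat(downsample2x(out1, 128), p4), 256)  # second output: 40x40x256
    out3 = csp(concat(downsample2x(out2, 256), p5), 512)  # third output: 20x20x512
    return out1, out2, out3
```

Under these assumptions the three outputs match the 80×80×128, 40×40×256 and 20×20×512 feature layers fed to the detection heads.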

[0070] When the trained model is used to detect images, three feature layers of different sizes are extracted: 80×80×128, 40×40×256, and 20×20×512. Three detection heads detect features at these scales and make predictions from them. The 80×80×128 feature map has the smallest receptive field and is generally used to detect small targets; the 40×40×256 feature map has a medium receptive field and is suited to medium-sized targets; and the 20×20×512 feature map has a relatively large receptive field and is mainly used to detect larger objects in the image.
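The relation between the input size, the three head resolutions, and the size of the raw prediction tensors can be computed as below. The strides 8, 16 and 32 follow from 640/80, 640/40 and 640/20, and five classes match the dataset described in [0074]; three anchors per grid cell and the per-anchor layout (4 box values, 1 confidence, class scores) are YOLO-style assumptions not stated in the text. The function names are hypothetical.

```python
from typing import List, Tuple


def head_shapes(img_size: int = 640, strides: Tuple[int, ...] = (8, 16, 32),
                num_classes: int = 5, anchors_per_cell: int = 3) -> List[Tuple[int, int, int]]:
    """Raw prediction shape (H, W, A * (4 + 1 + num_classes)) of each detection head."""
    per_anchor = 4 + 1 + num_classes  # box offsets, box confidence, class scores
    return [(img_size // s, img_size // s, anchors_per_cell * per_anchor) for s in strides]


def num_candidates(img_size: int = 640, strides: Tuple[int, ...] = (8, 16, 32),
                   anchors_per_cell: int = 3) -> int:
    """Number of candidate boxes before confidence filtering and NMS."""
    return sum((img_size // s) ** 2 * anchors_per_cell for s in strides)
```

Most candidates come from the 80×80 head (6400 of the 8400 grid cells), which reflects its role in detecting small targets.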

[0071] To address insulator defect detection in complex environments, this invention proposes a novel multi-angle feature enhancement module. This module combines channel and spatial attention mechanisms, enhancing features from different angles to improve the extraction rate of effective information. The feature fusion network of this invention is a novel deep learning network that effectively integrates feature information at different scales, accelerating feature fusion and reducing model parameters. Furthermore, this invention employs a novel loss function to mitigate the adverse effects of low-quality samples, further improving the accuracy of insulator defect detection.

[0072] Model training and performance evaluation:

[0073] Training a model requires a dataset and a training algorithm, while evaluating the performance of a model requires multiple evaluation metrics. The following is an introduction to the dataset, training algorithm, and evaluation metrics in this invention.

[0074] Dataset: The dataset in this invention comprises labels, the number of categories, a training set, a test set, and a validation set. The original dataset contains 669 images divided into five categories: missing material, detachment, partial detachment, partial detachment 1, and insulator. The first four categories are insulator defects, and the insulator category represents the insulator string. In the original dataset, 431 images belong to the partial detachment category, 130 to partial detachment 1, 46 to detachment, and 62 to missing material, and every image also contains the insulator category. The missing material category generally indicates that part of the insulator's edge is missing. The detachment category means that the entire insulator has completely detached from its fixed position. The partial detachment category is generally located at the top of the insulator string and is larger than the partial detachment 1 category. Because insulator data are difficult to collect, this invention currently considers only these five categories. More insulator defect categories may be added in future practical applications; given sufficient training data, this invention remains applicable after retraining.

[0075] Before training, the dataset was divided into training, validation, and test sets. To balance the categories, we expanded the images for different categories by random scaling, modifying brightness, and adjusting contrast. The expanded dataset contained 3622 images. We used the open-source annotation software LabelImg to annotate defects in the insulator images, using the mainstream PASCAL VOC2007 dataset format. During the experiment, to meet the data format requirements, we further converted the PASCAL VOC2007 dataset format to a txt file containing image paths and defect location information for training.
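A minimal sketch of the annotation conversion, using Python's standard `xml.etree.ElementTree`, follows. The output line format (`image_path x1,y1,x2,y2,class_id ...`) and the label strings in `CLASS_IDS` are hypothetical, since the text only says the txt file holds image paths and defect locations; `voc_to_line` is likewise an illustrative name.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical label strings for the five categories; the real annotation names are not given.
CLASS_IDS: Dict[str, int] = {
    "missing_material": 0, "detachment": 1, "partial_detachment": 2,
    "partial_detachment_1": 3, "insulator": 4,
}


def voc_to_line(image_path: str, xml_text: str, class_ids: Dict[str, int] = CLASS_IDS) -> str:
    """Convert one PASCAL VOC annotation into 'path x1,y1,x2,y2,cls ...' (assumed format)."""
    root = ET.fromstring(xml_text)
    parts = [image_path]
    for obj in root.iter("object"):
        name = (obj.findtext("name") or "").strip()
        if name not in class_ids:
            continue  # skip labels outside the known categories
        bb = obj.find("bndbox")
        coords = [int(float(bb.findtext(tag))) for tag in ("xmin", "ymin", "xmax", "ymax")]
        parts.append(",".join(map(str, coords + [class_ids[name]])))
    return " ".join(parts)
```

One such line per image, written to a txt file for each split, gives the training script image paths and box locations in a single pass.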

[0076] Training Algorithm: Hyperparameters are parameters that must be set before model training and inference, and they affect the model's performance, speed, and stability. By adjusting them, the training and inference performance of the model can be optimized for different applications. In this invention, the weight decay is set to 0.0005 and the initial learning rate to 0.001. Training runs for 300 epochs with a batch size of 16 and an input image size of 640×640.

[0077] Evaluation Metrics: To fully verify the performance of this invention, seven evaluation metrics are used: Precision (P), Recall (R), F1 score (F1), mean Average Precision (mAP), number of parameters (Parameters), FLOPs, and Frames Per Second (FPS). They are described below:

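[0078] $$P = \frac{TP}{TP + FP}$$

[0079] $$R = \frac{TP}{TP + FN}$$

[0080] $$F1 = \frac{2 \times P \times R}{P + R}$$

[0081] $$mAP = \frac{1}{N}\sum_{i=1}^{N} AP_i, \qquad AP = \int_0^1 P(R)\,dR$$ where N is the number of categories.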

[0082] Insulator images are categorized into two types: normal images and images containing defects. We can consider them as positive samples and negative samples, respectively. True positives (TP) represent the number of samples correctly classified as positive, false positives (FP) represent the number of samples incorrectly classified as positive, and false negatives (FN) represent the number of samples incorrectly classified as negative.

[0083] Precision: the proportion of results detected as positive by the model that are actually positive samples. Higher precision means the model is less likely to mistake negative samples for positive ones.

[0084] Recall: Represents the proportion of positive samples correctly detected by the model out of all real positive samples. The higher the recall, the better the model can capture the real target.

[0085] F1 score: It is the harmonic mean of precision and recall, which can comprehensively consider the precision and recall of the model.

[0086] Average precision: summarizes the precision obtained at different confidence thresholds into a single performance measure. The average precision (AP) of a class is the area under its precision-recall curve, and the mean average precision (mAP) is the mean of the AP values over all classes.

[0087] Number of parameters: This refers to the number of trainable parameters in the model, including weights and biases. More parameters mean a more complex model with stronger representational power, but also potentially increased risk of overfitting.
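The detection metrics above can be computed as follows. This sketch assumes the detections have already been matched to ground-truth boxes (each flagged as TP or FP, for instance with an IoU threshold that the text does not state) and uses all-point interpolation of the precision-recall curve for AP; the function names are illustrative.

```python
from typing import List, Tuple


def prf1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 from detection counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def average_precision(scores: List[float], is_tp: List[bool], num_gt: int) -> float:
    """AP of one class: area under the precision-recall curve (all-point interpolation)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [0.0], [1.0]
    for i in order:  # sweep the confidence threshold from high to low
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # precision envelope: make precision non-increasing as recall grows
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    return sum((recalls[k + 1] - recalls[k]) * precisions[k + 1]
               for k in range(len(recalls) - 1))


def mean_ap(per_class_ap: List[float]) -> float:
    """mAP: mean of the per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)
```

For three detections of one class ranked TP, FP, TP against two ground-truth boxes, the envelope gives precision 1 up to recall 0.5 and 2/3 up to recall 1, so AP = 0.5 × 1 + 0.5 × 2/3 = 5/6.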

[0088] Multi-scale bidirectional feature fusion structure:

[0089] The diversity of insulator defect categories and the complexity of image backgrounds increase detection difficulty, and sufficient semantic and location information is needed for accurate detection. In the backbone feature extraction network of this invention, a 640×640 insulator image is downsampled to obtain feature maps of different sizes. The 80×80, 40×40, and 20×20 feature maps are then input into the multi-scale bidirectional feature fusion structure, as shown in Figure 3. However, after multiple upsampling operations, the positional information of the feature maps is gradually lost, which reduces the network's detection accuracy for insulator defects. Therefore, we constructed a novel MAFE+X structure, as shown in Figure 2. MAFE+X consists of multiple CBS modules and multi-angle feature enhancement modules, which further enhance the key information in the feature maps. In the multi-scale bidirectional feature fusion structure, to achieve good results without adding too many parameters, X is set to 1 to enhance the positional information from the backbone and upper-layer feature maps. In addition, the multi-scale bidirectional feature fusion structure achieves more efficient and faster feature fusion through top-down and bottom-up pyramid paths.

[0090] Multi-angle feature enhancement module:

[0091] As shown in Figure 3, the multi-angle feature enhancement module consists of a channel attention module and a triple spatial attention module. Spatial attention complements channel attention by helping the network focus on the locations of useful information in the image. To better focus on objects, the spatial attention module aggregates spatial features with a 7×7 convolution kernel, which provides a large receptive field.

[0092] Let $\chi \in \mathbb{R}^{C\times H\times W}$ denote the input feature map, $W_c \in \mathbb{R}^{C\times 1\times 1}$ the channel attention weights, and $W_s \in \mathbb{R}^{1\times H\times W}$ the spatial attention weights. Channel attention has two branches: average pooling and max pooling are applied to each channel of the input feature map χ over its spatial extent, yielding two one-dimensional feature maps $L_1, L_2 \in \mathbb{R}^{C\times 1\times 1}$. These feature maps then pass through a multilayer perceptron, and the result is normalized by an activation function that limits the channel attention weights to the range 0 to 1. The channel attention weights are expressed as follows:

[0093] $$W_c(\chi)=\sigma\left(f(\zeta(\chi))+f(\xi(\chi))\right)$$

[0094] Where ζ() and ξ() represent average pooling and max pooling operations on the input, respectively, f() represents the multilayer perceptron operation, and σ() represents the activation function.
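A minimal, dependency-free sketch of the channel attention weights $W_c$ in [0093] and of applying them as $\chi_0 = W_c(\chi) * \chi$. It assumes σ is the sigmoid (consistent with weights in the range 0 to 1) and that f is a two-layer perceptron with a ReLU and reduction ratio r, shared by both pooling branches; the text only says "multilayer perceptron". The weights in the usage lines are made up for illustration.

```python
import math
from typing import List

Tensor3 = List[List[List[float]]]  # feature map indexed as [channel][row][col]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def shared_mlp(v: List[float], w1: List[List[float]], w2: List[List[float]]) -> List[float]:
    # Assumed two-layer perceptron C -> C/r -> C with a ReLU in between.
    hidden = [max(0.0, sum(w * x for w, x in zip(row, v))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(x: Tensor3, w1: List[List[float]], w2: List[List[float]]) -> List[float]:
    # zeta: average pooling of each channel over H x W -> C values (L1)
    avg = [sum(sum(r) for r in ch) / (len(ch) * len(ch[0])) for ch in x]
    # xi: max pooling of each channel over H x W -> C values (L2)
    mx = [max(max(r) for r in ch) for ch in x]
    a, m = shared_mlp(avg, w1, w2), shared_mlp(mx, w1, w2)
    # W_c = sigma(f(zeta(x)) + f(xi(x))): one weight in (0, 1) per channel
    return [sigmoid(p + q) for p, q in zip(a, m)]


def apply_channel_attention(x: Tensor3, wc: List[float]) -> Tensor3:
    # chi_0 = W_c(chi) * chi: each channel's H x W plane is scaled by its weight
    return [[[wc[c] * v for v in row] for row in x[c]] for c in range(len(x))]


# Usage: C = 2, H = W = 2, r = 2 (hidden width 1); weights are illustrative only.
x = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]
w1 = [[0.5, 0.5]]
w2 = [[1.0], [-1.0]]
wc = channel_attention(x, w1, w2)   # [sigmoid(3.25), sigmoid(-3.25)]
chi0 = apply_channel_attention(x, wc)
```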

[0095] Spatial attention follows the same principle: average pooling and max pooling are applied along the channel dimension at each spatial position, yielding two two-dimensional feature maps $J_1, J_2 \in \mathbb{R}^{1\times H\times W}$. After concatenation, these maps are processed by a 7×7 convolution kernel and an activation function. The spatial attention weights are expressed as follows:

[0096] $$W_s(\chi)=\sigma\left(f^{7\times 7}\left(\mathrm{concat}(J_1, J_2)\right)\right)$$

[0097] where $f^{7\times 7}(\cdot)$ denotes a convolution operation with a 7×7 kernel and σ(·) is the activation function.
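The spatial attention weights $W_s$ can be sketched the same way. As before, σ is assumed to be the sigmoid. The 7×7 convolution is written as a cross-correlation with zero "same" padding, which is how deep learning frameworks implement convolution, and the kernel values are placeholders rather than learned weights.

```python
import math
from typing import List

Tensor3 = List[List[List[float]]]  # [channel][row][col]
Map2 = List[List[float]]           # [row][col]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def spatial_attention(x: Tensor3, kernel: List[Map2], bias: float = 0.0) -> Map2:
    """kernel[0] weights the average map J1 and kernel[1] the max map J2 (k x k each)."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    # Pool along the channel axis at every position -> two H x W maps.
    j1 = [[sum(x[c][i][j] for c in range(C)) / C for j in range(W)] for i in range(H)]
    j2 = [[max(x[c][i][j] for c in range(C)) for j in range(W)] for i in range(H)]
    maps = [j1, j2]                      # concatenation: 2 x H x W
    k = len(kernel[0])
    pad = k // 2                         # "same" zero padding (3 for a 7 x 7 kernel)
    out = []
    for i in range(H):
        row = []
        for j in range(W):
            s = bias
            for m in range(2):
                for di in range(k):
                    for dj in range(k):
                        ii, jj = i + di - pad, j + dj - pad
                        if 0 <= ii < H and 0 <= jj < W:   # outside positions count as zero
                            s += kernel[m][di][dj] * maps[m][ii][jj]
            row.append(sigmoid(s))
        out.append(row)
    return out                           # W_s: 1 x H x W, stored as H x W


def empty_kernel(k: int = 7) -> List[Map2]:
    return [[[0.0] * k for _ in range(k)] for _ in range(2)]


# Usage: a kernel that only reads the centre of the average map gives W_s = sigmoid(J1).
kern = empty_kernel()
kern[0][3][3] = 1.0
x = [[[1.0, 2.0], [3.0, 4.0]], [[3.0, 2.0], [1.0, 0.0]]]
ws = spatial_attention(x, kern)          # J1 is 2.0 everywhere, so every weight is sigmoid(2.0)
```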

[0098] In the multi-angle feature enhancement module, the first part, the channel attention module, tells the network "which channel is more important," while the second part, the triple spatial attention module, consists of three parallel branches. One branch processes the input unchanged, while the other two further expand the spatial receptive field and strengthen the extraction of useful features. Through these three spatial attention branches, the module enlarges the spatial receptive field and enriches spatial context, letting the network attend to useful spatial information three times from different perspectives. This further strengthens the extraction of important information and enables the network to locate targets of interest more accurately.

[0099] A given input $\chi \in \mathbb{R}^{C\times H\times W}$ is passed to the three branches of the spatial attention module. The input to the first branch remains unchanged; after pooling, convolution, and activation, the first branch produces the output $\chi_{\mathrm{I}} \in \mathbb{R}^{C\times H\times W}$. To expand the spatial receptive field along the X-axis, the input of the second branch is flipped left-right about its central axis to obtain a new input $\chi'_{\mathrm{II}} \in \mathbb{R}^{C\times H\times W}$; after the same operations as the first branch, the resulting feature map is flipped left-right about the central axis again to obtain the final output of the second branch, $\chi_{\mathrm{II}} \in \mathbb{R}^{C\times H\times W}$. Similarly, to expand the spatial receptive field along the Y-axis, the input of the third branch is flipped upside down about its central axis to obtain a new input $\chi'_{\mathrm{III}} \in \mathbb{R}^{C\times H\times W}$; after pooling, convolution, and activation, the resulting feature map is flipped upside down about the central axis again to obtain the final output of the third branch, $\chi_{\mathrm{III}} \in \mathbb{R}^{C\times H\times W}$. Finally, $\chi_{\mathrm{I}}$, $\chi_{\mathrm{II}}$, and $\chi_{\mathrm{III}}$ are added to obtain the final output of the multi-angle feature enhancement module. The process can be expressed as follows:

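[0100] $$\chi_0 = W_c(\chi)\otimes\chi$$

[0101] $$\chi_{01} = W_s(\chi_0)\otimes\chi_0 + W_s(\chi'_{\mathrm{II}})\otimes\chi'_{\mathrm{II}} + W_s(\chi'_{\mathrm{III}})\otimes\chi'_{\mathrm{III}}$$

[0102] where $\chi_0$ is the output of the channel attention module of the multi-angle feature enhancement module, $\chi_{01}$ is the final output of the module, and ⊗ denotes element-wise multiplication with the attention weights broadcast over their missing dimensions. Here $\chi'_{\mathrm{II}}$ and $\chi'_{\mathrm{III}}$ are the left-right and upside-down flipped versions of $\chi_0$, and the second and third terms are flipped back before the summation, as described in [0099].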
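The flip, attend, flip-back pattern of the three branches in [0099] to [0101] is sketched below. To keep the block short, the spatial attention is passed in as a function; `column_ramp` is a hypothetical, deliberately position-dependent stand-in that makes the effect of flipping visible, not the patent's $W_s$.

```python
from typing import Callable, List

Tensor3 = List[List[List[float]]]  # [channel][row][col]
Map2 = List[List[float]]           # [row][col]


def flip_lr(x: Tensor3) -> Tensor3:
    # Mirror every row about the vertical central axis (left-right flip).
    return [[row[::-1] for row in ch] for ch in x]


def flip_ud(x: Tensor3) -> Tensor3:
    # Mirror every channel about the horizontal central axis (upside-down flip).
    return [ch[::-1] for ch in x]


def weighted(x: Tensor3, ws: Map2) -> Tensor3:
    # W_s(x) * x: the H x W weight map is broadcast over all channels.
    return [[[ws[i][j] * v for j, v in enumerate(row)] for i, row in enumerate(ch)] for ch in x]


def add3(a: Tensor3, b: Tensor3, c: Tensor3) -> Tensor3:
    return [[[p + q + r for p, q, r in zip(ra, rb, rc)]
             for ra, rb, rc in zip(ca, cb, cc)]
            for ca, cb, cc in zip(a, b, c)]


def triple_spatial_attention(x0: Tensor3, spatial_attn: Callable[[Tensor3], Map2]) -> Tensor3:
    """x0 is the channel-attention output chi_0; spatial_attn computes W_s."""
    branch1 = weighted(x0, spatial_attn(x0))                 # chi_I
    x2 = flip_lr(x0)                                         # chi'_II
    branch2 = flip_lr(weighted(x2, spatial_attn(x2)))        # chi_II (flipped back)
    x3 = flip_ud(x0)                                         # chi'_III
    branch3 = flip_ud(weighted(x3, spatial_attn(x3)))        # chi_III (flipped back)
    return add3(branch1, branch2, branch3)                   # chi_01


def column_ramp(x: Tensor3) -> Map2:
    # Hypothetical position-dependent stand-in for W_s: weight = column index.
    return [[float(j) for j in range(len(x[0][0]))] for _ in range(len(x[0]))]


out = triple_spatial_attention([[[1.0, 1.0]]], column_ramp)  # [[[1.0, 2.0]]]
```

One consequence of this design: pooling along channels commutes with spatial flips, so flipping the input, applying a kernel K, and flipping back is equivalent to applying the mirrored kernel to the unflipped input. With a pointwise or mirror-symmetric $W_s$ the three branches therefore coincide; the extra branches contribute new information only when the learned 7×7 kernel is not mirror-symmetric.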

[0103] Loss function:

[0104] The loss function of this invention is a multi-task loss comprising a classification loss, a localization loss, and a confidence loss. The classification loss measures whether each anchor box is assigned the correct class label, the confidence loss measures the error in the network's predicted confidence, and the localization loss measures the error between the predicted box and the ground truth box. Specifically, binary cross-entropy is used for both the classification loss and the confidence loss, while the localization loss is a new loss function based on the complete intersection over union (CIoU) loss. The CIoU loss is first described by the following expression:

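[0105] $$\mathcal{L}_{\mathrm{CIoU}} = 1 - \mathrm{IoU} + \frac{\rho^2(b, b^{gt})}{C^2} + \alpha\upsilon$$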

[0106] where b and $b^{gt}$ represent the center points of the predicted box and the ground truth box, respectively, $\rho^2(\cdot)$ represents the squared Euclidean distance between two points, C represents the diagonal length of the smallest enclosing box that contains both the predicted box and the ground truth box, and α is a weight function, expressed as follows:

[0107] $$\alpha = \frac{\upsilon}{(1 - \mathrm{IoU}) + \upsilon}$$

[0108] Here, υ measures the consistency of the aspect ratios of the two boxes and is expressed as follows:

[0109] $$\upsilon = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2$$

[0110] where $w^{gt}/h^{gt}$ is the aspect ratio of the ground truth box and $w/h$ is the aspect ratio of the predicted box; the smaller υ is, the closer the aspect ratios of the two boxes are. The loss function of this invention has three versions, v1, v2, and v3, expressed as follows:

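[0111] $$\mathcal{L}_{\mathrm{v1}} = \mathcal{R}\,\mathcal{L}_{\mathrm{IoU}},\qquad \mathcal{R} = \exp\left(\frac{(x - x_{gt})^2 + (y - y_{gt})^2}{\left(W_g^2 + H_g^2\right)^{\star}}\right),\qquad \mathcal{L}_{\mathrm{IoU}} = 1 - \mathrm{IoU}$$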

[0112] where (x, y) are the coordinates of the center point of the predicted box, $(x_{gt}, y_{gt})$ are the coordinates of the center point of the ground truth box, $W_g$ and $H_g$ are the width and height of the smallest enclosing box of the predicted and ground truth boxes, and the superscript $\star$ indicates that the term is detached from the computation graph.

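[0113] $$\mathcal{L}_{\mathrm{v2}} = \left(\frac{\mathcal{L}_{\mathrm{IoU}}^{\star}}{\overline{\mathcal{L}_{\mathrm{IoU}}}}\right)^{\gamma}\mathcal{L}_{\mathrm{v1}},\qquad \gamma > 0$$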

[0114] where $\overline{\mathcal{L}_{\mathrm{IoU}}}$ is the exponential running average of $\mathcal{L}_{\mathrm{IoU}}$ with momentum m, and γ is a hyperparameter.

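[0115] $$\beta = \frac{\mathcal{L}_{\mathrm{IoU}}^{\star}}{\overline{\mathcal{L}_{\mathrm{IoU}}}},\qquad r = \frac{\beta}{\delta\,\alpha^{\beta - \delta}},\qquad \mathcal{L}_{\mathrm{v3}} = r\,\mathcal{L}_{\mathrm{v1}}$$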

[0116] where β is the outlier degree of the anchor box, and the hyperparameters α and δ control the mapping from the outlier degree to the gradient gain.
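The localization losses can be checked numerically with the sketch below. The v1, v2, and v3 expressions appear to follow the published Wise-IoU (WIoU) formulation; that identification, the running-mean update rule (mean ← (1 − m)·mean + m·L_IoU, updated before use), its initial value, and the default hyperparameters are all assumptions. Boxes are (cx, cy, w, h) tuples, and the class name `WiseLocLoss` is hypothetical.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h)


def corners(b: Box) -> Tuple[float, float, float, float]:
    cx, cy, w, h = b
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2


def iou_and_enclosure(pred: Box, gt: Box) -> Tuple[float, float, float]:
    """Return IoU and the width/height (W_g, H_g) of the smallest enclosing box."""
    px1, py1, px2, py2 = corners(pred)
    gx1, gy1, gx2, gy2 = corners(gt)
    inter = max(0.0, min(px2, gx2) - max(px1, gx1)) * max(0.0, min(py2, gy2) - max(py1, gy1))
    union = pred[2] * pred[3] + gt[2] * gt[3] - inter
    wg = max(px2, gx2) - min(px1, gx1)
    hg = max(py2, gy2) - min(py1, gy1)
    return inter / union, wg, hg


def center_dist2(pred: Box, gt: Box) -> float:
    return (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2   # rho^2(b, b_gt)


def ciou_loss(pred: Box, gt: Box) -> float:
    iou, wg, hg = iou_and_enclosure(pred, gt)
    v = 4 / math.pi ** 2 * (math.atan(gt[2] / gt[3]) - math.atan(pred[2] / pred[3])) ** 2
    alpha = v / ((1 - iou) + v) if v > 0 else 0.0          # guard the 0/0 case
    return 1 - iou + center_dist2(pred, gt) / (wg ** 2 + hg ** 2) + alpha * v


def wiou_v1(pred: Box, gt: Box) -> Tuple[float, float]:
    """Return (L_v1, L_IoU). Plain floats carry no gradient, so 'detach' is implicit here."""
    iou, wg, hg = iou_and_enclosure(pred, gt)
    r = math.exp(center_dist2(pred, gt) / (wg ** 2 + hg ** 2))
    return r * (1 - iou), 1 - iou


def gradient_gain(beta: float, alpha: float, delta: float) -> float:
    return beta / (delta * alpha ** (beta - delta))           # r in the v3 expression


class WiseLocLoss:
    """Hypothetical wrapper for v1/v2/v3; default hyperparameters are illustrative only."""

    def __init__(self, version: int = 3, gamma: float = 0.5, alpha: float = 1.9,
                 delta: float = 3.0, momentum: float = 0.01):
        self.version, self.gamma, self.alpha, self.delta = version, gamma, alpha, delta
        self.m = momentum
        self.mean_liou = 1.0   # running mean of L_IoU (initial value is an assumption)

    def __call__(self, pred: Box, gt: Box) -> float:
        l_v1, l_iou = wiou_v1(pred, gt)
        self.mean_liou = (1 - self.m) * self.mean_liou + self.m * l_iou
        beta = l_iou / self.mean_liou          # outlier degree of this anchor box
        if self.version == 1:
            return l_v1
        if self.version == 2:
            return beta ** self.gamma * l_v1
        return gradient_gain(beta, self.alpha, self.delta) * l_v1
```

With these definitions, an anchor whose outlier degree β equals δ receives a gain of exactly 1, and the gain falls off for both very low and very high β, which is how the v3 form down-weights low-quality samples.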

[0117] This invention proposes an insulator defect detection model based on multi-angle feature enhancement. The model's feature fusion network combines an attention mechanism and a feature pyramid structure, avoiding information loss during feature fusion while retaining more detailed information, thus improving the accuracy of insulator defect detection. To address the interference caused by low-quality samples, the localization loss function is optimized, further enhancing the accuracy of insulator defect detection.

[0118] To verify the effectiveness of this invention, images with varying background complexity were selected for prediction. The detection results are shown in Figure 4: the first column shows the original images, the second to fourth columns show the detection results of other algorithms, and the last column shows the detection results of this invention. The detection results show that the insulator defect detection model based on multi-angle feature enhancement accurately identifies insulator defects against a variety of challenging backgrounds. Comparisons on different evaluation metrics further show that the proposed model achieves high accuracy and efficiency in insulator defect detection.

[0119] Those skilled in the art should understand that the scope of the invention involved in this application is not limited to technical solutions formed by the specific combinations of the above technical features, but also covers other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the inventive concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) this application.

Claims

1. A multi-defect category insulator defect detection method based on multi-angle feature enhancement, characterized in that it includes the following steps: (1) Input of the image dataset and the corresponding label dataset: first, images are stitched together by a data augmentation method so that the model can identify targets within a smaller range; then, a set of predefined anchor boxes is used, and the model adaptively adjusts the anchor box sizes during training. Through supervised training and evaluation, a neural network model that accurately identifies image defect categories is obtained. (2) Feature extraction: first, a feature map is obtained by the convolution operation of the CBS layer, where the CBS structure includes a convolutional layer, a normalization layer, and an activation function layer; features are then extracted by downsampling, then by the CSP structure, and finally by the spatial pyramid pooling structure. (3) Feature fusion: feature maps at different levels are fused through upsampling and downsampling operations, and a multi-scale bidirectional feature fusion structure improves the diversity and robustness of the features. (4) Output of results: the trained neural network processes the input image containing insulator defects. The image to be detected is divided into H×W grid cells; if the center of a target falls in a grid cell, that cell is responsible for predicting the target. The class information predicted by each grid cell is multiplied by the confidence score of the predicted target box to obtain the class confidence score of each target box. Target boxes with low class confidence scores are then filtered out, and non-maximum suppression is applied to the remaining boxes to obtain the final detection results. In step (3), a new MAFE+X structure is constructed.
MAFE+X consists of multiple CBS modules and a multi-angle feature enhancement module, which further enhances the key information in the feature map. The multi-angle feature enhancement module includes a channel attention module and a triple spatial attention module. The input feature map is denoted by $\chi \in \mathbb{R}^{C\times H\times W}$ and the channel attention weights by $W_c \in \mathbb{R}^{C\times 1\times 1}$. Channel attention has two branches, which apply average pooling and max pooling, respectively, to each channel of the input feature map χ over its spatial extent, yielding two one-dimensional feature maps; these feature maps then pass through a multilayer perceptron and are finally normalized by an activation function. The channel attention weights are expressed as $W_c(\chi)=\sigma\left(f(\zeta(\chi))+f(\xi(\chi))\right)$, where ζ(·) and ξ(·) represent the average pooling and max pooling operations on the input, respectively, f(·) represents the multilayer perceptron operation, and σ(·) represents the activation function. The spatial attention weights are denoted by $W_s \in \mathbb{R}^{1\times H\times W}$: average pooling and max pooling along the channel dimension yield two two-dimensional feature maps $J_1$ and $J_2$, which are concatenated and passed through a 7×7 convolution kernel and an activation function, giving $W_s(\chi)=\sigma\left(f^{7\times 7}\left(\mathrm{concat}(J_1, J_2)\right)\right)$, wherein $f^{7\times 7}(\cdot)$ denotes a convolution operation with a kernel size of 7×7 and σ(·) is an activation function. A given input $\chi \in \mathbb{R}^{C\times H\times W}$ is passed to the three branches of the spatial attention module. The input to the first branch remains unchanged, and the output $\chi_{\mathrm{I}}$ is obtained after pooling, convolution, and activation. The input of the second branch is flipped left-right about its central axis to obtain a new input $\chi'_{\mathrm{II}}$; after the same operations as the first branch, the resulting feature map is flipped left-right about the central axis again to obtain the final output $\chi_{\mathrm{II}}$ of the second branch. The input of the third branch is flipped upside down about its central axis to obtain a new input $\chi'_{\mathrm{III}}$; after pooling, convolution, and activation, the resulting feature map is flipped upside down about the central axis again to obtain the final output $\chi_{\mathrm{III}}$ of the third branch. Finally, $\chi_{\mathrm{I}}$, $\chi_{\mathrm{II}}$, and $\chi_{\mathrm{III}}$ are added to obtain the final output of the multi-angle feature enhancement module, expressed as $\chi_0 = W_c(\chi)\otimes\chi$ and $\chi_{01} = W_s(\chi_0)\otimes\chi_0 + W_s(\chi'_{\mathrm{II}})\otimes\chi'_{\mathrm{II}} + W_s(\chi'_{\mathrm{III}})\otimes\chi'_{\mathrm{III}}$, wherein $\chi_0$ is the output of the channel attention module of the multi-angle feature enhancement module, $\chi_{01}$ is the final output of the multi-angle feature enhancement module, and ⊗ denotes element-wise multiplication with the attention weights broadcast over their missing dimensions.
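Step (4) of claim 1 (grid-cell responsibility, class confidence, low-score filtering, and non-maximum suppression) can be sketched as follows. The thresholds, the choice of the single highest-probability class per box, and class-wise greedy NMS are assumptions for illustration; boxes are (x1, y1, x2, y2) tuples and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def responsible_cell(cx: float, cy: float, img_w: float, img_h: float,
                     grid_w: int, grid_h: int) -> Tuple[int, int]:
    # The grid cell that contains the target centre is responsible for predicting it.
    col = min(int(cx / img_w * grid_w), grid_w - 1)
    row = min(int(cy / img_h * grid_h), grid_h - 1)
    return row, col


def iou(a: Box, b: Box) -> float:
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def postprocess(preds: Sequence[Tuple[Box, float, Sequence[float]]],
                score_thr: float = 0.25, iou_thr: float = 0.45) -> List[Tuple[float, int, Box]]:
    """preds: (box, objectness confidence, per-class probabilities) for each predicted box."""
    candidates = []
    for box, obj, probs in preds:
        cls = max(range(len(probs)), key=lambda k: probs[k])
        score = probs[cls] * obj               # class confidence score of this box
        if score >= score_thr:                 # drop low-confidence boxes
            candidates.append((score, cls, box))
    candidates.sort(key=lambda c: c[0], reverse=True)
    kept: List[Tuple[float, int, Box]] = []
    for score, cls, box in candidates:
        # Greedy NMS per class: keep a box unless a higher-scoring box of the same class overlaps it.
        if all(k_cls != cls or iou(k_box, box) < iou_thr for _, k_cls, k_box in kept):
            kept.append((score, cls, box))
    return kept


preds = [
    ((0.0, 0.0, 10.0, 10.0), 0.9, [0.9, 0.1]),    # class 0, score 0.81
    ((1.0, 1.0, 11.0, 11.0), 0.8, [0.8, 0.2]),    # class 0, score 0.64, IoU 81/119 with the first
    ((1.0, 1.0, 11.0, 11.0), 0.9, [0.1, 0.9]),    # class 1, score 0.81
    ((50.0, 50.0, 60.0, 60.0), 0.1, [1.0, 0.0]),  # score 0.1, filtered out
]
detections = postprocess(preds)
print([cls for _, cls, _ in detections])  # [0, 1]
```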

2. The multi-angle feature enhancement based multi-defect class insulator defect detection method according to claim 1, characterized in that, in step (2), the last-layer feature map of the input image, of size 20×20×512, is first reduced in channel dimension by a 1×1 convolution, upsampled, and stacked with the 40×40×256 feature map; the connected result is convolved by the CSP layer to retain the effective information. The resulting 40×40×256 feature map is then processed by a 1×1 convolution, upsampled, and concatenated with the 80×80×128 feature map, and the connected result is passed through a CSP convolution to obtain the first output feature fusion layer. The first output feature fusion layer is downsampled, concatenated with the earlier feature map of the same size, and processed by another CSP convolution to output the second feature fusion layer. The second output layer is processed in the same way to obtain the third output layer.

3. The insulator defect detection method based on multi-angle feature enhancement according to claim 1, characterized in that, It also includes using loss functions for network optimization, including classification loss, localization loss, and confidence loss.

4. The insulator defect detection method based on multi-angle feature enhancement according to claim 3, characterized in that binary cross-entropy loss is used as both the classification loss and the confidence loss, while the localization loss is a new loss function based on the complete intersection over union (CIoU) loss, whose expression is $\mathcal{L}_{\mathrm{CIoU}} = 1 - \mathrm{IoU} + \frac{\rho^2(b, b^{gt})}{C^2} + \alpha\upsilon$, where b and $b^{gt}$ represent the center points of the predicted box and the ground truth box, respectively, $\rho^2(\cdot)$ represents the squared Euclidean distance between two points, C represents the diagonal length of the smallest enclosing box that contains both the predicted box and the ground truth box, and α is a weight function expressed as $\alpha = \frac{\upsilon}{(1 - \mathrm{IoU}) + \upsilon}$; υ measures the consistency of the aspect ratios and is expressed as $\upsilon = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2$, where $w^{gt}/h^{gt}$ is the aspect ratio of the ground truth box and $w/h$ is the aspect ratio of the predicted box. The smaller υ is, the closer the aspect ratios of the ground truth box and the predicted box are.

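5. The insulator defect detection method based on multi-angle feature enhancement according to claim 3, characterized in that the loss function has versions v1, v2, and v3, expressed as follows: $\mathcal{L}_{\mathrm{v1}} = \mathcal{R}\,\mathcal{L}_{\mathrm{IoU}}$ with $\mathcal{R} = \exp\left(\frac{(x - x_{gt})^2 + (y - y_{gt})^2}{(W_g^2 + H_g^2)^{\star}}\right)$ and $\mathcal{L}_{\mathrm{IoU}} = 1 - \mathrm{IoU}$, where (x, y) are the coordinates of the center point of the predicted box, $(x_{gt}, y_{gt})$ are the coordinates of the center point of the ground truth box, $W_g$ and $H_g$ are the width and height of the smallest enclosing box of the predicted and ground truth boxes, and the superscript $\star$ indicates detachment from the computation graph; $\mathcal{L}_{\mathrm{v2}} = \left(\frac{\mathcal{L}_{\mathrm{IoU}}^{\star}}{\overline{\mathcal{L}_{\mathrm{IoU}}}}\right)^{\gamma}\mathcal{L}_{\mathrm{v1}}$, γ > 0, wherein $\overline{\mathcal{L}_{\mathrm{IoU}}}$ is the exponential running average of $\mathcal{L}_{\mathrm{IoU}}$ with momentum m and γ is a hyperparameter; $\mathcal{L}_{\mathrm{v3}} = r\,\mathcal{L}_{\mathrm{v1}}$ with $r = \frac{\beta}{\delta\,\alpha^{\beta - \delta}}$ and $\beta = \frac{\mathcal{L}_{\mathrm{IoU}}^{\star}}{\overline{\mathcal{L}_{\mathrm{IoU}}}}$, where β is the outlier degree of the anchor box, and the hyperparameters α and δ control the mapping from the outlier degree to the gradient gain.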