A lightweight transmission line micro-defect detection method fusing FasterNet and RepViT

By integrating FasterNet and RepViT into a lightweight detection method, the problem of the imbalance between accuracy and computing power of the YOLOv8 algorithm on edge devices is solved, realizing high-precision and fast detection of minute defects in power transmission lines, and adapting to the computing power constraints of UAV edge computing platforms.

CN122244727A · Pending · Publication Date: 2026-06-19 · CHINA THREE GORGES UNIV

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: CHINA THREE GORGES UNIV
Filing Date: 2026-03-10
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

The existing YOLOv8 algorithm suffers from an imbalance between accuracy and computing power on edge devices, making it difficult to meet the high accuracy and speed requirements for detecting minute defects in power transmission lines after being lightweighted. Furthermore, traditional lightweight algorithms have failed to achieve coordinated optimization between accurate extraction of local features and effective modeling of global semantics.

Method used

We employ a lightweight detection method combining FasterNet and RepViT. We reconstruct the backbone network through PartialConvolution, introduce an attention mechanism and RepViT module, and combine it with CoordinateAttention to construct a dual-path feature fusion logic. This enables efficient fusion of local and global features, and the model is deployed on an edge computing platform.

Benefits of technology

It achieves high-precision, millisecond-level detection of minute defects in power transmission lines on an edge computing platform, improving detection robustness and spatial positioning accuracy of small targets, and adapting to the computing power constraints of UAV edge computing platforms.

✦ Generated by Eureka AI based on patent content.

Abstract

A lightweight method for detecting minute defects in power transmission lines, integrating FasterNet and RepViT, is proposed. The detection network is reconstructed on the YOLOv8 framework: the C2f module of the backbone network is replaced with FasterNetBlock, which integrates PartialConvolution, to achieve lightweight local feature extraction. A RepViT global perception module is embedded in the neck FPN to construct a dual-path architecture of local feature extraction and global semantic modeling, achieving bidirectional fusion of local and global features. A CoordinateAttention mechanism is introduced at the feature fusion node to enhance the spatial localization accuracy of minute defects such as missing pins and micro-cracks. The method also covers the full workflow of defect dataset construction and preprocessing, training of the improved model, and INT8 quantization for deployment on edge devices. It can be deployed directly on the Jetson Orin edge computing platform, is suited to UAV high-altitude field inspection scenarios, and exhibits excellent generalization ability and real-time detection performance.

Description

Technical Field

[0001] This invention belongs to the field of computer vision and power equipment operation and maintenance technology, specifically involving a lightweight method for detecting minute defects in transmission lines that integrates FasterNet and RepViT.

Background Technology

[0002] Transmission lines are the core infrastructure of the power system, and their operating status directly determines the stability and security of power supply. Minor defects such as missing pins and micro-cracks, if not detected in time, can easily expand under long-term conditions of wind, rain, and vibration, leading to line faults or even large-scale power outages. Currently, the inspection mode based on drones equipped with edge devices has become the mainstream solution for power transmission line defect detection due to its high flexibility and wide coverage. The YOLOv8 algorithm is widely used in this scenario due to its comprehensive advantages in detection speed and accuracy.

[0003] However, the existing YOLOv8 algorithm has a significant imbalance between accuracy and computing power on edge devices: if high detection accuracy is maintained, the C2f module of its backbone network has channel redundancy, resulting in an excessive number of model parameters and floating-point operations, and the inference speed cannot meet the real-time inspection requirements; if the network structure is simplified to achieve lightweighting, the feature extraction capability will be sacrificed, especially the detection accuracy of small defects in complex backgrounds will drop significantly.

[0004] Traditional lightweight algorithms often employ methods such as reducing the number of network layers or channels, essentially sacrificing accuracy for speed. This makes them ill-suited for the dual demands of high precision and speed in detecting minute defects in transmission lines. While some existing technologies have improved YOLO-based algorithms by replacing the backbone network or introducing attention mechanisms, they have failed to achieve coordinated optimization of accurate local feature extraction and effective global semantic modeling. Furthermore, they lack adaptation mechanisms for the spatial localization characteristics of minute defects in transmission lines, resulting in limited detection performance in complex scenarios.

[0005] Therefore, there is an urgent need for a lightweight method for detecting minute defects in power transmission lines that can overcome the bottlenecks in accuracy and computing power and is compatible with edge computing platforms.

Summary of the Invention

[0006] To address the aforementioned issues, this invention provides a lightweight method for detecting minute defects in power transmission lines that integrates FasterNet and RepViT. This method achieves high-precision, millisecond-level detection of minute defects in power transmission lines under complex backgrounds and is adapted to the computing power constraints of UAV edge computing platforms.

[0007] To solve the above technical problems, the present invention adopts the following technical solution: a lightweight method for detecting minute defects in transmission lines, integrating FasterNet and RepViT, comprising the following steps:
S1: Collect images of power transmission lines obtained from UAV inspections under different lighting conditions, backgrounds, and defect types, and manually annotate the defect areas of the image samples to generate annotation files;
S2: Perform data augmentation on the dataset and divide the processed dataset into training, validation, and test sets in a 7:2:1 ratio;
S3: Reconstruct the YOLOv8 network architecture, make lightweight improvements to the backbone and neck networks, and introduce an attention mechanism to form an improved detection network. The improved YOLOv8 detection network consists of three parts: a feature extraction layer, a feature fusion layer, and a defect detection layer;
S4: Input the preprocessed dataset into the improved detection network and train the model using the PyTorch framework;
S5: Apply lightweight quantization to the optimal trained model and deploy it to the Jetson Orin edge computing platform carried by the drone.

[0008] In S1, the defect area is manually annotated using the LabelImg tool to generate an XML format annotation file.

[0009] In S2, data augmentation processes include random cropping, brightness and contrast adjustment, horizontal flipping, and Gaussian noise addition, which expand the dataset size and improve the model's generalization ability.

[0010] In S3, the lightweight reconstruction of the backbone network specifically includes: the original YOLOv8 C2f module is removed and replaced with FasterNetBlock, which integrates PartialConvolution, as the core feature extraction unit. FasterNetBlock applies channel-grouped convolution to the input feature map through PartialConvolution, performing spatial convolution on only some of the channels and directly retaining the original features of the remaining channels. While significantly compressing channel redundancy and reducing computation and parameter count, it accurately captures the local texture features of minute defects in transmission lines, balancing lightweight design against local feature extraction capability. The core computation of PartialConvolution is defined as follows:

$$Y = \sigma\left(W_{pc} * (X \odot M) + b\right) + X \odot (1 - M) \tag{1}$$

In formula (1): $X \in \mathbb{R}^{H \times W \times C}$ is the input feature map, where H, W, and C are the height, width, and number of channels of the feature map; $M$ is the channel group mask, where $M_c = 1$ indicates that the c-th channel participates in the convolution operation and $M_c = 0$ indicates that the original features of the c-th channel are directly preserved, with the number of groups set to 4; $W_{pc}$ is the PartialConvolution kernel, k is its kernel size, and $C_g$ is the number of channels in each group; b is the bias term and σ is the SiLU activation function; Y is the output feature map of PartialConvolution. The term $\sigma(W_{pc} * (X \odot M) + b)$ is the 1/4-channel feature transformation branch, which extracts the local texture features of minute defects. The term $X \odot (1 - M)$ is the computation-free pass-through residual branch: a 3/4-channel original feature preservation branch with no convolution or activation operations, which passes the original input features directly to the fusion layer and forms the core of the residual connection.
If $M_c = 1$, meaning the c-th channel participates in the convolution operation: $X_c \odot M_c = X_c$ and $X_c \odot (1 - M_c) = 0$; the channel feature enters the convolution with $W_{pc}$ to complete the extraction of local texture features. If $M_c = 0$, meaning the c-th channel directly retains its original features: $X_c \odot M_c = 0$ and $X_c \odot (1 - M_c) = X_c$. Here $X_c$ denotes the feature map of the c-th channel of the input feature map X.
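
The masking behaviour described above can be illustrated with a minimal pure-Python sketch. It assumes that the convolved group is the first C/4 channels and that the convolution is a zero-padded 3×3 one; the helper names `silu`, `conv3x3`, and `partial_conv` are illustrative only and are not taken from the patent text.

```python
import math

def silu(v):
    """SiLU activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))

def conv3x3(channels, weight, bias):
    """Zero-padded 3x3 convolution over a list of HxW channel maps.

    weight[o][i] is the 3x3 kernel from input channel i to output channel o.
    """
    h, w = len(channels[0]), len(channels[0][0])
    out = []
    for o in range(len(weight)):
        plane = [[bias[o]] * w for _ in range(h)]
        for i, src in enumerate(channels):
            for y in range(h):
                for x in range(w):
                    acc = 0.0
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            yy, xx = y + dy, x + dx
                            if 0 <= yy < h and 0 <= xx < w:
                                acc += weight[o][i][dy + 1][dx + 1] * src[yy][xx]
                    plane[y][x] += acc
        out.append(plane)
    return out

def partial_conv(x, weight, bias, groups=4):
    """Convolve the first C/groups channels; pass the rest through unchanged."""
    c_g = len(x) // groups                       # channels with M_c = 1 (assumed to come first)
    conv_out = conv3x3(x[:c_g], weight, bias)
    activated = [[[silu(v) for v in row] for row in plane] for plane in conv_out]
    return activated + x[c_g:]                   # channels with M_c = 0 are untouched

# Four 2x2 channels; with groups=4 only channel 0 is convolved.
x = [[[float(c + 1)] * 2 for _ in range(2)] for c in range(4)]
identity = [[[[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]]]  # 1 output x 1 input kernel
y = partial_conv(x, identity, bias=[0.0])
```

With the identity kernel, channel 0 of `y` is simply SiLU applied to channel 0 of `x`, while channels 1 to 3 are returned unchanged, which is the pass-through behaviour of the residual branch.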

[0011] A channel with $M_c = 0$ skips the convolution operation, and its original features are used directly in the subsequent element-wise addition, so it is passed through with no computation.

[0012] In S3, the global feature fusion of the neck is optimized by embedding the RepViT global perception module in the neck feature pyramid network (FPN). The RepViT module is based on a re-parameterized convolutional structure. It models global semantic information efficiently by combining deep downsampling with pointwise convolution, compensating for the fact that FasterNetBlock extracts only local features. The RepViT module enables bidirectional fusion of the local features extracted by the backbone network and the global semantic features of the neck, constructing a dual-path feature fusion logic that addresses the difficulty of distinguishing minute defects from complex backgrounds. The formula for dual-path feature fusion is:

$$F_{fuse} = \alpha \cdot F_{local} + \beta \cdot F_{global} \tag{2}$$

In formula (2): $F_{local}$ is the local feature map output by the backbone network; $F_{global}$ is the global semantic feature map output by the RepViT module; $\alpha$ and $\beta$ are the feature fusion weight coefficients, which are learned adaptively during training to balance the contributions of local texture and global semantics; $F_{fuse}$ is the final fused feature map.
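
As a rough illustration of the dual-path fusion idea, the sketch below assumes the fusion is an element-wise weighted sum of a local and a global feature map with two scalar weights. The function name `fuse` and the fixed weight values are hypothetical; in the described method the weights would be learned during training rather than set by hand.

```python
def fuse(local_map, global_map, w_local, w_global):
    """Element-wise weighted sum of two equally sized 2-D feature maps."""
    return [
        [w_local * l + w_global * g for l, g in zip(local_row, global_row)]
        for local_row, global_row in zip(local_map, global_map)
    ]

local_features = [[1.0, 2.0], [0.0, 4.0]]    # stand-in for backbone (local texture) features
global_features = [[3.0, 4.0], [8.0, 0.0]]   # stand-in for RepViT (global semantic) features

# Hypothetical weights; a larger w_local favours local texture.
fused = fuse(local_features, global_features, w_local=0.75, w_global=0.25)
```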

[0013] The global semantic modeling process of the RepViT module includes two steps: downsampling and feature recovery. The core calculation is as follows: (3); (4); In equations (3) and (4): Conv2d represents a downsampling convolution with a stride of 2, and ConvTranspose2d represents a transposed convolution with a stride of 2, ensuring that the input and output feature map sizes are consistent.

[0014] In S3, a coordinate attention mechanism, CoordinateAttention, is introduced at the feature fusion node between the backbone network and the neck. By applying global average pooling in the channel dimension and feature encoding in the coordinate dimension to the feature map, the mechanism generates an attention map that combines channel weights with spatial location information. This enhances the feature response of small defect areas, suppresses background interference, and improves the spatial positioning accuracy of small targets such as missing pins and microcracks. The calculation process for CoordinateAttention is as follows. Global average pooling along the channel dimension:

$$z_c = \frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W} F_{fuse}(i, j, c) \tag{5}$$

In formula (5): $z_c$ is the global average pooling result for channel c, and $F_{fuse}(i, j, c)$ is the pixel value of the fused feature map at position (i, j) in the c-th channel. Coordinate dimension feature encoding: (6); (7). In equations (6) and (7), the encoded terms are the coordinate features in the vertical and horizontal directions, respectively, and Concat denotes a feature concatenation operation.

[0015] CoordinateAttention also includes attention weight generation and feature weighting, calculated using the following formula: (8). In equation (8): $W_1$ and $W_2$ are the weights of the fully connected layers, r is the compression ratio, δ is the ReLU activation function, σ is the Sigmoid activation function, and $b_1$ and $b_2$ are bias terms. The output is the feature map enhanced by the attention mechanism.

[0016] In S4, the training parameters include the batch size and the initial learning rate. The learning rate is adjusted using the cosine annealing algorithm, and the CIoU loss function is used to optimize bounding box regression accuracy. Under cosine annealing the learning rate changes as follows: in the early stages of training, when t is small, cos(π·t / T) approaches 1 and $\eta_t$ approaches the initial learning rate $\eta_0$; the large learning rate lets the model update its parameters quickly and move toward the optimal solution. In the later stages of training, as t approaches T, cos(π·t / T) approaches -1 and $\eta_t$ decreases smoothly to the minimum learning rate $\eta_{min}$; the small learning rate lets the model fine-tune its parameters near the optimal solution, avoiding the parameter oscillations caused by an excessively large learning rate and improving convergence stability and detection accuracy. During training, the network parameters are adjusted using the validation set until the model converges, and the optimal model is saved.
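
For reference, the CIoU criterion mentioned above can be sketched in plain Python. This follows the commonly published CIoU definition (IoU minus a normalized centre-distance term minus an aspect-ratio term); it is an illustrative assumption about the loss form, not code from the described training pipeline. Boxes are given as (x1, y1, x2, y2).

```python
import math

def ciou(box_a, box_b):
    """Complete IoU between two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    wa, ha = ax2 - ax1, ay2 - ay1
    wb, hb = bx2 - bx1, by2 - by1
    iou = inter / (wa * ha + wb * hb - inter)
    # Squared distance between box centres.
    rho2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both boxes.
    c2 = (max(ax2, bx2) - min(ax1, bx1)) ** 2 + (max(ay2, by2) - min(ay1, by1)) ** 2
    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(wb / hb) - math.atan(wa / ha)) ** 2
    alpha = v / (1 - iou + v) if v > 0 else 0.0
    return iou - rho2 / c2 - alpha * v

def ciou_loss(pred, target):
    """Loss is zero for a perfect match and grows as the boxes diverge."""
    return 1.0 - ciou(pred, target)

perfect = ciou_loss((0.0, 0.0, 10.0, 10.0), (0.0, 0.0, 10.0, 10.0))
disjoint = ciou_loss((0.0, 0.0, 10.0, 10.0), (20.0, 0.0, 30.0, 10.0))
```

Unlike plain IoU, the loss keeps growing for non-overlapping boxes because of the centre-distance term, which is useful for small targets where a predicted box can easily miss the ground truth entirely.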

[0017] Finally, the best trained model is quantized to INT8 and deployed to the Jetson Orin edge computing platform on the drone. During the drone inspection, images of the power transmission line are collected in real time and transmitted to the edge platform. The improved model performs inference and outputs the defect type, defect location coordinates, and confidence level. The learning rate schedule of the cosine annealing algorithm used in S4 is expressed as:

$$\eta_t = \eta_{min} + \frac{1}{2}\left(\eta_0 - \eta_{min}\right)\left(1 + \cos\left(\frac{\pi t}{T}\right)\right) \tag{9}$$

In equation (9): $\eta_t$ is the current learning rate in the t-th iteration, that is, the actual learning rate used when the model is trained in the t-th iteration, which changes dynamically with the number of iterations; $\eta_0$ is the initial learning rate, the manually configured starting learning rate for model training; $\eta_{min}$ is the minimum learning rate, the lower limit to which the learning rate decreases, which prevents training from stalling because of an excessively low learning rate; t is the current training iteration number, taking values 1, 2, 3, ..., T, where each complete pass over the full dataset counts as one iteration; T is the manually preset total number of training iterations; cos(·) is the cosine function, which makes the learning rate fall from $\eta_0$ to $\eta_{min}$ along a smooth cosine curve rather than in an abrupt drop.

[0018] The main beneficial effects of this invention are as follows: 1) A high-efficiency balance between lightweight design and local feature extraction capabilities, significantly reducing computational power consumption: This invention replaces the C2f module of the YOLOv8 backbone network with FasterNetBlock, which integrates PartialConvolution. By using channel-group convolution, spatial convolution is performed only on 1 / 4 of the channels, while the remaining channels retain their original features. This fundamentally reduces network channel redundancy and significantly decreases the number of model parameters and floating-point operations. While achieving lightweight design, FasterNetBlock can accurately capture the local texture features of minute defects such as missing pins and microcracks, avoiding the loss of feature extraction capabilities caused by the "network pruning and channel compression" of traditional lightweight algorithms. This allows the improved model to have efficient local feature extraction capabilities on edge computing platforms, laying a precise feature foundation for subsequent defect detection.

[0019] 2) Two-way fusion of local and global features enhances detection robustness in complex backgrounds: By embedding the RepViT global perception module into the neck FPN, a dual-path architecture of local feature extraction and global semantic modeling is constructed. Adaptively learned weight coefficients enable accurate fusion of local features from the backbone network and global semantic features from the neck, effectively overcoming the limitations of single local feature extraction in complex scenarios. This design can quickly distinguish between minor defects in transmission lines and complex backgrounds, solving the problems of traditional algorithms being easily affected by background interference and having high false positive and false negative rates in complex field environments. It significantly improves the model's adaptability and robustness to different inspection scenarios.

[0020] 3) Enhance the spatial positioning accuracy of minute defects to achieve precise detection of small targets: A CoordinateAttention mechanism is introduced at the feature fusion node. Through global average pooling in the channel dimension and feature encoding in the coordinate dimension, an attention map that combines channel weights and spatial location information is generated. This can accurately enhance the feature response of small defect areas while effectively suppressing invalid feature interference from background areas. Targeted optimizations are made for the spatial localization characteristics of small defects such as missing pins and microcracks in transmission lines, solving the technical pain points of traditional algorithms in fuzzy localization and low detection accuracy of small target defects, and significantly improving the detection accuracy and localization accuracy of small defects.

[0021] 4) Model training optimization to ensure detection accuracy and convergence efficiency: A targeted model training strategy was designed based on the PyTorch framework. Cosine annealing was used to dynamically adjust the learning rate, effectively avoiding learning rate stagnation and overfitting during training, thus accelerating model convergence. The CIoU loss function was selected to optimize bounding box regression accuracy, making the predicted defect bounding boxes more closely match the actual defect areas, further improving detection accuracy. During training, network parameters were adjusted in real-time using a validation set to ensure that the final optimal model possesses excellent detection performance and generalization ability.

[0022] 5) Lightweight deployment enables real-time edge detection: The optimal trained model undergoes INT8 lightweight quantization, significantly reducing storage footprint and inference latency. It can be directly deployed on the Jetson Orin edge computing platform mounted on drones, enabling efficient migration of the detection model from the server to the edge. During high-altitude drone inspections in the field, images of power transmission lines can be acquired in real time, and inference detection can be completed on the edge platform. Defect type, location coordinates, and confidence level are quickly output, with a detection response speed meeting millisecond-level real-time detection requirements. This solves the problem that traditional models cannot run in real time on edge devices due to high computing power requirements, making it suitable for mobile and real-time drone inspection scenarios.

Attached Figure Description

[0023] The present invention is further described below with reference to the accompanying drawings and embodiments:
Figure 1 is an overall flowchart of the method of the present invention;
Figure 2 is a diagram of the reconstructed core framework of the model of the present invention;
Figure 3 is a schematic diagram of the improved network architecture;
Figure 4 is a schematic diagram of the FasterNetBlock structure;
Figure 5 is a bar chart comparing the experimental results.

Detailed Implementation

[0024] Example 1: As shown in Figure 1, a lightweight method for detecting minute defects in transmission lines that integrates FasterNet and RepViT includes the following steps:
S1: Collect images of power transmission lines obtained from UAV inspections under different lighting conditions, backgrounds, and defect types, and manually annotate the defect areas of the image samples to generate annotation files;
S2: Perform data augmentation on the dataset and divide the processed dataset into training, validation, and test sets in a 7:2:1 ratio;
S3: Reconstruct the YOLOv8 network architecture, make lightweight improvements to the backbone and neck networks, and introduce an attention mechanism to form an improved detection network. The improved YOLOv8 detection network consists of three parts: a feature extraction layer, a feature fusion layer, and a defect detection layer, as shown in Figures 2 and 3;
S4: Input the preprocessed dataset into the improved detection network and train the model using the PyTorch framework;
S5: Apply lightweight quantization to the optimal trained model and deploy it to the Jetson Orin edge computing platform carried by the drone.

[0025] In S1, the defect area is manually annotated using the LabelImg tool to generate an XML format annotation file.
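
LabelImg writes its XML annotations in the Pascal VOC layout, whereas YOLO-style training usually expects normalized `class x_center y_center width height` rows. The sketch below shows one way such a conversion could look; the conversion step itself, the function name `voc_to_yolo`, and the class ordering are assumptions and are not described in the text.

```python
import xml.etree.ElementTree as ET

# Class order is an assumption; the label names come from the dataset description.
CLASSES = ["pin_missing", "micro_crack"]

def voc_to_yolo(xml_text):
    """Convert one LabelImg (Pascal VOC) annotation into YOLO rows.

    Each row is (class_id, x_center, y_center, width, height),
    with coordinates normalized to [0, 1] by the image size.
    """
    root = ET.fromstring(xml_text)
    img_w = float(root.findtext("size/width"))
    img_h = float(root.findtext("size/height"))
    rows = []
    for obj in root.iter("object"):
        cls_id = CLASSES.index(obj.findtext("name"))
        box = obj.find("bndbox")
        xmin = float(box.findtext("xmin"))
        ymin = float(box.findtext("ymin"))
        xmax = float(box.findtext("xmax"))
        ymax = float(box.findtext("ymax"))
        rows.append((
            cls_id,
            (xmin + xmax) / 2 / img_w,
            (ymin + ymax) / 2 / img_h,
            (xmax - xmin) / img_w,
            (ymax - ymin) / img_h,
        ))
    return rows

# Hand-written sample annotation for a 1920x1080 image (illustrative values).
SAMPLE = """<annotation>
  <size><width>1920</width><height>1080</height><depth>3</depth></size>
  <object>
    <name>pin_missing</name>
    <bndbox><xmin>900</xmin><ymin>500</ymin><xmax>1020</xmax><ymax>580</ymax></bndbox>
  </object>
</annotation>"""

rows = voc_to_yolo(SAMPLE)
```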

[0026] In S2, data augmentation processes include random cropping, brightness and contrast adjustment, horizontal flipping, and Gaussian noise addition, which expand the dataset size and improve the model's generalization ability.
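
Three of the augmentations listed above can be sketched at the level of a single pixel or a single bounding box. The helper names and parameter values below are hypothetical; a real pipeline would apply these to whole images, and geometric transforms such as flipping must update the annotation boxes as well as the pixels.

```python
import random

def hflip_box(box):
    """Mirror a normalized YOLO box (x_center, y_center, w, h) horizontally."""
    xc, yc, w, h = box
    return (1.0 - xc, yc, w, h)

def adjust_pixel(value, contrast=1.0, brightness=0.0):
    """Apply contrast (gain) and brightness (offset) to one 8-bit pixel, then clip."""
    out = contrast * value + brightness
    return int(min(255, max(0, round(out))))

def add_gaussian_noise(value, sigma, rng):
    """Add zero-mean Gaussian noise to one 8-bit pixel and clip to [0, 255]."""
    return int(min(255, max(0, round(value + rng.gauss(0.0, sigma)))))

rng = random.Random(0)                       # fixed seed so runs are repeatable
flipped = hflip_box((0.25, 0.5, 0.1, 0.2))   # box moves from the left to the right side
brighter = adjust_pixel(200, contrast=1.5)   # saturates at 255
noisy = add_gaussian_noise(128, sigma=10.0, rng=rng)
```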

[0027] The lightweight reconstruction of the backbone network specifically includes: the original YOLOv8 C2f module is removed and replaced with the FasterNetBlock shown in Figure 4, which integrates PartialConvolution, as the core feature extraction unit. FasterNetBlock applies channel-grouped convolution to the input feature map through PartialConvolution, performing spatial convolution on only some of the channels and directly retaining the original features of the remaining channels. While significantly compressing channel redundancy and reducing computation and parameter count, it accurately captures the local texture features of minute defects in transmission lines, balancing lightweight design against local feature extraction capability. The core computation of PartialConvolution is defined as follows:

$$Y = \sigma\left(W_{pc} * (X \odot M) + b\right) + X \odot (1 - M) \tag{1}$$

In formula (1): $X \in \mathbb{R}^{H \times W \times C}$ is the input feature map, where H, W, and C are the height, width, and number of channels of the feature map; $M$ is the channel group mask, where $M_c = 1$ indicates that the c-th channel participates in the convolution operation and $M_c = 0$ indicates that the original features of the c-th channel are directly preserved, with the number of groups set to 4; $W_{pc}$ is the PartialConvolution kernel with kernel size k = 3, and $C_g$ is the number of channels in each group, $C_g = C / 4$; b is the bias term and σ is the SiLU activation function; Y is the output feature map of PartialConvolution. The term $\sigma(W_{pc} * (X \odot M) + b)$ is the 1/4-channel feature transformation branch, which extracts the local texture features of minute defects. The term $X \odot (1 - M)$ is the computation-free pass-through residual branch: a 3/4-channel original feature preservation branch with no convolution or activation operations, which passes the original input features directly to the fusion layer and forms the core of the residual connection.
If $M_c = 1$, meaning the c-th channel participates in the convolution operation: $X_c \odot M_c = X_c$ and $X_c \odot (1 - M_c) = 0$; the channel feature enters the convolution with $W_{pc}$ to complete the extraction of local texture features. If $M_c = 0$, meaning the c-th channel directly retains its original features: $X_c \odot M_c = 0$ and $X_c \odot (1 - M_c) = X_c$. Here $X_c$ denotes the feature map of the c-th channel of the input feature map X.
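
To make the computational saving concrete, the following back-of-the-envelope sketch counts multiply-accumulate operations (MACs) and weights for a full 3×3 convolution versus a partial convolution that touches only C/4 channels. The feature map size (80×80) and channel count (256) are hypothetical example values, not figures from the experiments.

```python
def conv_macs(h, w, k, c_in, c_out):
    """Multiply-accumulates of a dense k x k convolution on an h x w output map."""
    return h * w * k * k * c_in * c_out

def conv_params(k, c_in, c_out):
    """Weight count of a dense k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def pconv_macs(h, w, k, c, groups=4):
    """Partial convolution: only c / groups channels are convolved."""
    c_g = c // groups
    return conv_macs(h, w, k, c_g, c_g)

H, W, K, C = 80, 80, 3, 256          # hypothetical layer shape
full = conv_macs(H, W, K, C, C)
partial = pconv_macs(H, W, K, C)
ratio = full // partial              # cost reduction factor for the spatial convolution
param_ratio = conv_params(K, C, C) // conv_params(K, C // 4, C // 4)
```

Because both the input and output channel counts of the convolved group shrink by a factor of 4, the spatial convolution costs 1/16 of the dense equivalent in both MACs and weights.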

[0028] In S3, the global feature fusion of the neck is optimized by embedding the RepViT global perception module in the neck feature pyramid network (FPN). The RepViT module is based on a re-parameterized convolutional structure. It models global semantic information efficiently by combining deep downsampling with pointwise convolution, compensating for the fact that FasterNetBlock extracts only local features. The RepViT module enables bidirectional fusion of the local features extracted by the backbone network and the global semantic features of the neck, constructing a dual-path feature fusion logic that addresses the difficulty of distinguishing minute defects from complex backgrounds. The formula for dual-path feature fusion is:

$$F_{fuse} = \alpha \cdot F_{local} + \beta \cdot F_{global} \tag{2}$$

In formula (2): $F_{local}$ is the local feature map output by the backbone network; $F_{global}$ is the global semantic feature map output by the RepViT module; $\alpha$ and $\beta$ are the feature fusion weight coefficients, which are learned adaptively during training to balance the contributions of local texture and global semantics; $F_{fuse}$ is the final fused feature map.

[0029] The global semantic modeling process of the RepViT module includes two steps: downsampling and feature recovery. The core calculation is as follows: (3); (4); In equations (3) and (4): Conv2d represents a downsampling convolution with a stride of 2, and ConvTranspose2d represents a transposed convolution with a stride of 2, ensuring that the input and output feature map sizes are consistent.
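
The claim that a stride-2 convolution followed by a stride-2 transposed convolution restores the input size can be checked with the standard output-size formulas (dilation 1), which match those documented for PyTorch's `Conv2d` and `ConvTranspose2d`. The kernel size of 3, padding of 1, and the `output_padding` values below are assumptions chosen for illustration; the text only fixes the stride.

```python
def conv2d_out(size, kernel, stride, padding):
    """Output length of a convolution along one spatial axis (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1

def conv_transpose2d_out(size, kernel, stride, padding, output_padding=0):
    """Output length of a transposed convolution along one spatial axis (dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding

# Even input size: 80 -> 40 -> 80 needs output_padding=1.
down_even = conv2d_out(80, kernel=3, stride=2, padding=1)
up_even = conv_transpose2d_out(down_even, kernel=3, stride=2, padding=1, output_padding=1)

# Odd input size: 81 -> 41 -> 81 needs output_padding=0.
down_odd = conv2d_out(81, kernel=3, stride=2, padding=1)
up_odd = conv_transpose2d_out(down_odd, kernel=3, stride=2, padding=1, output_padding=0)
```

The required `output_padding` depends on whether the input size is even or odd, so an implementation has to choose it per feature map scale to keep input and output sizes consistent.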

[0030] In S3, a coordinate attention mechanism, CoordinateAttention, is introduced at the feature fusion node between the backbone network and the neck. By applying global average pooling in the channel dimension and feature encoding in the coordinate dimension to the feature map, the mechanism generates an attention map that combines channel weights with spatial location information. This enhances the feature response of small defect areas, suppresses background interference, and improves the spatial positioning accuracy of small targets such as missing pins and microcracks. The calculation process for CoordinateAttention is as follows. Global average pooling along the channel dimension:

$$z_c = \frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W} F_{fuse}(i, j, c) \tag{5}$$

In formula (5): $z_c$ is the global average pooling result for channel c, and $F_{fuse}(i, j, c)$ is the pixel value of the fused feature map at position (i, j) in the c-th channel. Coordinate dimension feature encoding: (6); (7). In equations (6) and (7), the encoded terms are the coordinate features in the vertical and horizontal directions, respectively, and Concat denotes a feature concatenation operation.
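
The difference between a single global average and direction-aware pooling can be seen on one small channel. The sketch below assumes the usual coordinate-attention style of encoding, in which the map is averaged along the width to get a per-row profile and along the height to get a per-column profile; the function name `directional_pool` is illustrative, and the exact encoding used in equations (6) and (7) may differ.

```python
def directional_pool(plane):
    """Return (global mean, per-row means, per-column means) of one HxW channel."""
    h, w = len(plane), len(plane[0])
    z_h = [sum(row) / w for row in plane]                               # vertical profile
    z_w = [sum(plane[y][x] for y in range(h)) / h for x in range(w)]    # horizontal profile
    z = sum(sum(row) for row in plane) / (h * w)                        # global average
    return z, z_h, z_w

# A 3x4 channel with one strong response at row 1, column 1 (a stand-in for a tiny defect).
plane = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 8.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
z, z_h, z_w = directional_pool(plane)
peak_row = z_h.index(max(z_h))
peak_col = z_w.index(max(z_w))
```

The global average `z` says only that the channel is active, whereas the two profiles together recover where the response sits, which is the positional information that matters for small targets.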

[0031] CoordinateAttention also includes attention weight generation and feature weighting, calculated using the following formula: (8). In equation (8): $W_1$ and $W_2$ are the weights of the fully connected layers, r = 16 is the compression ratio, δ is the ReLU activation function, σ is the Sigmoid activation function, and $b_1$ and $b_2$ are bias terms. The output is the feature map enhanced by the attention mechanism.

[0032] In S4, the training parameters are set as follows: the batch size is 16, the initial learning rate is 0.001, and training runs for 100 rounds. The learning rate is adjusted using the cosine annealing algorithm, and the CIoU loss function is used to optimize bounding box regression accuracy. Under cosine annealing the learning rate changes as follows: in the early stages of training, when t is small, cos(π·t / T) approaches 1 and $\eta_t$ approaches the initial learning rate $\eta_0$; the large learning rate lets the model update its parameters quickly and move toward the optimal solution. In the later stages of training, as t approaches T, cos(π·t / T) approaches -1 and $\eta_t$ decreases smoothly to the minimum learning rate $\eta_{min}$; the small learning rate lets the model fine-tune its parameters near the optimal solution, avoiding the parameter oscillations caused by an excessively large learning rate and improving convergence stability and detection accuracy. During training, the network parameters are adjusted using the validation set until the model converges, and the optimal model is saved.
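
The schedule described above can be tabulated with a few lines of Python. The initial learning rate (0.001) and the number of rounds (100) come from the embodiment; the minimum learning rate of 1e-5 is a hypothetical value, since the text does not state one.

```python
import math

def cosine_lr(t, total_iters, lr0, lr_min):
    """Cosine annealing: returns lr0 at t = 0 and lr_min at t = total_iters."""
    return lr_min + 0.5 * (lr0 - lr_min) * (1.0 + math.cos(math.pi * t / total_iters))

LR0 = 0.001      # initial learning rate from the embodiment
TOTAL = 100      # number of training rounds from the embodiment
LR_MIN = 1e-5    # hypothetical; the text does not give a minimum value

schedule = [cosine_lr(t, TOTAL, LR0, LR_MIN) for t in range(TOTAL + 1)]
midpoint = schedule[TOTAL // 2]   # halfway between LR0 and LR_MIN
```

The curve is flat near both ends and steepest in the middle, so the model spends the first rounds at close to the full learning rate and the last rounds at close to the minimum.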

[0033] Finally, the best trained model is quantized to INT8 and deployed to the Jetson Orin edge computing platform on the drone. During the drone inspection, images of the power transmission line are collected in real time and transmitted to the edge platform. The improved model performs inference and outputs the defect type, defect location coordinates, and confidence level. The learning rate schedule of the cosine annealing algorithm used in S4 is expressed as:

$$\eta_t = \eta_{min} + \frac{1}{2}\left(\eta_0 - \eta_{min}\right)\left(1 + \cos\left(\frac{\pi t}{T}\right)\right) \tag{9}$$

In equation (9): $\eta_t$ is the current learning rate in the t-th iteration, that is, the actual learning rate used when the model is trained in the t-th iteration, which changes dynamically with the number of iterations; $\eta_0$ is the initial learning rate, the manually configured starting learning rate for model training; $\eta_{min}$ is the minimum learning rate, the lower limit to which the learning rate decreases, which prevents training from stalling because of an excessively low learning rate; t is the current training iteration number, taking values 1, 2, 3, ..., T, where each complete pass over the full dataset counts as one iteration; T is the manually preset total number of training iterations; cos(·) is the cosine function, which makes the learning rate fall from $\eta_0$ to $\eta_{min}$ along a smooth cosine curve rather than in an abrupt drop.
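
As an illustration of what INT8 quantization does to a tensor, the sketch below assumes symmetric per-tensor quantization, where a single scale maps floating-point values onto the integer range [-127, 127]. This is one common scheme and is an assumption here; the scheme and calibration procedure actually used for deployment are not specified, and in practice a deployment toolchain would normally handle this step.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Map quantized integers back to approximate float values."""
    return [qi * scale for qi in q]

weights = [-1.0, -0.3, 0.0, 0.42, 1.0]     # made-up example weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Each weight is stored in one byte instead of four, at the price of a rounding error of at most half a quantization step per value.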

[0034] Example 2: This example obtains the detection performance data of the improved network architecture by building a simulation experiment, and compares and analyzes it with the experimental data of the original YOLOv8, the lightweight YOLOv8n, and the improved YOLOv8 that only replaces FasterNet, thereby verifying the superiority and practical application value of the improved network architecture proposed in this invention.

[0035] I. Experimental Environment: The training end uses an NVIDIA RTX 4090 GPU, an Intel Core i9-13900K CPU, and 64GB of memory; the deployment end uses a Jetson Orin NX edge computing platform mounted on a drone, with a computing power of 200 TOPS and 16GB of memory. The operating system is Ubuntu 22.04LTS, the deep learning framework is PyTorch 2.0, the programming language is Python 3.9, the data annotation tool is LabelImg 1.8.6, and the image processing library is OpenCV 4.8.0.

[0036] II. Dataset Details: The dataset contains 12,000 images of power transmission lines inspected by drones, with a resolution of 1920×1080 pixels. Among these, 4,500 images show missing pins, 5,000 show microcracks, and 2,500 are defect-free. After data augmentation, the dataset size was expanded to 24,000 images: 16,800 for training, 4,800 for validation, and 2,400 for testing. During annotation, bounding boxes were drawn centered on the defect area, and labels were divided into two categories: missing pins (pin_missing) and microcracks (micro_crack).
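
The split sizes quoted above follow directly from the 7:2:1 ratio, as the short sketch below confirms. The helper names and the fixed shuffle seed are illustrative assumptions; the text does not say how the images were shuffled or assigned.

```python
import random

def split_counts(total, ratios=(7, 2, 1)):
    """Integer sizes of the train/val/test subsets for a given ratio."""
    parts = [total * r // sum(ratios) for r in ratios]
    parts[0] += total - sum(parts)      # any rounding remainder goes to the training set
    return tuple(parts)

def split_dataset(items, ratios=(7, 2, 1), seed=0):
    """Shuffle reproducibly, then cut into train, validation, and test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train, n_val, _ = split_counts(len(items), ratios)
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]

counts = split_counts(24000)
train, val, test = split_dataset(range(24000))
```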

[0037] III. Model Parameter Settings: FasterNetBlock is configured according to FasterNet-T0, with a PartialConvolution kernel size of 3×3, 4 channel groups, and the SiLU activation function. The four C2f modules of the original YOLOv8 backbone network are replaced, each corresponding to feature extraction at a different scale. The RepViT module is embedded in the middle and top layers of the neck FPN and adopts the RepViT-M structure, with a downsampling stride of 2, a pointwise convolution channel count of 256, and an FFN hidden layer dimension of 1024. CoordinateAttention uses a compression ratio of r=16 and a kernel size of 1×1, and is embedded at the node where the outputs of the 3rd and 4th layers of the backbone network are fused with the neck features.

IV. Experimental Results and Analysis: This embodiment was compared with the original YOLOv8, the lightweight YOLOv8n, and an improved YOLOv8 in which only the backbone is replaced with FasterNet. The evaluation metrics included accuracy, recall, mean average precision (mAP@0.5), number of parameters, and inference time per frame. The experimental results are shown in the table below:

[0038] The experimental results show that, with a parameter count and inference time close to those of the lightweight model, the method of the present invention improves mAP@0.5 by 2.4 percentage points compared with the original YOLOv8 and by 7.7 percentage points compared with the lightweight YOLOv8n. It meets the three goals of being lightweight, accurate, and fast, and can effectively detect minute defects in transmission lines.
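The computational saving behind the PartialConvolution setting in the model parameters (3×3 kernel, 4 channel groups) can be sketched with a simple cost count. The sketch assumes that exactly one of the four channel groups is convolved and the other three are passed through unchanged; the description only says that "some channels" are convolved, so this choice, like the function names and the 80×80×256 feature map, is hypothetical.

```python
def conv_macs(h, w, k, c_in, c_out):
    """Multiply-accumulate count of a dense k x k convolution on an h x w map."""
    return h * w * k * k * c_in * c_out


def partial_channel_mask(channels, groups=4, active_groups=1):
    """1 marks a channel that is convolved, 0 a channel that is passed through."""
    c_p = channels * active_groups // groups
    return [1] * c_p + [0] * (channels - c_p)


def partial_conv_macs(h, w, k, channels, groups=4, active_groups=1):
    """Cost when only the masked-in channels go through the k x k convolution."""
    c_p = sum(partial_channel_mask(channels, groups, active_groups))
    return conv_macs(h, w, k, c_p, c_p)


# Hypothetical feature map size, chosen only for illustration.
full = conv_macs(80, 80, 3, 256, 256)
partial = partial_conv_macs(80, 80, 3, 256)
print(full // partial)  # 16
```

Under this assumption the spatial convolution touches a quarter of the channels on both the input and output side, so its cost drops to one sixteenth of the dense convolution, while the untouched channels keep their original features.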

Claims

1. A lightweight transmission line micro-defect detection method integrating FasterNet and RepViT, characterized in that it includes the following steps:
S1: Collect images of power transmission lines under different lighting conditions, backgrounds, and defect types obtained from UAV inspections, and manually annotate the defect areas of the power transmission line image samples to generate annotation files;
S2: Perform data augmentation on the dataset and divide the processed dataset into a training set, a validation set, and a test set in a 7:2:1 ratio;
S3: Reconstruct the YOLOv8 network architecture, make lightweight improvements to the backbone and neck networks, and introduce an attention mechanism to form an improved detection network, wherein the overall improved YOLOv8 detection network architecture consists of three parts: a feature extraction layer, a feature fusion layer, and a defect detection layer;
S4: Input the preprocessed dataset into the improved detection network described above, and train the model based on the PyTorch framework;
S5: Quantize the trained optimal model for lightweight deployment and deploy it to the Jetson Orin edge computing platform carried by the drone.

2. The lightweight transmission line micro-defect detection method integrating FasterNet and RepViT according to claim 1, characterized in that: In S1, the defect area is manually annotated using the LabelImg tool to generate an XML format annotation file.

3. The lightweight transmission line micro-defect detection method integrating FasterNet and RepViT according to claim 1, characterized in that: In S2, data augmentation processes include random cropping, brightness and contrast adjustment, horizontal flipping, and Gaussian noise addition, which expand the dataset size and improve the model's generalization ability.

4. The lightweight transmission line micro-defect detection method integrating FasterNet and RepViT according to claim 1, characterized in that: In S3, the lightweight reconstruction of the backbone network specifically includes: the C2f modules of the original YOLOv8 are removed and replaced with FasterNetBlock, which integrates partial convolution, as the core feature extraction unit. FasterNetBlock performs channel-grouped convolution on the input feature map through PartialConvolution, applying spatial convolution to only some channels and directly retaining the original features of the remaining channels. While significantly compressing channel redundancy and reducing computation and parameter count, it captures the local texture features of minute defects in transmission lines, balancing lightweight design with local feature extraction capability. The core computational process of PartialConvolution is defined as follows: (1); In formula (1): H, W, and C represent the height, width, and number of channels of the input feature map, respectively; in the channel group mask M, $M_c = 1$ indicates that the c-th channel participates in the convolution operation and $M_c = 0$ indicates that the original features of the c-th channel are directly preserved, with the number of groups set to 4; k represents the kernel size of PartialConvolution, and $C_g$ represents the number of channels in each group; the formula further contains a bias term, and σ is the SiLU activation function; the result is the output feature map of PartialConvolution.

5. The lightweight transmission line micro-defect detection method integrating FasterNet and RepViT according to claim 1, characterized in that: In S3, the global feature fusion of the neck is optimized by embedding the RepViT global perception module in the neck feature pyramid network (FPN); the RepViT module is based on a reparameterized convolutional structure and efficiently models global semantic information by combining deep downsampling with pointwise convolution, thus compensating for the limitation of FasterNetBlock, which extracts only local features. The RepViT module enables bidirectional fusion of the local features extracted by the backbone network and the global semantic features of the neck, constructing a dual-path feature fusion logic that makes minute defects distinguishable from the background in complex scenes. The formula for dual-path feature fusion is: (2); In formula (2): the inputs are the local feature map output by the backbone network and the global semantic feature map output by the RepViT module; the feature fusion weight coefficients balance the contributions of local texture and global semantics and are learned adaptively during training; the result is the final fused feature map.

6. The lightweight transmission line micro-defect detection method integrating FasterNet and RepViT according to claim 5, characterized in that: The global semantic modeling process of the RepViT module includes two steps: downsampling and feature recovery. The core calculation is as follows: (3); (4); In equations (3) and (4): Conv2d represents a downsampling convolution with a stride of 2, and ConvTranspose2d represents a transposed convolution with a stride of 2, ensuring that the input and output feature map sizes are consistent.

7. The lightweight transmission line micro-defect detection method integrating FasterNet and RepViT according to claim 1, characterized in that: In S3, a coordinate attention mechanism, CoordinateAttention, is introduced at the feature fusion node between the backbone network and the neck. The coordinate attention mechanism generates an attention map that combines channel weights and spatial location information by performing global average pooling in the channel dimension and feature encoding in the coordinate dimension on the feature map. This enhances the feature response of small defect areas, suppresses background interference, and improves the spatial positioning accuracy of small targets such as missing pins and microcracks. The calculation process for CoordinateAttention is as follows: Global average pooling along the channel dimension: (5); In formula (5): the left-hand side is the global average pooling result for channel c, computed from the pixel values of the fused feature map at position (i,j) in the c-th channel; Coordinate dimension feature encoding: (6); (7); In equations (6) and (7): the two encoded terms are the coordinate features in the vertical and horizontal directions, respectively, and `Concat` indicates a feature concatenation operation.

8. The lightweight transmission line micro-defect detection method integrating FasterNet and RepViT according to claim 7, characterized in that: CoordinateAttention also includes attention weight generation and feature weighting, calculated using the following formula: (8); In equation (8): r represents the compression ratio, δ is the ReLU activation function, σ is the Sigmoid activation function, and b1 and b2 are bias terms; the result is the feature map enhanced by the attention mechanism.

9. The lightweight transmission line micro-defect detection method integrating FasterNet and RepViT according to claim 1, characterized in that: In S4, training parameters include batch size and initial learning rate. The learning rate is adjusted using the cosine annealing algorithm, and the CIoU loss function is used to optimize the bounding box regression accuracy. During training, the network parameters are adjusted in real time using the validation set until the model converges, and the optimal model is saved.

10. The lightweight transmission line micro-defect detection method integrating FasterNet and RepViT according to claim 1, characterized in that: In S5, the best trained model is quantized to INT8 and deployed to the Jetson Orin edge computing platform on the drone. During drone inspection, images of the power transmission line are collected in real time and transmitted to the edge platform, where the improved model performs inference and outputs the defect type, defect location coordinates, and confidence level.
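The INT8 quantization step named in the claims is not spelled out in the text, so the following is only a generic sketch of symmetric per-tensor INT8 quantization. It is an assumption made for illustration and not the patent's procedure or the calibration performed by any particular deployment toolkit; the function names are hypothetical.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using a single symmetric scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Recover approximate float values from the INT8 codes."""
    return [q * scale for q in quantized]


# Hypothetical weight values, chosen only for illustration.
weights = [-1.0, -0.25, 0.0, 0.1, 0.7, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

With a symmetric scale, the largest magnitude maps to ±127 and every value is reconstructed to within half a quantization step, which is the trade-off that lets the model run in 8-bit arithmetic on the edge platform.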