An electric power operation and maintenance equipment image processing method

By constructing a large model based on the Transformer architecture and combining it with knowledge distillation technology, the problem of insufficient feature extraction and generalization ability of small models in power operation and maintenance equipment image processing is solved, realizing efficient and accurate fault detection and real-time operation and maintenance of power grid equipment.

CN122243864A · Pending · Publication Date: 2026-06-19 · INFORMATION & COMM COMPANY OF QINGHAI ELECTRIC POWER

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: INFORMATION & COMM COMPANY OF QINGHAI ELECTRIC POWER
Filing Date: 2026-01-30
Publication Date: 2026-06-19

IPC: G06T7/00; G06V10/82; G06N3/0455; G06N3/0895

AI Tagging

Application Domain

Image analysis; Biological models

Technology Topics

Imaging processing; Feature extraction

AI Technical Summary

Technical Problem

Existing image processing methods for power operation and maintenance equipment rely on small models, which have limited feature extraction capability, weak generalization, low creation efficiency, and poor adaptability. These methods struggle to meet the complex and diverse operation and maintenance needs of the power grid, which leads to low recognition accuracy and a high false detection rate, and they cannot guarantee the reliability and safety of power grid operation.

Method used

A large model based on the Transformer architecture is pre-trained with masked image modeling, a self-supervised learning method. A small image processing model is then constructed through transfer learning, and knowledge distillation is applied for refined training to improve the robustness and generalization ability of the model and achieve efficient fault detection.

Benefits of technology

It improves the model's recognition accuracy and creation efficiency, enhances the model's real-time operation capability on edge devices, and ensures the accuracy and reliability of real-time operation and maintenance monitoring and fault detection of power grid equipment.

✦ Generated by Eureka AI based on patent content.

[Figure: CN122243864A_ABST]

Abstract

This invention discloses an image processing method for power grid operation and maintenance equipment. The method includes: Step S1: acquiring and preprocessing training data of power grid equipment images; Step S2: constructing a large visual model based on the Transformer architecture and pre-training it; Step S3: extracting intermediate features from the pre-trained large model, constructing a small image processing model using transfer learning, learning the feature extraction and processing capabilities of the large model, and performing automated incremental training; Step S4: applying knowledge distillation technology to refine the training of the small model for power grid image processing tasks; Step S5: analyzing real-time image data of power grid equipment through the small image processing model to perform real-time operation and maintenance monitoring and fault detection of the power grid equipment. This invention effectively improves the accuracy and reliability of fault detection through masked image modeling and large model-driven small model design and fine-tuning, enabling more intelligent and efficient image and video data processing and applications.

Description

Technical Field

[0001] This invention relates to the field of power operation and maintenance technology, and in particular to an image processing method for power operation and maintenance equipment.

Background Technology

[0002] Power system operation and maintenance (O&M) is a crucial link in ensuring the safe, efficient, and stable operation of the power system. It reduces the risk of accidents by guaranteeing the stability of power supply and the efficiency of equipment operation, and improves the reliability of the power system through regular inspections, maintenance, and intelligent management. Scientific O&M not only effectively extends equipment lifespan and optimizes resource utilization, but also supports green development through energy conservation, emission reduction, and other measures, thereby improving overall economic benefits. With the emergence and development of the smart grid concept, the amount of data in the field of power equipment O&M is growing exponentially, and the demand for high-precision, high-efficiency analysis and processing of large-scale, complex data in the power system is becoming increasingly urgent. Against this backdrop, how to extract valuable information from massive amounts of data through advanced artificial intelligence technologies and machine learning algorithms, and achieve precise O&M through intelligent decision-making, has become a major challenge and opportunity for the power industry.

[0003] Existing image processing methods for power operation and maintenance equipment primarily rely on small models. While these models offer advantages in efficiency and cost, their inherent limitations also present several challenges. First, small models have limited feature extraction capabilities, failing to effectively capture deep details in images. They may be unable to accurately extract minute cracks on lines, minor damage to insulators, or slight corrosion, leading to low recognition accuracy, particularly in complex image vision tasks. Second, small models have weak generalization ability, struggling to handle noise and outliers. They often exhibit instability in new environments or abnormal conditions, resulting in a high false detection rate. Furthermore, the creation of small models is inefficient, especially when involving large amounts of data annotation and model training, requiring significant time and human resources. On the other hand, both the expansion of transmission networks and the increase in equipment types necessitate models with good scalability and adaptability. However, small models frequently experience accuracy degradation when adapting to new equipment types or monitoring requirements, highlighting their limitations. As the complexity of power systems and the volume of data continue to increase, relying solely on small models is insufficient to comprehensively address the increasingly complex and diverse operation and maintenance needs of the future power grid, failing to guarantee the reliability and safety of power grid operation.

Summary of the Invention

[0004] The technical problem to be solved by the embodiments of the present invention is to provide an image processing method for power operation and maintenance equipment to ensure the reliability and safety of power grid operation.

[0005] To address the aforementioned technical problems, this invention provides an image processing method for power operation and maintenance equipment, comprising:
Step S1: Obtain training data of power grid equipment images and preprocess them to construct an unlabeled large-scale dataset and a labeled small dataset;
Step S2: Construct a large visual model based on the Transformer architecture, and pre-train the large model using images from the unlabeled large-scale dataset;
Step S3: Extract intermediate features from the pre-trained large model, construct a small image processing model using transfer learning, learn the feature extraction and processing capabilities of the large model, and perform automated incremental training on the small model using images from the labeled small dataset;
Step S4: Apply knowledge distillation technology to refine the small model for the power grid image processing task, so that the small model can imitate the output distribution of the large model;
Step S5: Analyze the real-time image data of the power grid equipment using the small image processing model that has completed knowledge distillation, and perform real-time operation and maintenance monitoring and fault detection of the power grid equipment.

[0006] The beneficial effects of this invention are as follows: 1) This invention uses a self-supervised learning approach for masked image modeling to train the model, enabling the model to learn global image features while also focusing on detailed information, thereby enhancing the model's robustness and generalization ability.

[0007] 2) This invention introduces a large model, which, with its powerful deep learning capabilities, can quickly generate high-performance small models through transfer learning and a small amount of data, thereby significantly reducing the cost of creating small models and improving creation efficiency.

[0008] 3) This invention extracts intermediate features from a pre-trained large model and performs automated incremental training on the created small model to help it learn the feature extraction and processing capabilities of the large model, thereby improving the model's recognition accuracy and ensuring that the model can run in real time on edge devices.

[0009] 4) This invention utilizes knowledge distillation technology to refine the training of small image processing models by representing complex data from large models, extracting high-dimensional features, and capturing deep-level features, thereby improving their practicality and further optimizing image processing capabilities.

Attached Figure Description

[0010] Figure 1 is a schematic diagram of the overall framework structure of the image processing method for power operation and maintenance equipment according to an embodiment of the present invention.

Detailed Implementation

[0011] It should be noted that, unless otherwise specified, the embodiments and features described in this application can be combined with each other. The present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments.

[0012] In this embodiment of the invention, directional indicators (such as up, down, left, right, front, back, etc.) are only used to explain the relative positional relationship and movement of each component in a specific posture (as shown in the figure). If the specific posture changes, the directional indicator will also change accordingly.

[0013] Furthermore, in this invention, descriptions involving "first," "second," etc., are for descriptive purposes only and should not be construed as indicating or implying their relative importance or implicitly specifying the number of technical features indicated. Therefore, a feature defined with "first" or "second" may explicitly or implicitly include at least one of those features.

[0014] Referring to Figure 1, the image processing method for power operation and maintenance equipment in this embodiment of the invention includes steps S1 to S5.

[0015] Step S1: Obtain training data of power grid equipment images and preprocess them to construct an unlabeled large-scale dataset and a labeled small dataset.

[0016] Images of the equipment, acquired using both visible light and non-visible light devices (such as infrared cameras), are collected to build a dataset. Data acquisition needs to cover various types of power grid equipment and image content, such as images showing equipment wear, images showing line damage, and thermal images. Furthermore, by incorporating historical fault cases, images of both normal and abnormal equipment states are obtained to ensure the dataset covers a wide range of operating conditions.

[0017] In the data preprocessing stage, the original images are first subjected to a series of data augmentation operations such as cropping, rotation, and scaling to simulate different visual conditions and improve the model's adaptability to different environments. Next, Gaussian filtering and median filtering are applied to remove noise and improve image clarity; for infrared images, adaptive smoothing filtering is used to reduce sensor noise. Then, image pixels are normalized, and image size, format, and color channels are standardized to ensure the quality and consistency of the input data, thereby improving the effectiveness of subsequent training and the performance of the model. Finally, manual annotation is used to label a portion of the dataset, resulting in a large-scale unlabeled dataset and a small labeled dataset.
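The preprocessing chain described above (augmentation, denoising, normalization) can be sketched roughly as follows. This is a minimal illustration in NumPy, not the patent's implementation: the function names, the 3x3 median window, the 48-pixel crop, and the restriction to grayscale input and 90-degree rotations are all assumptions made for brevity.

```python
import numpy as np

def median_filter3(img: np.ndarray) -> np.ndarray:
    """3x3 median filter on a 2-D grayscale image (borders use edge padding)."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    # Stack the nine shifted views of the image and take the per-pixel median.
    windows = [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    return np.median(np.stack(windows, axis=0), axis=0)

def normalize(img: np.ndarray) -> np.ndarray:
    """Scale 8-bit pixel values to the range [0, 1]."""
    return img.astype(np.float32) / 255.0

def augment(img: np.ndarray, rng: np.random.Generator, crop: int) -> np.ndarray:
    """Random square crop followed by a random rotation by a multiple of 90 degrees."""
    h, w = img.shape
    top = int(rng.integers(0, h - crop + 1))
    left = int(rng.integers(0, w - crop + 1))
    patch = img[top:top + crop, left:left + crop]
    return np.rot90(patch, k=int(rng.integers(0, 4)))

rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(64, 64)).astype(np.uint8)  # stand-in for a camera frame
sample = normalize(median_filter3(augment(raw, rng, crop=48)))
```

A production pipeline would more likely rely on an image library for arbitrary-angle rotation, scaling, and Gaussian or adaptive filtering; the sketch only shows the order of operations.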

[0018] Step S2: Construct a large visual model based on the Transformer architecture, and use the masked image modeling method to pre-train the large model using images from an unlabeled large-scale dataset.

[0019] Specifically, step S2 can be broken down into the following sub-steps: Step S21: Construct a large-scale visual model based on the Transformer architecture as the pre-training network structure. This type of network, through its self-attention mechanism, can capture long-range dependencies in images, optimizing the accurate detection of device faults and the identification of abnormal states in complex scenes. Simultaneously, the multi-layered stacked structure enhances the model's expressive power, enabling it to process high-dimensional image data and perform accurate analysis. The specific network structure is as follows: 1) Input Layer: The input image is processed in blocks. Let the input image size be $H \times W \times C$, where $H$ is the image height, $W$ is the image width, and $C$ is the number of image channels (for RGB images, $C = 3$). The image is divided into multiple fixed-size $P \times P$ blocks, yielding $N = HW / P^2$ patches. Each image patch is flattened into a one-dimensional vector and mapped to a high-dimensional feature space through linear projection.

[0020] 2) Patch Embedding: Each image patch is flattened into a one-dimensional vector, and the vector of each image patch is mapped to a high-dimensional space using a linear layer, resulting in the embedding vector for each patch. Assuming each patch is embedded into a $D$-dimensional space, each image patch is represented as a $D$-dimensional vector.

[0021] 3) Position Encoding: Position encoding is added to each image patch to preserve spatial structure information. Position encoding is implemented through learnable vectors, helping the model understand the relative position of the image patch in the original image. The feature vectors of the image patches after position encoding form a sequence, which is then input into the Transformer encoder.

[0022] 4) Transformer Encoder: Composed of multiple stacked standard Transformer encoders, each layer containing a multi-head self-attention mechanism and a feedforward neural network. The multi-head self-attention mechanism allows the model to capture different relationships and features from multiple subspaces, thus gaining a more comprehensive understanding of image content. The feedforward neural network further processes the output of the self-attention layers, enhancing the expressive power of features. Residual connections and layer normalization are applied to the output of each self-attention and feedforward network layer to ensure that information is not lost during training and to help stabilize the model's learning process.
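The four structural elements above (block splitting, patch embedding, position encoding, encoder layer) can be illustrated with a small NumPy sketch. All sizes and names here (`patchify`, `w_embed`, `pos`, a 32x32 image, 8x8 patches, embedding width 16) are assumptions chosen for illustration, and the weights are random rather than learned. For brevity the encoder part shows a single attention head with a residual connection and layer normalization; the feedforward sub-layer and multi-head splitting are omitted.

```python
import numpy as np

def patchify(image: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) image into (H/patch)*(W/patch) flattened patches."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch size"
    grid = image.reshape(h // patch, patch, w // patch, patch, c)
    # Reorder to (rows, cols, patch, patch, C), then flatten each patch.
    return grid.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * c)

def layer_norm(x: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Normalize each token vector to zero mean and unit variance (no learned scale)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over a token sequence."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
H, W, C, P, D = 32, 32, 3, 8, 16                  # illustrative sizes only
image = rng.random((H, W, C))
patches = patchify(image, P)                      # shape (16, 192)
w_embed = rng.normal(0, 0.02, (P * P * C, D))     # linear projection
pos = rng.normal(0, 0.02, (patches.shape[0], D))  # learnable in a real model
tokens = patches @ w_embed + pos                  # embedded sequence fed to the encoder

wq, wk, wv = (rng.normal(0, 0.02, (D, D)) for _ in range(3))
tokens = layer_norm(tokens + self_attention(tokens, wq, wk, wv))  # residual + norm
```

With these sizes the image yields 16 patches of 192 values each, and every patch ends up as a 16-dimensional token.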

[0023] Step S22: Pre-train the model using self-supervised learning in Masked Image Modeling (MIM). For images in the input unlabeled large-scale dataset, a preset proportion (preferably 10% to 50%) of blocks are randomly selected for masking during block processing, and the occluded regions are processed using discretization reconstruction, i.e., the image is converted into discrete tokens in the visual codebook. The model needs to learn the contextual information of the image structure and content, and reconstruct the masked regions based on the unmasked parts, i.e., predict the tokens of the occluded regions, so that while learning the global features of the image, it can also focus on the restoration of details and understand the semantics and structure of the power grid equipment. For the masked regions, the cross-entropy loss is used to calculate the error between the model's predicted value and the true value, and the weighted average of the losses of all masked regions is used as the final loss. The cross-entropy loss can be expressed as: $L_{CE} = -\sum_{i=1}^{K} y_i \log(p_i)$, where $K$ is the total number of categories, i.e., the size of the visual codebook; $y_i$ is the $i$-th component of the one-hot vector of the true label; and $p_i$ is the model's predicted probability for category $i$.
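A rough sketch of the masking and loss computation in step S22 is given below. It assumes the visual tokenizer has already assigned one codebook id to each patch (`target_tokens`), and it uses equal weights when averaging over masked patches, since the text does not specify the weighting. The function names, the 64-patch sequence, the 512-entry codebook, and the 40% mask ratio are illustrative assumptions.

```python
import numpy as np

def random_mask(num_patches: int, ratio: float, rng: np.random.Generator) -> np.ndarray:
    """Boolean mask with round(ratio * num_patches) randomly chosen True entries."""
    num_masked = int(round(ratio * num_patches))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.choice(num_patches, size=num_masked, replace=False)] = True
    return mask

def masked_cross_entropy(logits: np.ndarray, target_tokens: np.ndarray, mask: np.ndarray) -> float:
    """Mean cross-entropy over masked positions only.

    logits: (N, K) scores over the K codebook entries for each of N patches.
    target_tokens: (N,) integer codebook ids (the "true" tokens).
    """
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    per_patch = -log_probs[np.arange(len(target_tokens)), target_tokens]
    return float(per_patch[mask].mean())  # equal weights: an assumption

rng = np.random.default_rng(0)
N, K = 64, 512                            # patches per image, codebook size (illustrative)
mask = random_mask(N, 0.4, rng)           # 40% lies inside the preferred 10% to 50% range
targets = rng.integers(0, K, size=N)
logits = np.zeros((N, K))                 # an untrained model: uniform predictions
loss = masked_cross_entropy(logits, targets, mask)
```

With uniform predictions the loss equals log(K), which is a convenient sanity check before training starts.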

[0024] Step S3: Extract intermediate features from the pre-trained large model, construct a small image processing model using transfer learning, learn the feature extraction and processing capabilities of the large model, and use images from the labeled small dataset to perform automated incremental training on the small model to improve the recognition accuracy of the small model.

[0025] To achieve efficient inference in resource-constrained environments, the small image processing model can use a lightweight convolutional neural network (CNN) designed for low computational resource consumption as its basic structure. The knowledge learned by the large model during training on the source task, especially the weights of convolutional layers and some fully connected layers, is transferred to the corresponding layers of the small model as its initial weights. The specific structure is as follows: 1) Depthwise separable convolutional layers: multiple depthwise separable convolutional layers, which reduce computational cost, each followed by a ReLU activation function; 2) Input image resolution: after the input image passes through several convolutional layers, the final output is a high-dimensional feature vector.
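To see why depthwise separable convolutions suit a lightweight model, it helps to compare parameter counts. A standard convolution with a k x k kernel needs k·k·C_in·C_out weights, whereas a depthwise separable one needs k·k·C_in (depthwise) plus C_in·C_out (pointwise). The layer sizes below are arbitrary examples, and bias terms are ignored.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a k x k depthwise convolution plus a 1 x 1 pointwise convolution."""
    return k * k * c_in + c_in * c_out

k, c_in, c_out = 3, 64, 128                            # example layer, not from the patent
std = standard_conv_params(k, c_in, c_out)             # 73728
sep = depthwise_separable_params(k, c_in, c_out)       # 576 + 8192 = 8768
ratio = sep / std                                      # equals 1/c_out + 1/k**2
```

For this example layer the separable form uses roughly 12% of the weights of the standard form.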

[0026] As one implementation method, automated incremental training of small models is performed.

[0027] 1) Training Data Preparation: Intermediate features, containing semantic information of the image (such as edges and textures), are extracted from the pre-trained large model, providing effective learning signals for the small model. Simultaneously, a labeled small dataset is used to provide label information for supervised learning, helping the small model better understand the target task. The supervised labeled small dataset is divided into a training set (70%), a validation set (15%), and a test set (15%) to ensure the model's generalization ability. The extracted intermediate features are concatenated with the labeled small dataset to form the final training dataset.

[0028] 2) Automated Incremental Training: In the initial stage of incremental training, some layers of the small model (such as the initial convolutional layer) are frozen to prevent interference with its feature learning in the early stages of training, thereby helping the small model to better inherit the learning results of the large model on low-level features. As training progresses, more layers of the small model (such as intermediate convolutional layers, fully connected layers, etc.) are gradually unfrozen, allowing the small model to gradually participate in higher-level feature learning, ensuring that the small model can learn the knowledge transferred from the large model more comprehensively.

[0029] 3) Gradually optimize hyperparameters: As training continues, gradually improve the performance of the small model by fine-tuning hyperparameters such as learning rate, batch size, and regularization parameters. This helps the small model converge more effectively, avoids overfitting, and ensures that it can achieve good results on a small amount of data.
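The data split and the gradual unfreezing described above can be sketched in plain Python. The layer-group names, the choice to train the classifier head from the first epoch, and the five-epoch unfreezing interval are all hypothetical; the text only states that some layers start frozen and more are unfrozen as training progresses.

```python
import random

def split_dataset(items, seed=0, train_pct=70, val_pct=15):
    """Shuffle and split into train/validation/test (70/15/15 by default)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * train_pct // 100
    n_val = len(items) * val_pct // 100
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]

# Hypothetical layer groups of the small model, ordered from output to input.
UNFREEZE_ORDER = ["classifier", "fully_connected", "intermediate_conv", "initial_conv"]

def trainable_groups(epoch: int, unfreeze_every: int = 5):
    """Layer groups that receive gradient updates at a given epoch.

    The classifier trains from epoch 0; one more group is unfrozen every
    `unfreeze_every` epochs, so the initial convolution is unfrozen last.
    """
    count = min(len(UNFREEZE_ORDER), 1 + epoch // unfreeze_every)
    return UNFREEZE_ORDER[:count]

train, val, test = split_dataset(range(100))
schedule = {epoch: trainable_groups(epoch) for epoch in (0, 5, 10, 15)}
```

In a deep learning framework, the returned group names would be used to toggle gradient computation for the matching parameters at the start of each epoch.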

[0030] Step S4: Apply knowledge distillation technology to refine the small model for the power grid image processing task, allowing the small model to mimic the output distribution of the large model and further learn the reasoning ability and decision-making process of the large model.

[0031] During the distillation process, the smaller model will adjust its parameters by comparing the output feature map of the larger visual model, so that its feature map is close to the high-level features of the larger model.

[0032] 1) Large model generates soft labels: After the large model is trained, for each input image, soft labels (i.e., the predicted probability distribution of the large model) are generated using the large model. These soft labels contain the relative probability information of the large model for each category, providing more semantic information than hard labels.

[0033] 2) Define the distillation loss function: Using the output distribution of the large model (soft label) and the output distribution of the small model, define the distillation loss function, which consists of Kullback-Leibler divergence and cross-entropy loss, as follows: $L = \alpha \cdot \mathrm{KL}(p_t \,\|\, p_s) + (1 - \alpha) \cdot \mathrm{CE}(y, p_s)$, where KL is the Kullback-Leibler divergence, used to measure the difference between the output distribution of the small model $p_s$ and the soft labels generated by the large model $p_t$; CE is the cross-entropy loss, used to measure the difference between the output distribution of the small model $p_s$ and the hard labels $y$; and $\alpha$ is a moderating factor that determines the weights of the soft-label loss and the hard-label loss, and is used to balance the effects of the two.

[0034] 3) Knowledge Distillation Training of Small Models: During the distillation process, the small model is trained based on the soft labels of the large model by minimizing the loss function, gradually optimizing its weights to simulate the reasoning process of the large model. The training objective is to make the output distribution of the small model as close as possible to the output distribution of the large model, so that it can benefit from the knowledge of the large model while maintaining efficient reasoning ability.

[0035] 4) Model Evaluation and Optimization: After knowledge distillation is implemented, the model is evaluated using a test set, focusing on its performance in inference speed, accuracy, and computational resource consumption. Based on the evaluation results, techniques such as model pruning and quantization can be used to further optimize the model, reducing its computational and storage requirements, so that it can be better deployed in practical applications.
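The distillation objective described in this step, a Kullback-Leibler term against the large model's soft labels plus a cross-entropy term against the hard labels, balanced by one factor, can be sketched in NumPy as follows. The name `alpha` for the balancing factor, the direction of the KL term (large model relative to small model, as is common in distillation), and the `1 - alpha` weight on the hard-label term are assumptions. No softmax temperature is used because the text does not mention one.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, alpha=0.5):
    """alpha * KL(teacher || student) + (1 - alpha) * CE(labels, student), batch mean."""
    p_s = softmax(student_logits)          # small-model output distribution
    p_t = softmax(teacher_logits)          # soft labels from the large model
    eps = 1e-12                            # avoids log(0)
    kl = np.sum(p_t * (np.log(p_t + eps) - np.log(p_s + eps)), axis=-1)
    ce = -np.log(p_s[np.arange(len(labels)), labels] + eps)
    return float(np.mean(alpha * kl + (1.0 - alpha) * ce))

rng = np.random.default_rng(0)
teacher = rng.normal(size=(8, 4))          # 8 images, 4 fault categories (illustrative)
student = rng.normal(size=(8, 4))
labels = rng.integers(0, 4, size=8)
loss = distillation_loss(student, teacher, labels, alpha=0.7)
```

Setting `alpha` closer to 1 makes the small model follow the large model's soft labels more closely, while values closer to 0 emphasize the annotated hard labels.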

[0036] Step S5: Analyze real-time image data of power grid equipment using a small image processing model trained through knowledge distillation to perform real-time operation and maintenance monitoring and fault detection. After knowledge distillation, the optimized small model will be deployed to edge devices to achieve real-time operation and maintenance monitoring and fault detection of power grid equipment, improve fault response efficiency, and reduce computing resource consumption.

[0037] Specifically, step S5 can be broken down into the following sub-steps: Step S51: Edge Device Deployment. Deploy the optimized small model to edge computing devices and integrate it into the power grid equipment inspection system. Edge devices may include fixed camera terminals, drone inspection systems, or other intelligent terminals with computing capabilities. The core tasks of edge devices include: 1) Real-time acquisition of image data from power grid equipment, followed by preprocessing such as image denoising and normalization; 2) Call the small model to perform inference, identify the fault type, and obtain fault location information; 3) Based on the detection results, store them locally or directly trigger the early warning mechanism.

[0038] Step S52: Fault Detection and Data Output. The small model analyzes the input power grid equipment image and outputs the detection results, including but not limited to: 1) Fault category: such as insulator damage, conductor breakage, icing, foreign object hanging, etc.; 2) Fault location information: target detection box coordinates (x_min, y_min, x_max, y_max) or semantic segmentation mask; 3) Fault confidence: Indicates the reliability of the detection results, with a value range of [0,1]; 4) Timestamp: Records the detection time for subsequent analysis.

[0039] Step S53: Fault Information Transmission and Alarm Feedback. The edge device can process the detected fault information in the following ways: 1) High-confidence faults (confidence > 0.9) can directly trigger local alarms, such as buzzer warnings and LED indicator reminders, so that inspection personnel can respond quickly; 2) Upload the detection results to the power grid operation and maintenance management system via MQTT, HTTP or WebSocket protocols, and combine them with the GIS system for fault location; 3) Managers can access the inspection report via the web or mobile device, view the fault details, and remotely schedule maintenance tasks to ensure the safe operation of power grid equipment.
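The detection record of step S52 and the confidence-based handling of step S53 might be represented on an edge device roughly as below. The field names, the `route` and `to_payload` helpers, and the choice to upload every result are hypothetical; only the four output fields and the 0.9 alarm threshold come from the text. No network call is made: the sketch only builds the JSON payload that would be handed to an MQTT, HTTP, or WebSocket client.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

ALARM_THRESHOLD = 0.9  # high-confidence threshold stated in step S53

@dataclass
class Detection:
    category: str      # e.g. "insulator_damage"
    box: tuple         # (x_min, y_min, x_max, y_max)
    confidence: float  # in [0, 1]
    timestamp: str     # ISO 8601 detection time

def route(detection: Detection) -> list:
    """Actions for one detection: always upload, alarm locally above the threshold."""
    actions = ["upload"]
    if detection.confidence > ALARM_THRESHOLD:
        actions.insert(0, "local_alarm")
    return actions

def to_payload(detection: Detection) -> str:
    """Serialize a detection for upload to the O&M management system."""
    return json.dumps(asdict(detection))

det = Detection(
    category="insulator_damage",
    box=(120, 80, 260, 210),
    confidence=0.95,
    timestamp=datetime.now(timezone.utc).isoformat(),
)
actions = route(det)
payload = to_payload(det)
```

Because the comparison is strict, a detection with confidence exactly 0.9 is uploaded but does not trigger the local alarm.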

[0040] This invention effectively improves the accuracy and reliability of fault detection through masked image modeling and large-model-driven small-model design and fine-tuning, enabling more intelligent and efficient image and video data processing and applications. This framework has broad applicability, not only in the power industry but also in other fields requiring the processing of large-scale complex data, such as transportation, energy, and manufacturing, achieving cross-industry applications and industrialization. High-precision and high-efficiency analysis and processing of large-scale complex data will help promote the deep integration of artificial intelligence and big data technologies in the power industry, drive the development of related industrial chains, and create new economic growth points.

[0041] The power operation and maintenance equipment image processing system of this invention includes:
Dataset construction module: acquires and preprocesses training data of power grid equipment images to construct an unlabeled large-scale dataset and a labeled small dataset;
Large model building module: constructs a large visual model based on the Transformer architecture, and pre-trains the large model using images from the unlabeled large-scale dataset;
Small model building module: extracts intermediate features from the pre-trained large model, uses transfer learning to build a small image processing model, learns the feature extraction and processing capabilities of the large model, and uses images from the labeled small dataset to perform automated incremental training on the small model;
Knowledge distillation module: applies knowledge distillation technology to refine the training of the small model for power grid image processing tasks, so that the small model can imitate the output distribution of the large model;
Monitoring module: analyzes real-time image data of power grid equipment through the small image processing model that has completed knowledge distillation, enabling real-time operation and maintenance monitoring and fault detection of power grid equipment.

[0042] As one implementation method, the large model building module uses a self-supervised learning approach for masked image modeling to pre-train the model. For images in the input unlabeled large-scale dataset, a preset proportion of blocks are randomly selected for masking during block processing, and discretization reconstruction is used to process the occluded regions, that is, the image is converted into discrete tokens in the visual codebook. Specifically, for the masked regions, cross-entropy loss is used to calculate the error between the large model's predicted value and the true value, and the weighted average of the losses for all masked regions is used as the final loss. The cross-entropy loss is expressed as: $L_{CE} = -\sum_{i=1}^{K} y_i \log(p_i)$, where $K$ is the total number of categories, i.e., the size of the visual codebook; $y_i$ is the $i$-th component of the one-hot vector of the true label; and $p_i$ is the large model's predicted probability for category $i$.

[0043] As one implementation method, the small model building module performs automated incremental training according to the following steps: 1) Training data preparation: Extract intermediate features from the pre-trained large model, and concatenate the extracted intermediate features with the labeled small dataset to form the final training dataset; 2) Automated incremental training: In the initial stage of incremental training, some layers of the small model are frozen; as training progresses, more layers of the small model are gradually unfrozen, allowing the small model to gradually participate in higher-level feature learning. 3) Gradually optimize hyperparameters: As training continues, fine-tune the hyperparameters of the small model to gradually improve its performance.

[0044] As one implementation method, the knowledge distillation module optimizes the small image processing model according to the following steps: 1) Large model generates soft labels: After the large model is pre-trained, soft labels are generated for each input image using the large model; 2) Define the distillation loss function: Using the output distributions of the large model and the small model, define the distillation loss function, which consists of the Kullback-Leibler divergence and cross-entropy loss, as follows: $L = \alpha \cdot \mathrm{KL}(p_t \,\|\, p_s) + (1 - \alpha) \cdot \mathrm{CE}(y, p_s)$, where KL is the Kullback-Leibler divergence, used to measure the difference between the output distribution of the small model $p_s$ and the soft labels generated by the large model $p_t$; CE is the cross-entropy loss, used to measure the difference between the output distribution of the small model $p_s$ and the hard labels $y$; and $\alpha$ is a regulatory factor; 3) Knowledge distillation training of small models: During the distillation process, the small model is trained by minimizing the loss function and based on the soft labels of the large model, and its weights are gradually optimized to simulate the reasoning process of the large model; 4) Model evaluation and optimization: After knowledge distillation is implemented, the small model is evaluated using a test set. Based on the evaluation results, the model is further optimized to reduce the computational load and storage requirements.

[0045] As one implementation method, the monitoring module performs real-time operation and maintenance monitoring and fault detection according to the following steps: 1) Real-time acquisition and preprocessing of image data of power grid equipment via edge devices; 2) Call the image processing mini-model that has completed knowledge distillation to perform reasoning, identify the fault type, and obtain fault location information; 3) Based on the detection results, local storage can be performed via edge devices or an early warning mechanism can be triggered directly.

[0046] This invention first acquires and preprocesses image data of power grid equipment. Next, a large visual model based on the Transformer architecture is constructed and pre-trained on a large-scale unlabeled dataset using masked image modeling. Intermediate features are then extracted from the pre-trained large model, and a small image processing model is constructed using transfer learning; the small model learns the feature extraction and processing capabilities of the large model and undergoes automated incremental training on a labeled small dataset to improve its recognition accuracy. Knowledge distillation is then applied to refine the small model specifically for power grid image processing tasks, so that it learns the reasoning ability and decision-making process of the large model by mimicking its output distribution, further optimizing its image processing capabilities. Finally, the small model is deployed on edge devices for daily real-time fault detection. The invention thereby addresses the technical problem that existing approaches are constrained by small-scale model technology and therefore offer insufficient accuracy, coverage, and practicality.
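The masked image modeling pre-training mentioned above (a preset proportion of image blocks is masked at random, blocks are represented as discrete tokens from a visual codebook, and cross-entropy is computed on the masked positions) can be sketched as below. This is a simplified illustration: the function names, the 40% mask ratio, the toy codebook size, and the use of equal weights when averaging the per-block losses are all assumptions.

```python
import math
import random
from typing import List, Sequence


def choose_masked_patches(num_patches: int, mask_ratio: float,
                          rng: random.Random) -> List[int]:
    """Randomly pick the indices of the image blocks to mask."""
    count = round(num_patches * mask_ratio)
    return sorted(rng.sample(range(num_patches), count))


def masked_token_loss(predicted_probs: Sequence[Sequence[float]],
                      target_tokens: Sequence[int],
                      masked: Sequence[int],
                      eps: float = 1e-12) -> float:
    """Average cross-entropy over masked blocks only (equal weights assumed).

    predicted_probs[i][k] is the predicted probability that block i maps to
    codebook entry k; target_tokens[i] is the true codebook index of block i.
    """
    if not masked:
        raise ValueError("at least one block must be masked")
    losses = [-math.log(predicted_probs[i][target_tokens[i]] + eps)
              for i in masked]
    return sum(losses) / len(losses)


# Toy setup: 16 blocks, a codebook of 4 entries, uniform predictions.
rng = random.Random(0)
masked = choose_masked_patches(num_patches=16, mask_ratio=0.4, rng=rng)
probs = [[0.25] * 4 for _ in range(16)]
targets = [0] * 16
print(len(masked), masked_token_loss(probs, targets, masked))
```

Unmasked blocks contribute nothing to the loss, which is what makes the task self-supervised: the model is scored only on content it could not see.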

[0047] Although embodiments of the invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.

Claims

1. An electric power operation and maintenance equipment image processing method, characterized in that it comprises: Step S1: obtain training data of power grid equipment images and preprocess them to construct an unlabeled large-scale dataset and a labeled small dataset; Step S2: construct a large visual model based on the Transformer architecture, and pre-train the large model using images from the unlabeled large-scale dataset; Step S3: extract intermediate features from the pre-trained large model, construct a small image processing model using transfer learning so that it learns the feature extraction and processing capabilities of the large model, and perform automated incremental training on the small model using images from the labeled small dataset; Step S4: apply knowledge distillation to refine the small model for the power grid image processing task, so that the small model imitates the output distribution of the large model; Step S5: analyze real-time image data of the power grid equipment using the small image processing model that has completed knowledge distillation, and perform real-time operation and maintenance monitoring and fault detection of the power grid equipment.

2. The power operation and maintenance equipment image processing method of claim 1, wherein, in step S2, the large model is pre-trained using a self-supervised masked image modeling approach: for images in the input unlabeled large-scale dataset, a preset proportion of blocks is randomly selected for masking during block processing, and discretized reconstruction is used for the masked regions, that is, the images are converted into discrete tokens in a visual codebook; for the masked regions, cross-entropy loss is used to calculate the error between the model's predicted values and the true values, and the weighted average of the losses over all masked regions is used as the final loss; the cross-entropy loss is expressed as: $L_{CE} = -\sum_{i=1}^{C} y_i \log(p_i)$; where $C$ is the total number of classes, i.e., the size of the visual codebook, $y_i$ is the $i$-th component of the one-hot vector of the true label, and $p_i$ is the probability predicted by the large model for class $i$.

3. The power operation and maintenance equipment image processing method of claim 1, wherein, in step S3, automated incremental training is performed according to the following steps: 1) Training data preparation: extract intermediate features from the pre-trained large model, and concatenate the extracted intermediate features with the labeled small dataset to form the final training dataset; 2) Automated incremental training: in the initial stage of incremental training, some layers of the small model are frozen; as training progresses, more layers of the small model are gradually unfrozen, so that the small model gradually takes part in higher-level feature learning; 3) Gradual hyperparameter optimization: as training continues, the hyperparameters of the small model are fine-tuned to gradually improve its performance.

4. The power operation and maintenance equipment image processing method of claim 1, wherein step S4 includes the following sub-steps: 1) Soft label generation by the large model: after the large model is pre-trained, it is used to generate soft labels for each input image; 2) Definition of the distillation loss function: using the output distributions of the large model and the small model, define the distillation loss function, which consists of the Kullback-Leibler divergence and the cross-entropy loss, as follows: [formula not reproduced in the source text]; where KL is the Kullback-Leibler divergence, used to measure the difference between the output distribution of the small model and the soft labels generated by the large model; CE is the cross-entropy loss, used to measure the difference between the output distribution of the small model and the hard labels; and [symbol not reproduced] is a tuning factor; 3) Knowledge distillation training of the small model: during distillation, the small model is trained on the soft labels of the large model by minimizing the loss function, and its weights are gradually optimized to simulate the reasoning process of the large model; 4) Model evaluation and optimization: after knowledge distillation, the small model is evaluated using data from the labeled small dataset, and based on the evaluation results the model is further optimized to reduce the computational load and storage requirements.

5. The power operation and maintenance equipment image processing method of claim 1, wherein step S5 includes the following sub-steps: 1) acquire and preprocess image data of power grid equipment in real time via edge devices; 2) call the small image processing model that has completed knowledge distillation to perform inference, identify the fault type, and obtain fault location information; 3) based on the detection results, either perform local storage via the edge device or directly trigger an early warning mechanism.