Power transmission line multi-fittings detection method based on spatial knowledge fusion
By using a spatial knowledge fusion approach and leveraging RPN and DoubleHead networks for hardware inspection, the problems of low efficiency and low accuracy in traditional methods are solved, enabling precise inspection of transmission line hardware and adapting to efficient inspection in complex environments.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SGCC GENERAL AVIATION
- Filing Date
- 2023-06-26
- Publication Date
- 2026-06-23
AI Technical Summary
Traditional manual inspection methods are inefficient and prone to omissions and misjudgments in the inspection of power transmission line fittings. Furthermore, when using drones for inspection, the detection of fittings targets relies on individual features and ignores rich contextual information, resulting in low detection accuracy.
A spatial knowledge fusion-based approach is adopted, which uses the RPN network to obtain the region of interest, combines spatial bounding box setting and DoubleHead network for classification and regression, and uses a spatial context extraction model and Soft-NMS for post-processing to achieve accurate detection of hardware.
It improves the accuracy and efficiency of hardware inspection, reduces missed and false detections, and adapts to the needs of power transmission line inspection in complex environments.
Figure CN116805379B_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of power technology, and in particular relates to a method for detecting multiple fittings in transmission lines based on spatial knowledge fusion.
Background Technology
[0002] As a crucial infrastructure element of the power grid in the energy internet, transmission lines are responsible for transmitting electricity, and their operational status directly impacts social production and people's lives. Due to the demand for electricity transmission in remote areas, transmission lines are often built in regions with harsh natural environments, such as forests, snowfields, or deserts. As vital electrical components on transmission lines, fittings are susceptible to corrosion, dirt, and damage due to the complex environment and severe weather conditions, leading to transmission line faults, even operational shutdowns, and significant economic losses. Therefore, regular inspection of fittings is a critical task in transmission line inspection and a necessary prerequisite for ensuring the safety and stability of transmission line operations.
[0003] With the rapid advancement of urbanization in China, the coverage of power transmission lines is becoming increasingly extensive, leading to a surge in the transmission line inspection workload. However, the high-risk working environment of traditional manual inspection has resulted in slow growth in the number of front-line maintenance personnel, and the inefficiency of manual inspection leaves it increasingly unable to keep up with the rapidly expanding transmission network. Efficient, low-risk inspection methods are therefore urgently needed. The widespread adoption of drone inspection on transmission lines has greatly reduced the field workload of maintenance personnel. However, drone inspection of transmission line components, especially the very large number of fittings, generates a massive volume of inspection images. Reviewing these images manually on screen not only consumes significant human resources but is also prone to missed or incorrect assessments due to fatigue. It is therefore necessary to introduce intelligent inspection technology for transmission line components, and accurate detection of fittings is a crucial prerequisite for fitting condition analysis and fault detection.
Summary of the Invention
[0004] In view of this, embodiments of the present invention provide a method for detecting multiple fittings on transmission lines based on spatial knowledge fusion, so as to accurately detect fittings on transmission lines.
[0005] A first aspect of this invention provides a method for detecting multiple fittings in transmission lines based on spatial knowledge fusion, comprising:
[0006] Acquire images of the hardware to be inspected and input them into a preset feature extraction network to extract basic features;
[0007] The basic features are input into the RPN network to obtain the region of interest;
[0008] A spatial bounding box is set for the region of interest, and the region of interest and the spatial bounding box are projected onto the corresponding feature level and processed by ROIAlign to obtain the region of interest feature map and the spatial bounding box feature map.
[0009] The spatial context information of the spatial bounding box feature map is extracted by a preset spatial context extraction model. Based on the spatial context information and the region of interest feature map, the DoubleHead network is used for classification and regression to obtain the detection results of the hardware in the image to be detected.
[0010] In conjunction with the first aspect, one possible implementation of the first aspect involves defining a spatial bounding box for the region of interest, including:
[0011] The position of the candidate frame for the fittings is determined based on the boundary of the region of interest. Based on the position of the candidate frame for the fittings, the position of the spatial frame corresponding to each candidate frame for the fittings is calculated using the first formula.
[0012] The first formula is:
[0013]
[0014] where the formula gives the position of the spatial frame from the position of each fittings candidate box, which is described by the coordinates of the bottom-left corner and the top-right corner of the i-th fittings candidate box (the corresponding symbols are not reproduced in this text); N is the number of fittings candidate boxes; α is the preset spatial bounding box coefficient; and $w_i$ and $h_i$ are the width and height of the i-th fittings candidate box, respectively.
[0015] In conjunction with the first aspect, in one possible implementation of the first aspect, spatial context information of the bounding box feature map is extracted through a pre-defined spatial context extraction model, including:
[0016] $F_s = \Gamma(T(M_s))$
[0017] where $F_s$ represents the spatial context information corresponding to the N hardware candidate boxes; $T(\cdot)$ represents the spatial context extraction process; and $\Gamma(\cdot)$ represents the average pooling operation.
[0018] In conjunction with the first aspect, one possible implementation of the first aspect involves using a DoubleHead network for classification and regression based on spatial context information and region-of-interest feature maps, including:
[0019] After the feature map of the region of interest is input into the fully connected layer of the DoubleHead network, the classification head is used to classify the fittings in the image of the fittings to be detected.
[0020] The feature map of the region of interest is input into the convolutional layer of the DoubleHead network to obtain the feature vector of the region of interest. The feature vector of the region of interest and the spatial context information are then input into the spatial context memory model for processing. Finally, the regression head is used to regress the hardware in the image of the hardware to be detected.
[0021] In conjunction with the first aspect, in one possible implementation of the first aspect, the spatial context memory model is a gated recurrent unit; the feature vector of the region of interest and the spatial context information are input into the spatial context memory model for processing, including:
[0022] The temporalized results of the feature vector of the region of interest and the spatial context information corresponding to each hardware candidate box are input into the gated recurrent unit for processing.
[0023] In conjunction with the first aspect, one possible implementation of the first aspect, after using the DoubleHead network for classification and regression, also includes:
[0024] The classification and regression results are post-processed using the Soft-NMS model to remove redundant detection boxes.
[0025] In conjunction with the first aspect, in one possible implementation of the first aspect, the formula for post-processing the classification and regression results using the Soft-NMS model is as follows:
[0026]
[0027] where $V$ is the detection box with the highest current classification confidence, $b_i$ is the detection box to be penalized, $o_i$ is the classification confidence of the detection box to be penalized, and $\sigma$ is the standard deviation of the Gaussian function.
[0028] A second aspect of this invention provides a multi-fitting testing device for power transmission lines based on spatial knowledge fusion, comprising:
[0029] The acquisition module is used to acquire the captured image of the hardware to be inspected and input the image of the hardware to be inspected into a preset feature extraction network to extract basic features;
[0030] The determination module is used to input basic features into the RPN network to obtain the region of interest;
[0031] The bounding box setting module is used to set the bounding box for the region of interest, and project the region of interest and the bounding box onto the corresponding feature level and perform ROIAlign to obtain the region of interest feature map and the bounding box feature map.
[0032] The detection module is used to extract spatial context information of the spatial bounding box feature map through a preset spatial context extraction model. Based on the spatial context information and the region of interest feature map, the DoubleHead network is used for classification and regression to obtain the detection results of the hardware in the image to be detected.
[0033] A third aspect of the present invention provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the method as described in the first aspect or any implementation thereof.
[0034] A fourth aspect of the present invention provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the method as described in the first aspect or any implementation thereof.
[0035] The beneficial effects of the embodiments of the present invention compared with the prior art are as follows:
[0036] This invention considers that the implicit spatial knowledge between hardware fittings mainly exists in other hardware fittings around the target fitting and the connecting structures with these hardware fittings. Therefore, by setting a spatial bounding box for the region of interest and designing a spatial context extraction model, the spatial context information of the target fitting is extracted, achieving the effect of the model automatically mining implicit spatial knowledge. Then, based on the spatial context information and the feature map of the region of interest, a DoubleHead network is used for classification and regression to obtain the detection results of the hardware fittings in the image to be detected, thereby achieving accurate detection of hardware fittings on power transmission lines.
Attached Figure Description
[0037] To more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0038] Figure 1 is a schematic diagram of the implementation process of the multi-fitting detection method for transmission lines based on spatial knowledge fusion provided in an embodiment of the present invention;
[0039] Figure 2 is a schematic diagram of the overall detection process provided in an embodiment of the present invention;
[0040] Figure 3 is a schematic diagram of the spatial frame provided in an embodiment of the present invention;
[0041] Figure 4 is a schematic diagram of the GRU processing procedure provided in an embodiment of the present invention;
[0042] Figure 5 is a schematic diagram of the hardware detection results provided in an embodiment of the present invention;
[0043] Figure 6 is a schematic diagram of the structure of the transmission line multi-fitting detection device based on spatial knowledge fusion provided in an embodiment of the present invention;
[0044] Figure 7 is a schematic diagram of the structure of the electronic device provided in an embodiment of the present invention.
Detailed Implementation
[0045] In the following description, specific details such as particular system architectures and techniques are set forth for illustrative purposes and not for limitation, in order to provide a thorough understanding of the embodiments of the invention. However, those skilled in the art will understand that the invention can be implemented in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, apparatuses, circuits, and methods are omitted so as not to obscure the description of the invention with unnecessary detail.
[0046] To illustrate the technical solution described in this invention, specific embodiments are described below.
[0047] Traditional metal fitting inspection methods primarily rely on manually designed features, such as the shape, color, and texture of the fittings, to identify them or their defects. However, due to the limitations of manually designed features, these methods are generally only applicable to a specific type of fitting or its defects, lack scalability, have low accuracy, are susceptible to complex environments, and lack practical application scenarios.
[0048] Deep learning-based detection methods can autonomously learn the image feature extraction process through neural networks. The resulting deep image features are better suited to the detection task at hand, giving better robustness to interference and higher recognition accuracy, so more and more researchers are studying deep learning-based hardware detection methods. However, current hardware detection methods mostly use image enhancement techniques, or adjust the model according to the characteristics of the hardware itself, to improve feature extraction and thereby detection accuracy. These methods rely solely on the hardware features inside each proposal box to detect individual targets; they rarely consider the rich contextual information between hardware components and ignore the electrical domain expertise contained in the standardized combination and connection rules of hardware components. Using this contextual information and domain expertise to achieve higher-precision detection therefore has significant research value and practical significance.
[0049] In view of this, embodiments of the present invention provide a method for detecting multiple fittings in transmission lines based on spatial knowledge fusion. As shown in Figure 1 and Figure 2, the method includes:
[0050] Step S101: Acquire the captured image of the hardware to be inspected, and input the image of the hardware to be inspected into a preset feature extraction network to extract basic features.
[0051] In this embodiment, images of the hardware to be inspected on the transmission line can be captured by a drone. Each image of the hardware to be inspected may contain multiple hardware targets, and the types of hardware include pre-stretched suspension clamps, bag-type suspension clamps, compression tension clamps, wedge-type tension clamps, hanging plates, U-shaped hanging rings, connecting plates, parallel groove clamps, vibration dampers, spacers, equalizing rings, shielding rings, counterweights, adjusting plates, etc.
[0052] A feature extraction network can be constructed from a ResNet-101 backbone and an FPN structure to extract basic features from the input image. Specifically, ResNet-101 extracts features from the input image at different levels, and the FPN fuses the features across these levels so that the model can handle targets of different scales within the image during detection. The basic features shown in Figure 2 are P2 to P6.
[0053] Step S102: Input the basic features into the RPN network to obtain the region of interest.
[0054] In this embodiment, the RPN (Region Proposal Network) receives features P2 to P6 from the feature extraction network. Anchor boxes with different sizes and aspect ratios are set and slid across the features of each level to generate a large number of candidate boxes. The candidate boxes are then filtered and sampled to obtain a fixed number of regions of interest R.
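The anchor generation step described above can be sketched as follows. This is a minimal illustration for a single feature level, not the embodiment's implementation: the function name `make_anchors` and the stride, scale, and aspect-ratio values are assumptions chosen for the example, and the filtering and sampling of candidate boxes are not shown.

```python
import math

def make_anchors(feat_h, feat_w, stride, scales, ratios):
    """Return anchors (x1, y1, x2, y2) centred on every cell of one feature level.

    stride, scales and ratios are illustrative values, not taken from the patent.
    """
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Centre of the feature cell, mapped back to image coordinates.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for scale in scales:
                for ratio in ratios:  # ratio = height / width
                    w = scale / math.sqrt(ratio)
                    h = scale * math.sqrt(ratio)
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

anchors = make_anchors(feat_h=2, feat_w=3, stride=16, scales=[32], ratios=[0.5, 1.0, 2.0])
```

Each cell produces one anchor per scale and aspect ratio, and every anchor of a given scale covers the same area regardless of its aspect ratio.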
[0055] Step S103: Set the spatial bounding box for the region of interest, and project the region of interest and the spatial bounding box onto the corresponding feature level using ROIAlign to obtain the feature map of the region of interest and the feature map of the spatial bounding box.
[0056] To obtain the spatial context information of the hardware candidate frames, it is first necessary to define their spatial frames. One possible implementation is to define the spatial frames for the region of interest, which can be detailed as follows: determine the position of the hardware candidate frames based on the boundaries of the region of interest; and calculate the position of the spatial frame corresponding to each hardware candidate frame using the first formula, based on the position of the hardware candidate frames.
[0057] Specifically, as shown in Figure 3, each region of interest obtained through the RPN network, i.e., each hardware candidate box, is first represented by the coordinates of its top-left corner and bottom-right corner, where N is the number of hardware candidate boxes. Because fittings are installed in combination, the connection structure and spatial relationships between the hardware target in a candidate box and its neighbors appear in the four spatial directions (up, down, left, and right) of that box, and this constitutes the unique spatial context information of each hardware target. Therefore, each hardware candidate box has corresponding spatial frames in these directions, which are set as follows:
[0058]
[0059] where the formula gives the position of the spatial frame from the position of each hardware candidate box, which is described by the coordinates of the bottom-left corner and the top-right corner of the i-th hardware candidate box (the corresponding symbols are not reproduced in this text); N is the number of hardware candidate boxes; α is the preset spatial bounding box coefficient; and $w_i$ and $h_i$ are the width and height of the i-th hardware candidate box, respectively.
[0060] The advantage of defining spatial frames in this way is that different spatial frames can take into account information from different directions, and there is no overlap between spatial frames, thus avoiding the repeated extraction of spatial context information.
[0061] It should be noted that when setting the spatial bounding box for hardware candidate boxes located at the edge of the image, the spatial bounding box may exceed the image range. Since there is no spatial context information within the spatial bounding box that exceeds the boundary, the portion of the spatial bounding box that exceeds the boundary needs to be clipped.
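The spatial frame setting and boundary clipping can be illustrated with a short sketch. The first formula is not reproduced in this text, so the geometry below is an assumption rather than the patented formula: each of the four frames is taken to be adjacent to one side of the candidate box, with a depth of α times the box height (up, down) or width (left, right), which keeps the frames non-overlapping. The function name `spatial_frames` and the image coordinate convention (origin at the top-left, y increasing downward) are also assumptions.

```python
def spatial_frames(box, alpha, img_w, img_h):
    """Return four clipped spatial frames around a candidate box (x1, y1, x2, y2).

    Assumed geometry: one frame per side, depth alpha * h (up/down) or alpha * w (left/right).
    """
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    frames = {
        "up": (x1, y1 - alpha * h, x2, y1),
        "down": (x1, y2, x2, y2 + alpha * h),
        "left": (x1 - alpha * w, y1, x1, y2),
        "right": (x2, y1, x2 + alpha * w, y2),
    }
    clipped = {}
    for name, (fx1, fy1, fx2, fy2) in frames.items():
        # Clip the part of the frame that falls outside the image.
        clipped[name] = (
            max(0.0, fx1),
            max(0.0, fy1),
            min(float(img_w), fx2),
            min(float(img_h), fy2),
        )
    return clipped

frames = spatial_frames((10.0, 20.0, 50.0, 40.0), alpha=0.5, img_w=60, img_h=100)
```

In this example the left and right frames extend past the image borders and are clipped to them, while the up and down frames are unchanged.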
[0062] By projecting the region of interest and its spatial bounding box onto the corresponding feature level and applying ROIAlign, a fixed-size region of interest feature map $M_r$ and a spatial bounding box feature map $M_s$ are obtained. Unlike ROIPooling, ROIAlign eliminates the quantization operations and computes values at floating-point coordinates by bilinear interpolation, thus solving the region mismatch problem caused by the two quantization operations in ROIPooling.
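The bilinear sampling at floating-point coordinates that distinguishes ROIAlign from ROIPooling can be sketched as below. This is a simplified single-channel illustration: it takes one sample at the centre of each output bin, whereas practical ROIAlign implementations usually average several samples per bin. The names `bilinear_sample` and `roi_align` are hypothetical, and the region is assumed to be given in feature-map coordinates already.

```python
import math

def bilinear_sample(feature, x, y):
    """Sample a 2-D feature map (list of rows) at a floating-point location (x, y)."""
    h, w = len(feature), len(feature[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

def roi_align(feature, roi, out_size):
    """Pool a region (x1, y1, x2, y2) to out_size x out_size without rounding coordinates."""
    x1, y1, x2, y2 = roi
    bin_w = (x2 - x1) / out_size
    bin_h = (y2 - y1) / out_size
    return [
        [
            bilinear_sample(feature, x1 + (j + 0.5) * bin_w, y1 + (i + 0.5) * bin_h)
            for j in range(out_size)
        ]
        for i in range(out_size)
    ]

# A feature map whose value is 3*y + x, so interpolated values are easy to check.
feature = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]
pooled = roi_align(feature, roi=(0.5, 0.5, 1.5, 1.5), out_size=2)
```

Because neither the region boundaries nor the bin centres are rounded to integer positions, the pooled values track the underlying feature map exactly for this linear example.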
[0063] Step S104: Extract spatial context information of the spatial bounding box feature map using a preset spatial context extraction model. Based on the spatial context information and the region of interest feature map, use the DoubleHead network for classification and regression to obtain the detection result of the hardware in the hardware image to be detected.
[0064] To exploit the rich contextual information inside the spatial frames, so that the model mines implicit spatial knowledge on its own to aid detection, spatial context information must be extracted from the spatial frame feature map. Because of how the spatial frames are designed, the spatial frame feature map $M_s$ contains rich spatial context information. However, $M_s$ is obtained by directly projecting the spatial frames onto the corresponding feature level and applying ROIAlign, and such a fixed-size feature map can hardly describe deeper spatial context information (connection rules, spatial structure, etc.). It also contains a lot of redundant information, so it cannot be used directly to assist the model's inference and detection. Deeper spatial context information is therefore extracted as follows:
[0065] $F_s = \Gamma(T(M_s))$
[0066] where $F_s$ represents the spatial context information corresponding to the N hardware candidate boxes; $T(\cdot)$ represents the spatial context extraction process, which adaptively learns deep extraction of spatial context information through stacked convolutional layers; and $\Gamma(\cdot)$ represents the average pooling operation, which reduces redundant representations and integrates the spatial context information.
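The composition of stacked convolutions followed by average pooling can be sketched on a single-channel map. This is only an illustration under several assumptions: the kernels here are fixed rather than learned, each layer is a 3x3 convolution with zero padding and ReLU, and the pooling is taken over the whole map. The names `conv3x3` and `extract_context` are hypothetical.

```python
def conv3x3(fmap, kernel):
    """3x3 convolution with zero padding, followed by ReLU, on a 2-D map."""
    h, w = len(fmap), len(fmap[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += fmap[ii][jj] * kernel[di + 1][dj + 1]
            out[i][j] = max(acc, 0.0)
    return out

def extract_context(m_s, kernels):
    """Apply the stacked convolutions (T), then average pooling over the map (Gamma)."""
    feat = m_s
    for kernel in kernels:
        feat = conv3x3(feat, kernel)
    total = sum(sum(row) for row in feat)
    return total / (len(feat) * len(feat[0]))

# An identity kernel stands in for learned weights so the result is easy to verify.
identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
m_s = [[1.0, 2.0], [3.0, 4.0]]
f_s = extract_context(m_s, [identity, identity])
```

The pooling step collapses the spatial map into one value per channel, which is how redundant spatial detail is reduced before the context is passed on.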
[0067] Finally, using the obtained spatial context information and region of interest feature map, the DoubleHead detection head is used to achieve both classification and regression functions, thereby obtaining the type and location of the hardware.
[0068] As can be seen, this embodiment of the invention considers that the implicit spatial knowledge between fittings mainly exists in other fittings around the target fitting and the connection structures with these fittings. Therefore, by setting a spatial bounding box for the region of interest and designing a spatial context extraction model, the spatial context information of the target fitting is extracted, achieving the effect of the model automatically mining implicit spatial knowledge. Then, based on the spatial context information and the feature map of the region of interest, a DoubleHead network is used for classification and regression to obtain the detection results of the fittings in the image of the fitting to be detected, thereby achieving accurate detection of fittings on the transmission line.
[0069] As one possible implementation, step S104 utilizes the DoubleHead network for classification and regression based on spatial context information and the feature map of the region of interest, including:
[0070] After the feature map of the region of interest is input into the fully connected layer of the DoubleHead network, the classification head is used to classify the fittings in the image of the fittings to be detected.
[0071] The feature map of the region of interest is input into the convolutional layer of the DoubleHead network to obtain the feature vector of the region of interest. The feature vector of the region of interest and the spatial context information are then input into the spatial context memory model for processing. Finally, the regression head is used to regress the hardware in the image of the hardware to be detected.
[0072] As shown in Figure 2, the DoubleHead network includes a classification head and a regression head. The region of interest feature map $M_r$ is processed by the fully connected layer and then enters the classification head for target category identification. For the regression head, the spatial bounding box feature map $M_s$ is first passed through the spatial context extraction module, which extracts and refines the spatial context information of the hardware to obtain the spatial context information $F_s$. The corresponding region of interest feature map $M_r$ is processed by the convolutional layer to obtain the region of interest feature vector $F_r$. $F_r$ and $F_s$ are then input into the spatial context memory module, which memorizes the spatial context information of the region of interest in the four directions (up, down, left, and right) and helps the regression head localize the target accurately. Because fittings appear differently from image to image, the spatial context memory module is designed to filter as well as memorize spatial context information, eliminating erroneous implicit spatial knowledge that may arise from factors such as shooting angle and thus helping the model infer and detect accurately.
[0073] As one possible implementation, the spatial context memory model is a gated recurrent unit (GRU).
[0074] Accordingly, the feature vector of the region of interest and spatial context information are input into the spatial context memory model for processing, including:
[0075] The temporalized results of the feature vector of the region of interest and the spatial context information corresponding to each hardware candidate box are input into the gated recurrent unit for processing.
[0076] Since the spatial frame is divided into four spatial directions (up, down, left, and right), four pieces of spatial context information, one per direction, are obtained for each hardware candidate box through deep extraction. However, because of the drone's shooting angle, for a few hardware targets in the aerial images the spatial context information in some directions is not necessarily related to the target and may even be wrong, which can easily mislead the model when it reasons with implicit spatial knowledge. A module with filtering and memory functions is therefore needed: it takes the feature vector of the region of interest, screens the spatial context information from the four directions, selectively remembers valid spatial context information and forgets invalid spatial context information, and so guides the model to use implicit spatial knowledge for correct reasoning and detection.
[0077] To achieve this functionality, this embodiment proposes a spatial context memory module. The module consists mainly of a gated recurrent unit (GRU), whose two-gate mechanism gives it good long-term memory and whose update gate allows it to forget useless information.
[0078] Figure 4 is a structural diagram of the GRU, where `weights` denotes the weights, `Add` denotes matrix addition, `Hadamard product` denotes element-wise matrix multiplication, `sigmoid` is the Sigmoid activation function, `Tanh` is the Tanh activation function, and `Concate` denotes concatenation. The spatial context memory module uses the feature of each hardware candidate box as the initial hidden state and the temporalized spatial context information of that candidate box as the input sequence, thereby memorizing the spatial context information. The reset gate $r_t$ is calculated as follows:
[0079] $r_t = \sigma(W_r \cdot [h_{t-1}, x_t])$
[0080] where $[\cdot,\cdot]$ denotes vector concatenation, $h_{t-1}$ is the hidden state at time $t-1$, $x_t$ is the spatial context information input at time $t$, and $\sigma(\cdot)$ is the Sigmoid activation function.
[0081] The update gate $u_t$ is calculated as follows:
[0082] $u_t = \sigma(W_u \cdot [h_{t-1}, x_t])$
[0083] where $W_u$ is the learnable parameter of the update gate.
[0084] The reset gate $r_t$ is used to construct the candidate hidden state $\tilde{h}_t$:
[0085] $\tilde{h}_t = \tanh(W \cdot [r_t * h_{t-1}, x_t])$
[0086] where $W$ is a learnable parameter, $\tanh(\cdot)$ is the Tanh activation function, and $*$ denotes the Hadamard product.
[0087] Here $x_t$ is the newly input spatial context information, and the hidden state $h_{t-1}$ contains the historical spatial context information recorded up to time $t-1$. Through the Hadamard product $r_t * h_{t-1}$, $r_t$ controls how much historical spatial context information flows into the candidate hidden state $\tilde{h}_t$. The reset gate $r_t$ therefore determines how new and historical spatial context information are combined.
[0088] The update gate $u_t$ is then used to calculate the final hidden state $h_t$:
[0089] $h_t = (1 - u_t) * h_{t-1} + u_t * \tilde{h}_t$
[0090] where $(1 - u_t) * h_{t-1}$ represents the selective forgetting of redundant information in the hidden state of the previous time step, and $u_t * \tilde{h}_t$ represents the selective memorization of valid information from the candidate hidden state $\tilde{h}_t$ at the current time step.
[0091] The update gate $u_t$ can therefore selectively remember and forget both the newly input spatial context information and the historical spatial context information contained in the hidden state. Through the cooperation of the reset gate $r_t$ and the update gate $u_t$ in the GRU, the spatial context memory module can effectively combine the features of each hardware candidate box with its corresponding spatial context information in different directions. It remembers the correct spatial context information that benefits detection and forgets the incorrect spatial context information that harms it, thereby enabling the model to discover effective implicit spatial knowledge to assist its reasoning and detection.
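The memory mechanism can be sketched as a plain GRU recurrence in which the region of interest feature vector is the initial hidden state and the four directional context vectors are fed in one per time step. This is a minimal sketch, assuming the standard GRU form without bias terms for the candidate and final hidden states; the function names, the order of the directions, and the zero weights used in the example are illustrative assumptions and not trained parameters.

```python
import math

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def matvec(W, v):
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def gru_step(h_prev, x_t, W_r, W_u, W):
    """One GRU step on Python lists; each weight matrix acts on [h, x] concatenated."""
    concat = h_prev + x_t
    r_t = sigmoid(matvec(W_r, concat))  # reset gate
    u_t = sigmoid(matvec(W_u, concat))  # update gate
    gated = [r * h for r, h in zip(r_t, h_prev)] + x_t
    h_cand = [math.tanh(a) for a in matvec(W, gated)]  # candidate hidden state
    return [(1 - u) * h + u * c for u, h, c in zip(u_t, h_prev, h_cand)]

def memorize_context(f_r, contexts, W_r, W_u, W):
    """Start from the ROI feature vector and absorb one context vector per time step."""
    h = f_r
    for x_t in contexts:
        h = gru_step(h, x_t, W_r, W_u, W)
    return h

# Hidden size 2, input size 2; zero weights make both gates 0.5 and the candidate 0.
zeros = [[0.0] * 4 for _ in range(2)]
f_r = [1.6, -3.2]
contexts = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]  # assumed order: up, down, left, right
h_final = memorize_context(f_r, contexts, zeros, zeros, zeros)
```

With zero weights the hidden state simply halves at each of the four steps, which makes the recurrence easy to trace by hand; with learned weights the gates decide per element how much of each direction's context is kept.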
[0092] In one possible implementation, after performing classification and regression using the DoubleHead network in step S104, the following steps are also included:
[0093] The classification and regression results are post-processed using the Soft-NMS model to remove redundant detection boxes.
[0094] To address the issue of dense occlusion in hardware, although methods that integrate implicit spatial knowledge can provide more accurate localization of hardware candidate boxes based on the spatial context information of the hardware's regularized structure, multiple hardware candidate boxes may correspond to the same hardware target. Therefore, NMS (non-maximum suppression) is needed to remove redundant detection results on the same hardware target.
[0095] The traditional NMS method, also known as Greedy-NMS, sorts all candidate bounding box detection results from highest to lowest classification confidence. It then selects the bounding box V with the highest classification confidence and greedily deletes detection results adjacent to it, i.e., detection results with an IOU (Intersection over Union) higher than a certain threshold. Finally, this process is repeated to obtain the final detection result. However, this method performs poorly in cases of densely occluded hardware objects because the Greedy-NMS threshold selection cannot balance two situations: when the Greedy-NMS IOU threshold is low, in areas with densely distributed hardware objects, Greedy-NMS may incorrectly suppress predictions with lower confidence among neighboring similar hardware objects; when the Greedy-NMS IOU threshold is high, the suppression effect on hardware object detection results is not significant, easily leading to duplicate detections.
[0096] Therefore, to address this issue, this embodiment introduces Soft-NMS to improve the post-processing structure. Unlike Greedy-NMS, Soft-NMS does not directly remove adjacent boxes whose IOU with V exceeds a certain threshold, but instead attenuates their confidence. Since a higher IOU means the detection boxes are more likely to cover the same target, redundant detection boxes need to be suppressed, and the penalty for confidence attenuation should therefore be positively correlated with IOU. Furthermore, if the penalty function is continuous, it avoids the sudden changes in confidence ranking that an abrupt penalty would cause. The specific formula for the post-processing method in this embodiment is as follows:
[0097]
[0098] Where V represents the box with the highest current classification confidence, b_i is the box to be penalized, o_i is the classification confidence of the box to be penalized, and σ is a constant, the standard deviation of the Gaussian function.
[0099] The formula shows that when the boxes V and b_i do not overlap, no penalty is applied to the classification confidence; as the overlap between V and b_i grows, the penalty to the classification confidence grows as well.
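The formula itself is not reproduced in this text. As a sketch only, the code below assumes the widely used Gaussian form of Soft-NMS, o_i ← o_i · exp(−IOU(V, b_i)² / σ), which is consistent with the properties stated above: no penalty at zero overlap, and a continuous penalty that grows with IOU. The default `sigma`, the `score_threshold` cut-off, and the function names are illustrative assumptions.

```python
import math

def iou(a, b):
    # Intersection over union of two boxes given as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    # Returns (index, final confidence) pairs in the order boxes are selected.
    scores = list(scores)  # work on a copy so the caller's list is unchanged
    remaining = list(range(len(boxes)))
    selected = []
    while remaining:
        best = max(remaining, key=lambda i: scores[i])  # current box V
        remaining.remove(best)
        if scores[best] < score_threshold:
            break  # every remaining box has decayed below the cut-off
        selected.append((best, scores[best]))
        for i in remaining:
            overlap = iou(boxes[best], boxes[i])
            # Assumed Gaussian penalty: attenuate instead of deleting.
            scores[i] *= math.exp(-(overlap ** 2) / sigma)
    return selected
```

A box that does not overlap V keeps its confidence, because the factor is exp(0) = 1. A near duplicate is strongly attenuated and can then be discarded by a final confidence threshold.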
[0100] Overall, this embodiment proposes a method for detecting multiple hardware fittings in transmission lines based on the fusion of implicit spatial knowledge. Utilizing DoubleHead as the basic model framework, it proposes a spatial bounding box setting module and a spatial context extraction module to mine implicit spatial knowledge. Then, a spatial context memory module is designed to remember spatial context information beneficial to detection and forget erroneous spatial context information that may appear due to factors such as shooting angle. This allows the model to fully consider the specificity of the spatial information contained in hardware targets in different images, achieving the fusion of implicit spatial knowledge. Finally, Soft-NMS is introduced to improve the model's post-processing part to further alleviate the problems caused by dense occlusion of hardware fittings.
[0101] The main innovative points of this invention include:
[0102] (1) Since the implicit spatial knowledge between hardware fittings resides mainly in the other fittings surrounding a hardware target and in the connection structures between them, a spatial bounding box setting module and a spatial context extraction module are designed to extract the spatial context information of the detection target, so that the model mines implicit spatial knowledge on its own.
[0103] (2) Since the spatial information carried by hardware targets differs from image to image, a spatial context memory module is designed to filter and memorize the spatial context information, eliminating the influence of erroneous implicit spatial knowledge caused by shooting angle and similar factors, and assisting the model in accurate reasoning and detection.
[0104] (3) Due to the limitations of the NMS (non-maximum suppression) post-processing structure in dense target detection, Soft-NMS is introduced to improve the model's post-processing structure. The confidence of each detection result is penalized to a different degree according to its intersection over union (IOU), further alleviating the missed or duplicate detections that may arise from dense occlusion of hardware targets.
[0105] This embodiment uses a dataset of hardware targets captured by UAVs during power transmission line inspections for model training. The training set contains 1442 images and the test set contains 348 images, totaling 1790 images and 25675 hardware targets. The final detection results are shown in Figure 5.
[0106] It should be understood that the sequence number of each step in the above embodiments does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.
[0107] As shown in Figure 6, this embodiment of the invention provides a transmission line multi-fitting detection device 60 based on spatial knowledge fusion, comprising:
[0108] The acquisition module 61 is used to acquire the captured image of the hardware to be detected and input the image of the hardware to be detected into a preset feature extraction network to extract basic features.
[0109] The determination module 62 is used to input basic features into the RPN network to obtain the region of interest.
[0110] The spatial bounding box setting module 63 is used to set the spatial bounding box for the region of interest, and to project the region of interest and the spatial bounding box onto the corresponding feature level and perform ROIAlign to obtain the feature map of the region of interest and the feature map of the spatial bounding box.
[0111] The detection module 64 is used to extract the spatial context information of the spatial bounding box feature map through a preset spatial context extraction model, and to perform classification and regression using the DoubleHead network based on the spatial context information and the region of interest feature map to obtain the detection result of the hardware in the hardware image to be detected.
[0112] As one possible implementation, the spatial bounding box setting module 63 is specifically used for:
[0113] The position of each hardware candidate box is determined based on the boundary of the region of interest. Based on the position of the hardware candidate box, the position of the spatial bounding box corresponding to each hardware candidate box is calculated using the first formula.
[0114] The first formula is:
[0115]
[0116] In the first formula, the terms denote: the position of the spatial bounding box; the position of each hardware candidate box, given by the coordinates of the bottom-left corner and of the top-right corner of the i-th hardware candidate box; N, the number of hardware candidate boxes; α, the preset spatial bounding box coefficient; and w_i and h_i, the width and height of the i-th hardware candidate box, respectively.
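The first formula is not reproduced in this text, so the rule below is an assumption chosen only to be consistent with the listed terms: each hardware candidate box, given by its bottom-left and top-right corners, is enlarged on every side by α times its width or height. The function name `spatial_boxes` is hypothetical, and the actual formula of the embodiment may differ.

```python
def spatial_boxes(candidate_boxes, alpha):
    # candidate_boxes: list of (x1, y1, x2, y2), bottom-left and top-right corners.
    # alpha: preset spatial bounding box coefficient.
    result = []
    for x1, y1, x2, y2 in candidate_boxes:
        w = x2 - x1  # width w_i of the i-th hardware candidate box
        h = y2 - y1  # height h_i of the i-th hardware candidate box
        # Assumed rule: grow the box by alpha * w horizontally and
        # alpha * h vertically so it covers the neighbouring fittings.
        result.append((x1 - alpha * w, y1 - alpha * h,
                       x2 + alpha * w, y2 + alpha * h))
    return result
```

Under this assumption, a 20 by 40 candidate box with α = 0.5 yields a 40 by 80 spatial bounding box centred on the same point.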
[0117] As one possible implementation, the detection module 64 is specifically used for:
[0118] F_s = Γ(T(M_s))
[0119] where F_s represents the spatial context information corresponding to the N hardware candidate boxes; T(·) represents the spatial context extraction process; and Γ(·) represents the average pooling operation.
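The relation F_s = Γ(T(M_s)) can be sketched as follows. Two assumptions are made here: Γ is taken to be a global average over the spatial dimensions of each channel, and T is represented by a hypothetical placeholder callable `extract` that defaults to the identity, since the spatial context extraction model is specified elsewhere in this description.

```python
def average_pool(feature_map):
    # Gamma: average each channel of a C x H x W map over H and W.
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]

def spatial_context(box_feature_maps, extract=lambda m: m):
    # box_feature_maps: one C x H x W feature map per spatial bounding box (M_s).
    # extract: placeholder for the spatial context extraction process T.
    return [average_pool(extract(m)) for m in box_feature_maps]

# One spatial bounding box with two 2 x 2 channels.
maps = [[[[1, 2], [3, 4]], [[0, 0], [0, 8]]]]
context = spatial_context(maps)
```

Each spatial bounding box is reduced to one vector with one entry per channel, which is the form later passed to the spatial context memory model.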
[0120] As one possible implementation, the detection module 64 is specifically used for:
[0121] After the feature map of the region of interest is input into the fully connected layer of the DoubleHead network, the classification head is used to classify the hardware in the hardware image to be detected.
[0122] The feature map of the region of interest is input into the convolutional layer of the DoubleHead network to obtain the feature vector of the region of interest. The feature vector of the region of interest and the spatial context information are then input into the spatial context memory model for processing. Finally, the regression head is used to regress the hardware in the image of the hardware to be detected.
[0123] As one possible implementation, the spatial context memory model is a gated recurrent unit (GRU).
[0124] The detection module 64 is specifically used for:
[0125] The temporalized results of the feature vector of the region of interest and the spatial context information corresponding to each hardware candidate box are input into the gated recurrent unit for processing.
[0126] As one possible implementation, after using the DoubleHead network for classification and regression, the detection module 64 is specifically used for:
[0127] The classification and regression results are post-processed using the Soft-NMS model to remove redundant detection boxes.
[0128] As one possible implementation, the formula for post-processing classification and regression results using the Soft-NMS model is as follows:
[0129]
[0130] Where V represents the detection box with the highest current classification confidence, b_i is the detection box to be penalized, o_i is the classification confidence of the detection box to be penalized, and σ is the standard deviation of the Gaussian function.
[0131] Figure 7 is a schematic diagram of an electronic device 70 provided according to an embodiment of the present invention. As shown in Figure 7, the electronic device 70 of this embodiment includes: a processor 71, a memory 72, and a computer program 73 stored in the memory 72 and executable on the processor 71, such as a transmission line multi-fitting detection program based on spatial knowledge fusion. When the processor 71 executes the computer program 73, it implements the steps in the various embodiments of the transmission line multi-fitting detection method based on spatial knowledge fusion described above, for example steps S101 to S104 shown in Figure 1. Alternatively, when the processor 71 executes the computer program 73, it implements the functions of each module in the above-described device embodiments, for example the functions of modules 61 to 64 shown in Figure 6.
[0132] For example, computer program 73 may be divided into one or more modules / units, one or more of which are stored in memory 72 and executed by processor 71 to complete the present invention. One or more modules / units may be a series of computer program instruction segments capable of performing a specific function, which describe the execution process of computer program 73 in electronic device 70.
[0133] Electronic device 70 can be a desktop computer, laptop, handheld computer, cloud server, or other computing device. Electronic device 70 may include, but is not limited to, a processor 71 and a memory 72. Those skilled in the art will understand that Figure 7 is merely an example of electronic device 70 and does not constitute a limitation on it; the device may include more or fewer components than shown, combine certain components, or use different components. For example, electronic device 70 may also include input/output devices, network access devices, buses, etc.
[0134] The processor 71 may be a Central Processing Unit (CPU), or other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. A general-purpose processor may be a microprocessor or any conventional processor.
[0135] The memory 72 can be an internal storage unit of the electronic device 70, such as a hard disk or RAM of the electronic device 70. The memory 72 can also be an external storage device of the electronic device 70, such as a plug-in hard disk, Smart Media Card (SMC), Secure Digital (SD) card, or Flash Card equipped on the electronic device 70. Furthermore, the memory 72 can include both internal and external storage units of the electronic device 70. The memory 72 is used to store computer programs and other programs and data required by the electronic device 70. The memory 72 can also be used to temporarily store data that has been output or will be output.
[0136] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the above-described division of functional units and modules is merely an example. In practical applications, the above functions can be assigned to different functional units and modules as needed, that is, the internal structure of the device can be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit. Furthermore, the specific names of the functional units and modules are only for easy differentiation and are not intended to limit the scope of protection of this application. The specific working process of the units and modules in the above system can be referred to the corresponding process in the foregoing method embodiments, and will not be repeated here.
[0137] In the above embodiments, the descriptions of each embodiment have different focuses. For parts that are not described in detail or recorded in a certain embodiment, please refer to the relevant descriptions of other embodiments.
[0138] Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this invention.
[0139] In the embodiments provided by this invention, it should be understood that the disclosed devices / electronic devices and methods can be implemented in other ways. For example, the device / electronic device embodiments described above are merely illustrative. For instance, the division of modules or units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be through some interfaces; the indirect coupling or communication connection between devices or units may be electrical, mechanical, or other forms.
[0140] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.
[0141] Furthermore, the functional units in the various embodiments of the present invention can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.
[0142] If integrated modules / units are implemented as software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments of the present invention can also be implemented by a computer program instructing related hardware. The computer program can be stored in a computer-readable storage medium, and when executed by a processor, it can implement the steps of the various method embodiments described above. The computer program includes computer program code, which can be in the form of source code, object code, executable files, or certain intermediate forms. The computer-readable medium can include: any entity or device capable of carrying computer program code, recording media, USB flash drives, portable hard drives, magnetic disks, optical disks, computer memory, read-only memory (ROM), random access memory (RAM), electrical carrier signals, telecommunication signals, and software distribution media, etc.
[0143] The above embodiments are only used to illustrate the technical solutions of the present invention, and are not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention, and should all be included within the protection scope of the present invention.
Claims
1. A method for detecting multiple fittings in transmission lines based on spatial knowledge fusion, characterized in that it comprises: acquiring a captured image of the hardware to be detected and inputting the hardware image to be detected into a preset feature extraction network to extract basic features; inputting the basic features into the RPN network to obtain the region of interest; setting a spatial bounding box for the region of interest, and projecting the region of interest and the spatial bounding box onto the corresponding feature level and performing ROIAlign to obtain the region of interest feature map and the spatial bounding box feature map; extracting the spatial context information of the spatial bounding box feature map through a preset spatial context extraction model, and performing classification and regression using the DoubleHead network based on the spatial context information and the region of interest feature map to obtain the detection result of the hardware in the hardware image to be detected;
wherein performing classification and regression using the DoubleHead network based on the spatial context information and the region of interest feature map includes: inputting the region of interest feature map into the fully connected layer of the DoubleHead network and then classifying the hardware in the hardware image to be detected through the classification head; and inputting the region of interest feature map into the convolutional layer of the DoubleHead network to obtain the region of interest feature vector, inputting the region of interest feature vector and the spatial context information into the spatial context memory model for processing, and then regressing the hardware in the hardware image to be detected through the regression head;
the spatial context memory model is a gated recurrent unit (GRU), and inputting the region of interest feature vector and the spatial context information into the spatial context memory model for processing includes: inputting the region of interest feature vector and the temporalized result of the spatial context information corresponding to each hardware candidate box into the gated recurrent unit for processing; the spatial context memory model uses the features of each hardware candidate box as the initial hidden state and the temporalized result of the spatial context information corresponding to that hardware candidate box as the input, and, through the cooperation of the reset gate and the update gate in the GRU, selectively memorizes and forgets the newly input spatial context information and the historical spatial context information contained in the hidden state;
setting the spatial bounding box for the region of interest includes: determining the position of each hardware candidate box based on the boundary of the region of interest, and, based on the position of the hardware candidate box, calculating the position of the spatial bounding box corresponding to each hardware candidate box using the first formula, in which the terms denote: the position of the spatial bounding box; the position of each hardware candidate box, given by the coordinates of the bottom-left corner and of the top-right corner of the i-th hardware candidate box; N, the number of hardware candidate boxes; α, the preset spatial bounding box coefficient; and w_i and h_i, the width and height of the i-th hardware candidate box, respectively;
extracting the spatial context information of the spatial bounding box feature map through the preset spatial context extraction model includes: F_s = Γ(T(M_s)), where F_s represents the spatial context information corresponding to the N hardware candidate boxes; T(·) represents the spatial context extraction process; and Γ(·) represents the average pooling operation.
2. The method as described in claim 1, characterized in that, after classification and regression using the DoubleHead network, the method further comprises: post-processing the classification and regression results using the Soft-NMS model to remove redundant detection boxes.
3. The method as described in claim 2, characterized in that the classification and regression results are post-processed using the Soft-NMS model according to a formula in which V represents the detection box with the highest current classification confidence, b_i is the detection box to be penalized, o_i is the classification confidence of the detection box to be penalized, and σ is the standard deviation of the Gaussian function.
4. A transmission line multi-fitting detection device based on spatial knowledge fusion, characterized in that it comprises: an acquisition module, used to acquire the captured image of the hardware to be detected and input the hardware image to be detected into a preset feature extraction network to extract basic features; a determination module, used to input the basic features into the RPN network to obtain the region of interest; a spatial bounding box setting module, used to set the spatial bounding box for the region of interest, and to project the region of interest and the spatial bounding box onto the corresponding feature level and perform ROIAlign to obtain the region of interest feature map and the spatial bounding box feature map; and a detection module, used to extract the spatial context information of the spatial bounding box feature map through a preset spatial context extraction model, and to perform classification and regression using the DoubleHead network based on the spatial context information and the region of interest feature map to obtain the detection result of the hardware in the hardware image to be detected;
wherein performing classification and regression using the DoubleHead network based on the spatial context information and the region of interest feature map includes: inputting the region of interest feature map into the fully connected layer of the DoubleHead network and then classifying the hardware in the hardware image to be detected through the classification head; and inputting the region of interest feature map into the convolutional layer of the DoubleHead network to obtain the region of interest feature vector, inputting the region of interest feature vector and the spatial context information into the spatial context memory model for processing, and then regressing the hardware in the hardware image to be detected through the regression head;
the spatial context memory model is a gated recurrent unit (GRU), and inputting the region of interest feature vector and the spatial context information into the spatial context memory model for processing includes: inputting the region of interest feature vector and the temporalized result of the spatial context information corresponding to each hardware candidate box into the gated recurrent unit for processing; the spatial context memory model uses the features of each hardware candidate box as the initial hidden state and the temporalized result of the spatial context information corresponding to that hardware candidate box as the input, and, through the cooperation of the reset gate and the update gate in the GRU, selectively memorizes and forgets the newly input spatial context information and the historical spatial context information contained in the hidden state;
setting the spatial bounding box for the region of interest includes: determining the position of each hardware candidate box based on the boundary of the region of interest, and, based on the position of the hardware candidate box, calculating the position of the spatial bounding box corresponding to each hardware candidate box using the first formula, in which the terms denote: the position of the spatial bounding box; the position of each hardware candidate box, given by the coordinates of the bottom-left corner and of the top-right corner of the i-th hardware candidate box; N, the number of hardware candidate boxes; α, the preset spatial bounding box coefficient; and w_i and h_i, the width and height of the i-th hardware candidate box, respectively;
extracting the spatial context information of the spatial bounding box feature map through the preset spatial context extraction model includes: F_s = Γ(T(M_s)), where F_s represents the spatial context information corresponding to the N hardware candidate boxes; T(·) represents the spatial context extraction process; and Γ(·) represents the average pooling operation.
5. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, it implements the steps of the method as described in any one of claims 1 to 3.
6. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, it implements the steps of the method as described in any one of claims 1 to 3.