[0044] In order to clearly illustrate the technical features of the present solution, the present invention will be described in detail below through specific implementations and in conjunction with the accompanying drawings. In order to simplify the disclosure of the present invention, the components and settings of specific examples are described below. In addition, the present invention may repeat reference numbers and/or letters in different examples. This repetition is for the purpose of simplification and clarity, and does not in itself indicate the relationship between the various embodiments and/or settings discussed. It should be noted that the components illustrated in the drawings are not necessarily drawn to scale. The present invention omits descriptions of well-known components and processing techniques and processes to avoid unnecessarily limiting the present invention.
[0045] As shown in Figure 1, a method for intelligent automatic identification of transmission line components includes the following steps:
[0046] S1. Preprocessing of inspection data image samples: the original images from drone inspection data are used as the image source, the positions of transmission line components such as poles, insulators, equalizing rings, and spacer rods are manually marked in the original images, and attribute labels are added to the components to construct a transmission line component recognition training data set;
[0047] S2. Feature extraction: a convolutional neural network and a feature pyramid network are used to extract multi-level features of the transmission line images from the preprocessed inspection data image samples;
[0048] S3. Training the target positioning regression network and the classification network: the extracted multi-level features of the transmission line images and the calibrated attribute label data are taken as the training input, the position-sensitive score maps are computed, the loss values of the classification network and the target positioning regression network are calculated, and the parameters of both networks are optimized by stochastic gradient descent, so as to achieve optimal classification and positioning of the transmission line components in the training data;
[0049] S4. Transmission line component detection: the detection network is initialized with the parameters obtained from the transmission line recognition training, and the transmission line inspection data are imported in batches to realize automatic positioning and classification of the components.
[0050] To increase the diversity of the image data and obtain sufficient training data, a data expansion strategy is introduced in step S1 to enlarge the training data set. The expansion strategies mainly include five transforms: mirror mapping, translation, rotation, cropping, and scaling. These five strategies are applied in random combinations to expand the manually labeled samples until the required training data volume is met.
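The random combination of the five expansion strategies can be sketched as follows. This is a minimal illustration, not the patent's implementation: the shift offsets, crop margin, scale factor, and the use of a 90-degree rotation in place of arbitrary-angle rotation are all simplifying assumptions.

```python
import random
import numpy as np

def mirror(img):
    # Horizontal mirror mapping
    return img[:, ::-1]

def translate(img, dx=10, dy=10):
    # Shift the image content and zero-fill the exposed border
    out = np.zeros_like(img)
    h, w = img.shape[:2]
    out[dy:, dx:] = img[:h - dy, :w - dx]
    return out

def rotate(img):
    # 90-degree rotation stands in for arbitrary-angle rotation here
    return np.rot90(img)

def crop(img, margin=8):
    # Central crop that discards a fixed border
    return img[margin:-margin, margin:-margin]

def scale(img, factor=2):
    # Nearest-neighbour up-scaling
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

def expand_sample(img, rng):
    # Randomly combine a subset of the five expansion strategies
    ops = [mirror, translate, rotate, crop, scale]
    chosen = rng.sample(ops, k=rng.randint(1, len(ops)))
    for op in chosen:
        img = op(img)
    return img
```

In practice the bounding-box annotations must be transformed consistently with the image, which is omitted here for brevity.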
[0051] As shown in Figure 2, the feature extraction step specifically includes the following steps:
[0052] S201. Convolutional network processing: the preprocessed inspection data image samples are used as the training input data set, and the first 13 layers of the VGG-net16 convolutional network, consisting mainly of convolutional layers and pooling layers, are applied; by optimizing the parameters of each layer, transmission line image features are extracted, yielding 512-dimensional high-level semantic features.
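The shape of the resulting feature map can be traced through the standard VGG-16 configuration. The sketch below assumes (as is common when the conv5_3 output feeds a feature pyramid) that the fifth max-pooling layer is dropped, so the spatial stride is 16; this layer count is an assumption, since the patent only states "the first 13 layers".

```python
# VGG-16 configuration for the 13 convolutional layers; 'M' marks a 2x2 max pool.
# The fifth pool is omitted here (assumed), giving an overall stride of 16.
VGG16_CONV_CFG = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
                  512, 512, 512, 'M', 512, 512, 512]

def feature_map_shape(h, w, cfg=VGG16_CONV_CFG):
    """Trace the output shape of the VGG-16 convolutional trunk."""
    channels = 3
    for v in cfg:
        if v == 'M':
            h, w = h // 2, w // 2   # 2x2 max pooling halves the spatial size
        else:
            channels = v            # 3x3 conv with padding 1 keeps H x W
    return h, w, channels
```

For an 800x800 inspection image this yields a 50x50 map with the 512 channels mentioned above.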
[0053] S202. Feature pyramid network processing: as shown in Figure 3, the feature pyramid network builds top-down lateral connections between the high-level features, which have low resolution but high semantic information, and the low-level features, which have high resolution but low semantic information, so that the features at all scales carry rich semantic information. This operation is performed within the convolutional network itself rather than by building an image pyramid, which avoids additional time and memory consumption. The convolutional network performs a bottom-up convolution computation; a pyramid level is defined for each network stage, and the output of the last layer of each stage is selected as the reference set of feature maps. Each level of the reference set is then up-sampled top-down and laterally connected to the features of the next-lower reference set, so that the positioning details of the bottom layers can be exploited.
[0054] Through the combination of the convolutional neural network and the feature pyramid network, an image feature map carrying multi-scale information is extracted, and this feature map serves as the input data for the automatic identification of transmission line components in step S3.
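The top-down merging described above can be sketched with plain numpy. This is an illustrative skeleton under stated assumptions: the 1x1 lateral convolution is modeled as a random channel projection with placeholder weights, nearest-neighbour up-sampling is used, and a 256-channel pyramid width is assumed (the patent does not specify it).

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbour 2x up-sampling of a (C, H, W) feature map
    return x.repeat(2, axis=1).repeat(2, axis=2)

def lateral_1x1(x, out_channels=256, seed=0):
    # 1x1 convolution modeled as a channel projection; weights are placeholders
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((out_channels, x.shape[0]))
    return np.einsum('oc,chw->ohw', w, x)

def fpn_merge(bottom_up):
    """bottom_up: list of (C_i, H_i, W_i) maps, ordered low level -> high level.
    Returns the merged pyramid, ordered the same way."""
    merged = [lateral_1x1(bottom_up[-1])]
    for feat in reversed(bottom_up[:-1]):
        # Up-sample the coarser merged map and add the lateral connection
        top_down = upsample2x(merged[0])
        merged.insert(0, top_down + lateral_1x1(feat))
    return merged
```

Every output level ends up with the same channel count and the spatial resolution of its bottom-up counterpart, which is what lets all scales share one detection head.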
[0055] As shown in Figures 4 and 5, training the target positioning regression network and the classification network in step S3 specifically includes the following steps:
[0056] S301. Constructing the position-sensitive score maps: after the feature map is extracted by the combined convolutional network and feature pyramid network, a convolutional score layer is added to extract k^2 position-sensitive score maps for each category on the image. Assuming there are C+1 categories in total (C targets plus 1 background), the convolution generates an output layer of k^2(C+1) channels.
[0057] S302. Constructing the target positioning regression network: candidate regions are extracted from the feature map using fixed-ratio candidate frames with aspect ratios {1:2, 1:1, 2:1}. Since step S2 uses 5 feature pyramid levels, a total of 15 types of region extraction frames are obtained. To train the target positioning regression network, the loss function is defined as:
[0058] L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i^*) + λ (1/N_reg) Σ_i p_i^* L_reg(t_i, t_i^*)
[0059] Here i is the serial number of the region extraction frame, p_i is the predicted target probability of frame i, and p_i^* is the ground-truth label probability: 1 for a positive sample and 0 for a negative sample. Positive and negative samples are determined by the IoU between the region extraction frame and the ground-truth box: a frame whose IoU is greater than 0.7 is a positive sample, one whose IoU is less than 0.3 is a negative sample, and frames with IoU between 0.3 and 0.7 are ignored. t_i is the position coordinate of the predicted bounding box and t_i^* is that of the ground-truth box. L_cls is the classification loss function, used to judge whether the frame contains a target; L_reg is the regression loss function, which fine-tunes the position and size of the box. N_cls is the mini-batch size during training, about 2000; N_reg is the number of region extraction frame positions; and λ is a balance parameter, set to 15.
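The loss of [0058] can be computed directly from its definition. The sketch below is an illustration under assumptions the patent leaves open: binary cross-entropy is used for L_cls and smooth L1 for L_reg, both standard choices for this kind of region proposal loss.

```python
import numpy as np

def smooth_l1(x):
    # Smooth L1, a common choice for the bounding-box regression term (assumed)
    ax = np.abs(x)
    return np.where(ax < 1, 0.5 * x ** 2, ax - 0.5)

def rpn_loss(p, p_star, t, t_star, n_cls, n_reg, lam=15.0):
    """p: predicted objectness per frame; p_star: 0/1 ground-truth labels;
    t, t_star: (N, 4) predicted / ground-truth box coordinates.
    lam defaults to the balance parameter 15 from the text."""
    eps = 1e-9
    # Classification term: binary cross-entropy over all sampled frames
    cls = -(p_star * np.log(p + eps) + (1 - p_star) * np.log(1 - p + eps))
    # Regression term: only positive samples (p_star == 1) contribute
    reg = smooth_l1(t - t_star).sum(axis=1)
    return cls.sum() / n_cls + lam * (p_star * reg).sum() / n_reg
```

Note that multiplying the regression term by p_star implements the rule that background frames receive no localization gradient.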
[0060] To train the network as a whole, the parameters of the convolutional score layer, the classification network, and the target positioning regression network are randomly initialized, and the loss function is defined as:
[0061] L(c, t) = L_cls(c, c^*) + λ [c^* > 0] L_reg(t, t^*)
[0062] Here c^* is the true label of the candidate region, and [c^* > 0] is an indicator that equals 1 if the region carries a real (non-background) label and 0 otherwise; L_cls(c, c^*) is the cross-entropy loss function used for classification, L_reg(t, t^*) is the bounding-box regression loss function, t^* is the coordinate of the ground-truth label, t is the output regression coordinate, and the balance parameter λ is set to 1 here.
[0063] S303. Constructing the classification network: the marked box of the region of interest is mapped onto the convolutional score layer and divided into a k×k grid; each grid cell obtains a (C+1)-dimensional feature map. The corresponding score maps on the k×k grid cells are extracted, and the k^2 score maps then vote on the candidate region to determine the target category;
[0064] For a candidate region of interest of size w×h, each grid cell has size (w/k)×(h/k), and the score map for the grid cell in the i-th row and j-th column is defined as follows:
[0065] r_c(i, j | θ) = (1/n) Σ_{(x,y) ∈ cell(i,j)} z_{i,j,c}(x + x_0, y + y_0 | θ)
[0066] Here r_c(i, j) is the response score of the (i, j)-th grid cell for the c-th target, z_{i,j,c} is one of the k^2(C+1) score maps, (x_0, y_0) is the coordinate of the upper-left corner of the candidate region, n is the number of pixels in each grid cell, and θ denotes all learnable parameters in the network;
[0067] The k^2 score maps vote by averaging their scores, generating a (C+1)-dimensional vector for each candidate region: r_c(θ) = (1/k^2) Σ_{i,j} r_c(i, j | θ).
[0068] Finally, the softmax response of each target is calculated: s_c(θ) = exp(r_c(θ)) / Σ_{c'=0}^{C} exp(r_{c'}(θ)).
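Steps [0065]-[0068] together form the position-sensitive voting of S303 and can be sketched as below. The channel layout (cell-major, then class) and the requirement that the RoI width and height divide evenly by k are assumptions made for the illustration.

```python
import numpy as np

def position_sensitive_vote(score_maps, roi, k=3):
    """score_maps: (k*k*(C+1), H, W) position-sensitive score maps.
    roi: (x0, y0, w, h) region of interest in feature-map coordinates;
    w and h are assumed divisible by k for simplicity.
    Returns the (C+1)-dimensional softmax response s_c."""
    x0, y0, w, h = roi
    num_classes = score_maps.shape[0] // (k * k)
    r = np.zeros(num_classes)
    for i in range(k):
        for j in range(k):
            # Grid cell (i, j) covers a (h/k) x (w/k) patch of the RoI
            ys = slice(y0 + i * h // k, y0 + (i + 1) * h // k)
            xs = slice(x0 + j * w // k, x0 + (j + 1) * w // k)
            for c in range(num_classes):
                # Only the (i, j)-th map of class c responds to this cell
                m = score_maps[(i * k + j) * num_classes + c]
                r[c] += m[ys, xs].mean()
    r /= k * k                      # average-score voting over the k*k cells
    e = np.exp(r - r.max())         # numerically stable softmax response
    return e / e.sum()
```

Because each cell reads a different score map, the vote only scores high when the object's parts appear in the expected relative positions, which is what makes the pooling position-sensitive.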
[0069] The feature extraction network, region extraction network, and classification network used for transmission line inspection image recognition form a sequential pipeline and are trained with an end-to-end strategy. The feature extraction network parameters are initialized from the VGG-net16 network, the weight parameters of the region extraction network and the classification network are initialized with a Gaussian function, and a fine-tuning strategy is used to optimize them to obtain the final transmission line inspection image recognition model.
[0070] As shown in Figure 6, the detection of transmission line components includes the following steps:
[0071] S401. Initializing the network: the detection network is initialized using the image recognition model obtained by optimization and training, including definition of the network structure, definition of the basic network parameters, and filling of the weight and bias data;
[0072] S402. Extracting image feature maps: first, multi-layer convolution operations are performed on the input image to extract its high-level semantic representation; second, multi-scale segmentation and up-sampling are performed on the extracted features; finally, the pyramid features are merged with the post-convolution features to form the final feature map;
[0073] S403. Extracting candidate regions: candidate frames are extracted using the region extraction network and a non-maximum suppression operation is performed, retaining only the 300 highest-scoring candidate frames so as to achieve rapid detection;
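The non-maximum suppression in S403 can be sketched as follows. The 0.7 suppression threshold is an assumption for illustration (the text fixes only the 300-box limit); boxes are taken as (x1, y1, x2, y2) corners.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.7, top_k=300):
    """boxes: (N, 4) array of (x1, y1, x2, y2); scores: (N,) confidences.
    Greedily keeps the highest-scoring boxes, discarding near-duplicates,
    and returns at most top_k kept indices."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size and len(keep) < top_k:
        i = order[0]
        keep.append(int(i))
        # IoU of the current best box with the remaining candidates
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_o = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                 (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_o - inter)
        # Drop candidates that overlap the kept box too strongly
        order = order[1:][iou <= iou_thresh]
    return keep
```

Capping the output at 300 boxes bounds the cost of the subsequent score-map convolution in S404 regardless of how many proposals the region extraction network emits.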
[0074] S404. Calculating the position-sensitive score map: a score map convolution operation is performed on the candidate boxes, the score of each candidate box is calculated, and the average score of each candidate box is obtained to construct the position-sensitive score map;
[0075] S405. Determining the category and marking frame of the candidate region: softmax classification is performed on the position-sensitive score map to determine the final category, and fitting optimization is performed on the candidate frame to determine the final target marking position.
[0076] Although specific embodiments of the present invention are described above with reference to the accompanying drawings, they do not limit the protection scope of the present invention. Those skilled in the art should understand that, on the basis of the technical solutions of the present invention, various modifications or variations that can be made without creative work still fall within the protection scope of the present invention.