A method for constructing a power transmission tower intelligent recognition dataset based on multi-source remote sensing images

By constructing an intelligent recognition dataset for power transmission towers from multi-source remote sensing images, the method addresses the shortage of data for tower detection in remote sensing imagery and improves both the generalization ability of the algorithm and the specialization ability of the model.

CN122244582A · Pending · Publication Date: 2026-06-19 · POWERCHINA HUADONG ENG CORP LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
POWERCHINA HUADONG ENG CORP LTD
Filing Date
2026-02-05
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing technology lacks high-quality remote sensing image datasets for power transmission tower detection, which degrades algorithm performance across domains. This lack of specialized datasets has become an obstacle to the development of remote sensing tower detection technology.

Method used

A dataset for intelligent identification of power transmission towers based on multi-source remote sensing images is constructed. This involves acquiring remote sensing images within a specified range, stitching and segmenting them to a specified resolution, classifying and labeling them according to the tower's shape, and then using data augmentation techniques to expand the dataset.

Benefits of technology

A high-quality remote sensing dataset of power transmission towers was constructed, which improved the generalization ability of the algorithm and the specialization ability of the model, alleviated the class imbalance problem, and enhanced the model's sensitivity to local features.

Abstract

This invention provides a method for constructing a smart identification dataset for power transmission towers based on multi-source remote sensing imagery, comprising the following steps: S1, acquiring Tianditu tile data of remote sensing images within a specified range based on latitude and longitude coordinates; S2, stitching the scattered tiles into a complete remote sensing image according to the Tianditu tile numbering rules, and re-segmenting the remote sensing image at a resolution of 640×640 to obtain a new set of segmented remote sensing images at a specified resolution; S3, labeling the segmented remote sensing image files according to the shape classification of power transmission towers, and recording the tower type and location data in a text file according to YOLO format to complete the original dataset containing images and text files; S4, selecting images with different backgrounds and time phases, and using data augmentation techniques to expand the original dataset. The dataset constructed by this invention, covering multiple backgrounds and time phases, can guarantee the accuracy of target detection algorithms and improve the generalization ability of the algorithms.

Description

Technical Field

[0001] This invention belongs to the field of target detection technology, specifically relating to a method for constructing a smart identification dataset for power transmission towers based on multi-source remote sensing images.

Background Technology

[0002] With breakthroughs in deep learning, target detection technology has demonstrated enormous application potential in power transmission tower inspection. In recent years, research on intelligent tower inspection based on UAVs and ground inspections has formed a relatively mature technical system, but significant research gaps remain in remote sensing image analysis. Because satellite/aerial remote sensing is well suited to large-scale, periodic monitoring, it is of great significance for power transmission tower inspection and equipment maintenance. The implementation in 2024 of China's first satellite remote sensing technology standard for power transmission lines (T/CES 296-2024) marks an acceleration in the standardization of this field. Against this backdrop, developing tower detection algorithms based on remote sensing imagery is of significant strategic importance for realizing power transmission tower inspection and equipment maintenance.

[0003] Currently, research on power transmission tower inspection exhibits a significant "data-algorithm" imbalance: at the data level, existing datasets, such as the PowerLine-Tower Dataset, contain only 3,200 UAV images, lacking remote sensing data; general remote sensing datasets like WHU-RS19 have tower samples accounting for less than 5%; at the algorithm level, the lack of specialized datasets leads to cross-domain performance degradation. For example, advanced models like TTPNet, published in 2023, achieved an mAP of 92.1% on a self-made test set, but this plummeted to 67.3% during cross-scenario validation. The lack of remote sensing datasets is becoming a major obstacle restricting the development of remote sensing tower inspection technology.

[0004] High-quality remote sensing tower datasets can guarantee the accuracy of target detection algorithms and improve their generalization ability. For example, data covering multiple time phases (dawn/dusk/seasons) and multiple resolutions (sub-meter to meter level) can significantly enhance the algorithm's environmental adaptability. Therefore, this invention proposes a method for constructing a smart identification dataset for power transmission towers based on multi-source remote sensing imagery, aiming to build a high-quality remote sensing dataset for power transmission towers and solve the current problem of a lack of such datasets.

Summary of the Invention

[0005] The main objective of this invention is to provide a method for constructing a smart identification dataset for power transmission towers based on multi-source remote sensing imagery, addressing the aforementioned problems.

[0006] Therefore, the above-mentioned objective of the present invention is achieved through the following technical solution:

[0007] A method for constructing a smart identification dataset for power transmission towers based on multi-source remote sensing imagery includes the following steps:

[0008] S1. Obtain the Tianditu tile data of remote sensing images within a specified range based on latitude and longitude coordinates;

[0009] S2. According to the Tianditu tile numbering rules, the scattered tiles are stitched together into a complete remote sensing image. The remote sensing image is then re-segmented according to a resolution of 640×640 to obtain a new set of segmented remote sensing images with a specified resolution.

[0010] S3. According to the shape of the transmission tower, the transmission towers are divided into Y-shaped, T-shaped and A-shaped. The segmented remote sensing image files are labeled according to the tower classification. The tower type and location data are recorded in the text file according to the YOLO format to complete the original dataset containing images and text files.

[0011] S4. Select images with different backgrounds and time phases, and use data augmentation techniques to expand the original dataset.

[0012] While adopting the above technical solutions, the present invention may also adopt or combine the following technical solutions:

[0013] As a preferred technical solution of the present invention: Step S1 specifically comprises converting latitude and longitude into tile numbers in Tianditu (a Chinese online map platform). Tianditu tiles are named by their x and y numbers, which are calculated by the following formulas:

[0014]

[0015]

[0016] In the formula, x_tile is the x-number, y_tile is the y-number, z is the tile level, lon is the longitude, and lat is the latitude;

[0017] Tiles at the same level are named x_tile_y_tile. The upper and lower limits of the tile number are calculated, and all tile data within the specified level are obtained.
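The conversion formulas for this step are not reproduced in the text above. For orientation only, the widely used Web Mercator XYZ tiling convention gives the expressions below. That the Tianditu tile numbering used here follows this exact form is an assumption; a layer served in a latitude/longitude projection would use a different expression for y_tile. The symbol φ (latitude in radians) is introduced for this illustration.

```latex
x_{\mathrm{tile}} = \left\lfloor \frac{\mathrm{lon} + 180}{360} \cdot 2^{z} \right\rfloor,
\qquad
y_{\mathrm{tile}} = \left\lfloor \left( 1 - \frac{\ln\left( \tan\varphi + \sec\varphi \right)}{\pi} \right) \cdot 2^{z-1} \right\rfloor,
\qquad
\varphi = \mathrm{lat} \cdot \frac{\pi}{180}
```

Under this convention x_tile grows eastward and y_tile grows southward, so the northern edge of a region yields the lower limit of y_tile.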

[0018] As a preferred technical solution of the present invention: In step S2, the segmentation uses a sliding segmentation method: a 640×640 window is moved according to the step size from the upper left corner of the complete remote sensing image to the lower right corner, from top to bottom and from left to right, and the image is cut sequentially at each window position. The repetition rate (the overlap between adjacent windows) is set to 0.1, that is, the moving step size is 576. Residual parts at the lower and right boundaries of the image whose length is less than 640 are padded with black to bring them up to 640×640.

[0019] As a preferred technical solution of the present invention: In step S3, the YOLO information format is as follows: each line records one tower sample. The first item in the line is the type of transmission tower, the second item is the center position of the transmission tower in the horizontal direction, the third item is the center position of the transmission tower in the vertical direction, the fourth item is the length of the transmission tower in the horizontal direction, and the fifth item is the length of the transmission tower in the vertical direction. The last four items must be normalized.

[0020] As a preferred technical solution of the present invention: in step S4, the data augmentation means include at least horizontal mirroring, vertical mirroring, clockwise rotation of 90 degrees, and adding noise.

[0021] As a preferred technical solution of the present invention: the label processing for horizontal mirroring is as follows: all other data remain unchanged, and the second item x of the recorded data is calculated according to the following formula:

[0022]

[0023] In the formula, x represents the horizontal position of the transmission tower.

[0024] As a preferred technical solution of the present invention: the label processing for vertical mirroring is as follows: all other data remain unchanged, and the third item y of the recorded data is calculated according to the following formula:

[0025]

[0026] In the formula, y represents the vertical position of the transmission tower.

[0027] As a preferred technical solution of the present invention: the label processing for a 90-degree clockwise rotation is as follows: the first item (the tower type) remains unchanged, and the second to fifth items of the recorded data (x, y, x_length, y_length) are calculated according to the following formulas:

[0028]

[0029]

[0030]

[0031]

[0032] In the formula, x is the horizontal position of the transmission tower, y is the vertical position of the transmission tower, x_length is the horizontal length of the transmission tower, and y_length is the vertical length of the transmission tower.
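The formulas for the three geometric augmentations are not reproduced in the text above. As a sketch, assume all four values are normalized to [0, 1] with the origin at the top-left corner of the image, x pointing right and y pointing down. The transformed values are written with primes, a notation introduced only for this illustration. For the rotation, a point at pixel position (u, v) in a W×H image moves to (H − v, u) in the rotated H×W image; dividing by the new width H and new height W gives the normalized form.

```latex
\begin{aligned}
\text{horizontal mirror:}\quad & x' = 1 - x \\
\text{vertical mirror:}\quad & y' = 1 - y \\
\text{rotation by } 90^\circ \text{ clockwise:}\quad
  & x' = \frac{H - v}{H} = 1 - y, \qquad y' = \frac{u}{W} = x, \\
  & \mathrm{x\_length}' = \mathrm{y\_length}, \qquad \mathrm{y\_length}' = \mathrm{x\_length}
\end{aligned}
```

The tower type is untouched in all three cases, and the two lengths change only under rotation, where they swap.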

[0033] Compared with the prior art, the present invention has the following beneficial effects:

[0034] 1) This invention uses actual remote sensing image sources as input and constructs a professional remote sensing image dataset for power transmission towers. Compared with the power transmission tower dataset taken by drones, it is more suitable for professional remote sensing detection scenarios. At the same time, by selecting different scenarios and time phases and using various data expansion methods, a high-volume, high-quality dataset can be constructed, which greatly helps to improve the generalization ability of the algorithm.

[0035] 2) Based on the external features of the towers, this invention classifies power transmission towers into finer categories, distinguishes the differences between categories, and forces the model to mine more discriminative features. This can significantly enhance the model's sensitivity to local features and help improve the model's professional capabilities. At the same time, by splitting the major categories into subcategories, sample resources can be allocated more precisely, alleviating the problem of class imbalance.

Attached Figure Description

[0036] Figure 1 is a flowchart of the method for creating a remote sensing dataset of power transmission towers based on Tianditu provided by this invention.

[0037] Figure 2 is a remote sensing image of the desert region.

[0038] Figure 3 is a remote sensing image of the green area.

[0039] Figure 4 is a schematic diagram of the sliding step segmentation method.

[0040] Figure 5 is a schematic diagram of the boundary handling of the filling strategy.

[0041] Figure 6 is a schematic diagram of a T-shaped power transmission tower.

[0042] Figure 7 is a schematic diagram of A-shaped and Y-shaped power transmission towers.

[0043] Figure 8a is a comparison of the original dataset and the dataset augmented by horizontal mirroring.

[0044] Figure 8b is a comparison of the original dataset and the dataset augmented by vertical mirroring.

[0045] Figure 8c is a comparison of the original dataset and the dataset augmented by rotating 90 degrees clockwise.

[0046] Figure 8d is a comparison of the original dataset and the dataset augmented by adding noise.

Detailed Implementation

[0047] The present invention will now be described in further detail with reference to the accompanying drawings and specific embodiments.

[0048] As shown in Figure 1, a method for constructing a smart identification dataset for power transmission towers based on multi-source remote sensing imagery includes the following steps:

[0049] S1. Based on latitude and longitude coordinates, obtain the Tianditu tile data of the remote sensing image within the specified range. The Tianditu tile data is specifically image data, namely a remote sensing image of size 256×256 within the specified range.

[0050] Two regions were selected: a desert area (32.237°N to 32.270°N, 91.656°E to 91.709°E) and a green area (34.900°N to 35.175°N, 116.402°E to 116.592°E). The upper and lower limits of the tile numbering were calculated using the following formula:

[0051]

[0052]

[0053] In the formula, x_tile is the x-number, y_tile is the y-number, z is the tile level, lon is the longitude, and lat is the latitude. The tile level z is set to 17. Based on the tile number range, all tiles within the range are downloaded from Tianditu.
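The tile-range calculation can be sketched as follows. This is a minimal illustration that assumes the standard Web Mercator XYZ convention (x growing eastward, y growing southward); the function names `lonlat_to_tile` and `tile_range` are hypothetical, and the formula actually used in the embodiment is not reproduced in the text and may differ. No download step is shown.

```python
import math


def lonlat_to_tile(lon_deg, lat_deg, z):
    """Convert longitude/latitude in degrees to (x_tile, y_tile) at level z.

    Assumption: standard Web Mercator XYZ tiling, origin at the
    north-west corner, y increasing southward.
    """
    n = 2 ** z
    x_tile = int(math.floor((lon_deg + 180.0) / 360.0 * n))
    lat_rad = math.radians(lat_deg)
    # asinh(tan(lat)) equals ln(tan(lat) + sec(lat))
    y_tile = int(math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n))
    return x_tile, y_tile


def tile_range(lat_min, lat_max, lon_min, lon_max, z):
    """Return (x_min, x_max, y_min, y_max) covering the bounding box."""
    x_min, y_max = lonlat_to_tile(lon_min, lat_min, z)  # south-west corner
    x_max, y_min = lonlat_to_tile(lon_max, lat_max, z)  # north-east corner
    return x_min, x_max, y_min, y_max


# Desert region from the embodiment, at tile level 17.
x_min, x_max, y_min, y_max = tile_range(32.237, 32.270, 91.656, 91.709, 17)
tiles = [(x, y) for y in range(y_min, y_max + 1) for x in range(x_min, x_max + 1)]
print(f"x: {x_min}..{x_max}, y: {y_min}..{y_max}, {len(tiles)} tiles")
```

Because y grows southward under this convention, the northern latitude limit gives the lower limit of the y-number and the southern limit gives the upper one.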

[0054] S2. According to the Tianditu tile numbering rules, the scattered tiles are stitched together into a complete remote sensing image, as shown in Figures 2-3. The remote sensing image is then re-segmented at a resolution of 640×640 to obtain a new set of segmented remote sensing images with the specified resolution.
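The stitching step amounts to placing each tile at an offset determined by its number. The sketch below illustrates this with plain Python lists standing in for images; it assumes that x-numbers increase to the right and y-numbers increase downward, and the function name `stitch_tiles` is hypothetical. A real pipeline would more likely use an imaging library, which the text does not name.

```python
def stitch_tiles(tiles, x_min, x_max, y_min, y_max, tile_size=256):
    """Assemble tiles into one mosaic.

    tiles maps (x_tile, y_tile) to a tile given as a list of rows,
    each row a list of pixel values. Missing tiles stay 0 (black).
    """
    width = (x_max - x_min + 1) * tile_size
    height = (y_max - y_min + 1) * tile_size
    mosaic = [[0] * width for _ in range(height)]
    for (x, y), tile in tiles.items():
        left = (x - x_min) * tile_size  # column offset of this tile
        top = (y - y_min) * tile_size   # row offset of this tile
        for r, row in enumerate(tile):
            mosaic[top + r][left:left + tile_size] = row
    return mosaic


# Toy example: a 2x2 grid of 2x2-pixel tiles, each filled with its own id.
demo_tiles = {
    (10, 20): [[1, 1], [1, 1]],
    (11, 20): [[2, 2], [2, 2]],
    (10, 21): [[3, 3], [3, 3]],
    (11, 21): [[4, 4], [4, 4]],
}
mosaic = stitch_tiles(demo_tiles, 10, 11, 20, 21, tile_size=2)
for row in mosaic:
    print(row)
```

With 256×256 tiles, the mosaic width and height are simply the number of tile columns and rows multiplied by 256.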

[0055] The segmentation uses the sliding segmentation method, as shown in Figure 4: a 640×640 window is moved step by step from the top left corner of the complete remote sensing image to the bottom right corner, from top to bottom and from left to right, and the image is cut at each window position. The repetition rate is 0.1 (i.e., a step size of 576), which reduces the probability that a power transmission tower is split across two separate images. Residual portions at the bottom and right edges of the image that are smaller than 640 are filled with black to bring them up to 640×640, as shown in Figure 5.
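The sliding segmentation with black padding can be sketched as below. The step follows from the repetition rate as 640 × (1 − 0.1) = 576. Images are represented as plain lists of rows for the sake of a self-contained example; the function names and the row-by-row patch order are assumptions made for this illustration.

```python
def window_origins(length, window=640, step=576):
    """Start offsets of windows along one axis, covering the full length.

    The last window may extend past the image and must then be padded.
    """
    origins = [0]
    while origins[-1] + window < length:
        origins.append(origins[-1] + step)
    return origins


def crop_with_padding(image, top, left, window=640, fill=0):
    """Cut a window x window patch; pixels outside the image get `fill` (black)."""
    height = len(image)
    patch = []
    for r in range(top, top + window):
        if r < height:
            row = image[r][left:left + window]
            row = row + [fill] * (window - len(row))  # pad on the right
        else:
            row = [fill] * window  # pad below the image
        patch.append(row)
    return patch


def split_image(image, window=640, step=576):
    """Map each (top, left) origin to its padded patch."""
    height, width = len(image), len(image[0])
    return {
        (top, left): crop_with_padding(image, top, left, window)
        for top in window_origins(height, window, step)
        for left in window_origins(width, window, step)
    }


print(window_origins(1500))  # [0, 576, 1152]
```

For a 1500-pixel axis the last window starts at 1152 and would end at 1792, so its final 292 pixels are padding.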

[0056] S3. According to the shape of the transmission tower, transmission towers are divided into Y-shaped, T-shaped, and A-shaped types, as shown in Figures 6-7. The segmented remote sensing image files are labeled according to the tower classification, and the tower type and location data are recorded in a text file according to the YOLO format to complete the original dataset containing images and text files;

[0057] The labeling results are stored in YOLO format, which means that detailed information about the transmission towers in each image is recorded in a text file. The name of the text file must be consistent with the image file. Specifically, the information format is that one line records one transmission tower sample. The first item in a line is the type of transmission tower, the second item is the horizontal center position of the transmission tower, the third item is the vertical center position of the transmission tower, the fourth item is the horizontal length of the transmission tower, and the fifth item is the vertical length of the transmission tower. The last four items of data must be normalized.
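A label line in this format can be produced from a pixel bounding box as sketched below. The mapping of tower types to class numbers in `CLASS_IDS` is hypothetical, since the text does not state which number is assigned to each type, and the function names are introduced only for this illustration.

```python
CLASS_IDS = {"Y": 0, "T": 1, "A": 2}  # hypothetical id assignment


def to_yolo_line(tower_type, box, image_size=640):
    """Format one tower as 'class x_center y_center x_length y_length'.

    box is (x_min, y_min, x_max, y_max) in pixels inside the patch;
    the last four values are normalized by the patch size.
    """
    x_min, y_min, x_max, y_max = box
    x_center = (x_min + x_max) / 2 / image_size
    y_center = (y_min + y_max) / 2 / image_size
    x_length = (x_max - x_min) / image_size
    y_length = (y_max - y_min) / image_size
    return (f"{CLASS_IDS[tower_type]} {x_center:.6f} {y_center:.6f} "
            f"{x_length:.6f} {y_length:.6f}")


def parse_yolo_line(line):
    """Parse a label line back into (class_id, x, y, x_length, y_length)."""
    parts = line.split()
    return (int(parts[0]), *map(float, parts[1:5]))


line = to_yolo_line("T", (288, 160, 352, 224))
print(line)  # 1 0.500000 0.300000 0.100000 0.100000
```

Each image `name.jpg` would then be paired with a text file `name.txt` holding one such line per tower, matching the naming requirement stated above.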

[0058] S4. Select images with different backgrounds and time phases, and use data augmentation techniques such as horizontal mirroring, vertical mirroring, 90-degree clockwise rotation, and adding noise to expand the original dataset.

[0059] Different backgrounds were selected, primarily loess, desert, green land, and paddy fields; different time phases were also chosen, including spring and autumn, and day and night, ensuring dataset diversity. After the original dataset was determined, the image files were processed using methods such as horizontal mirroring, vertical mirroring, 90-degree clockwise rotation, and noise addition to generate a new set of image files. At the same time, the label files were processed to generate corresponding label text files, each with the same name as its image file. Figures 8a-8d show the same image together with its augmented versions.

[0060] The specific method for handling horizontal mirroring is as follows: all other data remain unchanged, and the second item x of the recorded data is calculated according to the following formula:

[0061]

[0062] In the formula, x represents the horizontal position of the transmission tower.

[0063] The vertical mirroring method involves keeping all other data unchanged, and calculating the third item y according to the formula below:

[0064]

[0065] In the formula, y represents the vertical position of the transmission tower.
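The two mirroring rules can be sketched as follows. The formulas themselves are not reproduced in the text, so this assumes normalized coordinates in [0, 1] with the origin at the top-left corner, under which a mirrored center is one minus the original center. Function names are hypothetical, and images are plain lists of rows.

```python
def mirror_horizontal(label):
    """Left-right flip: only the horizontal center changes."""
    cls, x, y, x_length, y_length = label
    return (cls, 1.0 - x, y, x_length, y_length)


def mirror_vertical(label):
    """Top-bottom flip: only the vertical center changes."""
    cls, x, y, x_length, y_length = label
    return (cls, x, 1.0 - y, x_length, y_length)


def flip_image_horizontal(image):
    """Reverse every row of an image given as a list of rows."""
    return [row[::-1] for row in image]


def flip_image_vertical(image):
    """Reverse the order of the rows."""
    return image[::-1]


label = (1, 0.25, 0.75, 0.1, 0.2)
print(mirror_horizontal(label))  # (1, 0.75, 0.75, 0.1, 0.2)
print(mirror_vertical(label))    # (1, 0.25, 0.25, 0.1, 0.2)
```

Applying either mirror twice returns the original label, which is a convenient sanity check on generated label files.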

[0066] The label processing for a 90-degree clockwise rotation is as follows: the first item (the tower type) remains unchanged, and the second to fifth items of the recorded data (x, y, x_length, y_length) are calculated according to the formulas below:

[0067]

[0068]

[0069]

[0070]

[0071] In the formula, x is the horizontal position of the transmission tower, y is the vertical position of the transmission tower, x_length is the horizontal length of the transmission tower, and y_length is the vertical length of the transmission tower.
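The rotation rule can be sketched as below, again under the assumption of normalized coordinates with the origin at the top-left corner; the formulas in the text are not reproduced, so the mapping here is derived from the geometry rather than copied from the source. A single marked pixel is used to check that the image and its label move together. Function names are hypothetical.

```python
def rotate_label_cw90(label):
    """Label after rotating the image 90 degrees clockwise."""
    cls, x, y, x_length, y_length = label
    # The old vertical axis becomes the new horizontal axis (reversed),
    # and the two lengths swap.
    return (cls, 1.0 - y, x, y_length, x_length)


def rotate_image_cw90(image):
    """Rotate an image (list of rows) 90 degrees clockwise."""
    return [list(col) for col in zip(*image[::-1])]


image = [
    [0, 0, 0, 0],
    [0, 0, 0, 9],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
# Marked pixel: column 3, row 1 -> center (3.5/4, 1.5/4), one pixel in size.
label = (0, 0.875, 0.375, 0.25, 0.25)

rotated = rotate_image_cw90(image)
print(rotate_label_cw90(label))  # (0, 0.625, 0.875, 0.25, 0.25)
```

After rotation the marked pixel sits at row 3, column 2, whose center (2.5/4, 3.5/4) matches the transformed label.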

[0072] Adding noise does not change the label positions, so the label files need no changes. The augmented data, together with the original data, forms the new transmission tower dataset, which ensures a sufficient dataset size.
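Noise augmentation can be sketched as follows. The text does not say which kind of noise is used, so additive Gaussian noise on 8-bit gray values is an assumption here, as are the function name and the `sigma` and `seed` parameters.

```python
import random


def add_gaussian_noise(image, sigma=10.0, seed=None):
    """Return a noisy copy of an 8-bit gray image given as a list of rows.

    Assumption: zero-mean Gaussian noise, clipped to the range 0..255.
    """
    rng = random.Random(seed)
    return [
        [min(255, max(0, round(pixel + rng.gauss(0.0, sigma)))) for pixel in row]
        for row in image
    ]


image = [[128] * 4 for _ in range(4)]
labels = [(2, 0.5, 0.5, 0.25, 0.25)]

noisy_image = add_gaussian_noise(image, sigma=10.0, seed=0)
noisy_labels = list(labels)  # labels are copied unchanged
```

Only the pixel values differ between the original and the noisy sample; the label file can be copied under the new image name.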

[0073] The test was conducted on two areas in the sample, totaling 590 towers. Without classification, the recall rate was 92.89%, the precision rate was 96.18%, and the F1 score was 94.51%. After the towers were classified into different types (Y / T / A), the recall rate was 93.39%, the precision rate was 96.83%, and the F1 score was 95.07%.
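The reported F1 scores can be checked against the reported precision and recall, since F1 is their harmonic mean. The short sketch below does this from the rounded percentages; the function name is hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)


print(round(f1_score(96.18, 92.89), 2))  # 94.51 (without classification)
print(round(f1_score(96.83, 93.39), 2))  # 95.08 (with Y/T/A classification)
```

The first value reproduces the reported 94.51%. The second differs from the reported 95.07% only in the last digit, which would be consistent with the reported figure having been computed from unrounded counts.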

[0074] The technical solution of the present invention has been described in conjunction with the specific experimental procedures shown in the accompanying drawings. However, the scope of protection of the present invention is not limited to these specific embodiments. Without departing from the principles of the present invention, those skilled in the art can make equivalent changes or substitutions to the relevant technical features, and the technical solutions resulting from such changes or substitutions will all fall within the scope of protection of the present invention.

Claims

1. A method for constructing a smart identification dataset for power transmission towers based on multi-source remote sensing imagery, characterized in that it includes the following steps: S1. Obtain the Tianditu tile data of remote sensing images within a specified range based on latitude and longitude coordinates; S2. According to the Tianditu tile numbering rules, the scattered tiles are stitched together into a complete remote sensing image. The remote sensing image is then re-segmented according to a resolution of 640×640 to obtain a new set of segmented remote sensing images with a specified resolution; S3. According to the shape of the transmission tower, the transmission towers are divided into Y-shaped, T-shaped and A-shaped. The segmented remote sensing image files are labeled according to the tower classification. The tower type and location data are recorded in the text file according to the YOLO format to complete the original dataset containing images and text files; S4. Select images with different backgrounds and time phases, and use data augmentation techniques to expand the original dataset.

2. The method according to claim 1, characterized in that: Step S1 specifically involves converting latitude and longitude into tile numbers in Tianditu (a Chinese online map platform). Tianditu tiles are named by their x and y numbers, which are calculated by the following formulas: In the formula, x_tile is the x-number, y_tile is the y-number, z is the tile level, lon is the longitude, and lat is the latitude; Tiles at the same level are named x_tile_y_tile. The upper and lower limits of the tile number are calculated, and all tile data within the specified level are obtained.

3. The method according to claim 1, characterized in that: In step S2, the segmentation uses a sliding segmentation method: a 640×640 window is moved according to the step size from the top left corner of the complete remote sensing image to the bottom right corner, from top to bottom and from left to right. The image is segmented sequentially according to the window, with a repetition rate of 0.1, i.e., a movement step size of 576. Remaining parts at the bottom and right edges of the image whose length is less than 640 are padded with black to bring them up to 640×640.

4. The method according to claim 1, characterized in that: In step S3, the YOLO information format is as follows: one line record represents one tower sample. The first item in a line is the type of transmission tower, the second item is the center position of the transmission tower in the horizontal direction, the third item is the center position of the transmission tower in the vertical direction, the fourth item is the length of the transmission tower in the horizontal direction, and the fifth item is the length of the transmission tower in the vertical direction. The last four items of data must be normalized.

5. The method according to claim 1, characterized in that: In step S4, the data augmentation methods include at least horizontal mirroring, vertical mirroring, 90-degree clockwise rotation, and adding noise.

6. The method according to claim 5, characterized in that: The specific method for handling horizontal mirroring is as follows: all other data remain unchanged, and the second item x of the recorded data is calculated according to the following formula: In the formula, x represents the horizontal position of the transmission tower.

7. The method according to claim 5, characterized in that: The vertical mirroring method involves keeping all other data unchanged, and calculating the third item y according to the formula below: In the formula, y represents the vertical position of the transmission tower.

8. The method according to claim 5, characterized in that: The label processing for a 90-degree clockwise rotation is as follows: the first item (the tower type) remains unchanged, and the second to fifth items of the recorded data (x, y, x_length, y_length) are calculated according to the formula below: In the formula, x is the horizontal position of the transmission tower, y is the vertical position of the transmission tower, x_length is the horizontal length of the transmission tower, and y_length is the vertical length of the transmission tower.