A line engineering construction phase identification method, device, medium and equipment
Combining a YOLOv26 classification model with a BiRefNet background separation model addresses the accuracy and efficiency problems of identifying the construction phase of line engineering. The approach achieves text-free, interference-resistant recognition of the construction phase and improves the stability and automation of recognition.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- GUANGDONG ELECTRIC POWER SCI RES INST ENERGY TECH CO LTD
- Filing Date
- 2026-05-07
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies cannot accurately and efficiently identify the construction phase of line projects, especially due to their strong reliance on electronic archive text data and the weak anti-interference ability of image recognition schemes, resulting in insufficient recognition accuracy and low efficiency.
An image recognition method based on the YOLOv26 classification model is adopted, combined with the BiRefNet background separation model for pixel-level foreground and background segmentation. Through coarse classification, background interference removal, and fine classification, the construction phase is identified efficiently and accurately.
It achieves efficient and accurate construction phase identification without relying on text data, improves the stability and anti-interference ability of identification, and ensures automated and intelligent management and control of the construction phase.
Figure CN122244685A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of identification, and more particularly to a method, apparatus, medium, and equipment for identifying the construction phase of a power line project.
Background Technology
[0002] As a core infrastructure for power transmission, power line projects are characterized by cross-regional, long-term, and multi-stage construction. Different stages, such as foundation construction, tower erection, and line stringing, present varying safety risks. For example, foundation construction is prone to pit collapses, while line stringing poses risks like falls from heights and conductor breakage. Accurately identifying the construction stage at the site is crucial for precise safety management and automatic matching of protective measures. Furthermore, the progress control of power line projects heavily relies on the results of construction stage identification. Traditional progress management, which relies on manual data entry, generally suffers from information lag, data distortion, and low statistical efficiency, making it impossible to promptly grasp the actual progress of the project. Accumulated schedule deviations can easily lead to project delays. Therefore, automated and precise construction stage identification technology has become a core requirement for the intelligent construction and management of power line projects.
[0003] Currently, the identification technology for the construction phase of power line projects is mainly divided into two types of implementation schemes. One type is the text-image fusion recognition scheme, which relies on both the text data of the project's electronic archives and the image data of the construction site for recognition. It uses an improved BERT-CRF model to process text features and an improved Mask R-CNN model to process image features, and then fuses the two types of features to obtain the phase recognition result. The other type is the pure image recognition scheme, which mostly uses classification networks based on the YOLO series and uses only images of the construction site to classify and determine the construction phase.
[0004] The aforementioned existing technologies have significant limitations in practical applications. Text-image fusion recognition schemes rely too heavily on electronic archive text data. However, during power line construction, electronic archives commonly suffer from untimely updates, inconsistent formats, and missing content, which easily causes text feature extraction to fail and significantly reduces overall recognition accuracy and scheme stability. Pure image recognition schemes, on the other hand, are weak at suppressing background interference. In images of power line construction sites, in addition to the core targets of the current construction phase, there is often a large amount of redundant background information such as existing towers, surrounding buildings, and natural vegetation. Existing schemes lack a dedicated mechanism for separating background interference, so they cannot accurately distinguish core construction targets from interfering information; core features are masked and phase classification accuracy is insufficient. Furthermore, traditional manual recognition methods are inefficient and have high error rates. Existing automated recognition technologies cannot achieve text-free, interference-resistant, and high-precision construction phase recognition. These shortcomings prevent existing technologies from accurately and efficiently identifying the construction phases of power line projects.
Summary of the Invention
[0005] This invention provides a method, apparatus, medium, and equipment for identifying the construction stages of power line projects, in order to solve the problem that existing technologies cannot accurately and efficiently identify the construction stages of power line projects.
[0006] In a first aspect, this application provides a method for identifying the construction phase of a power line project, including:
Acquiring image data of the construction site of the power line project;
Inputting the image data into a preset first YOLOv26 classification model so that the first YOLOv26 classification model outputs a coarse classification result for the construction stage;
When the coarse classification result is the tower erection stage or the line erection stage, inputting the image data into a preset BiRefNet background separation model, trained on historical tower erection samples and historical line erection samples, to perform pixel-level foreground and background segmentation and obtain the target image after removing background interference;
Inputting the target image into a preset second YOLOv26 classification model so that the second YOLOv26 classification model outputs a fine classification result, wherein the first YOLOv26 classification model is trained based on historical construction site images, and the second YOLOv26 classification model is trained based on historical target images after removing background interference;
Performing weighted fusion of the coarse classification result and the fine classification result to obtain the identification result for the construction phase of the line project.
[0007] This application can complete the construction phase identification by acquiring only image data of the line construction site, without relying on the text data of the electronic project archives. This effectively avoids the recognition failure problems caused by the lag in text data updates, inconsistent formats, and missing content, significantly improving the adaptability and stability of the solution. By first inputting the image data into the first YOLOv26 classification model to output coarse classification results, the initial determination of the construction phase can be quickly completed, effectively improving the overall recognition efficiency. For images that are coarsely classified as tower erection or line stringing construction phases, a specially trained BiRefNet background separation model is used to perform pixel-level foreground and background segmentation, which can accurately remove background interference information such as existing towers and vegetation. It retains the core construction target features and avoids the core features being obscured by redundant backgrounds, thus solving the problem of poor anti-interference ability of existing pure image recognition schemes. Then, the target image after background removal is input into the second YOLOv26 classification model to obtain fine classification results, which can focus on the core target features to carry out refined classification, further enhancing the differentiation accuracy between tower erection and line erection stages. Finally, the coarse classification and fine classification results are weighted and fused to make a judgment, which can balance the discrimination advantages of global image features and core target features, and ultimately achieve efficient, accurate and stable automated identification of the line engineering construction stage. 
This application effectively solves the problem that the existing technology cannot accurately and efficiently identify the line engineering construction stage.
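The control flow of the method described above can be sketched as follows. This is only an illustrative outline under stated assumptions: the three models and the fusion step are passed in as plain callables, and the stage labels and the function name `identify_stage` are hypothetical names chosen for the example, not identifiers from this application.

```python
from typing import Any, Callable, Dict

# Hypothetical stage labels used only for this sketch.
FOUNDATION = "foundation"
TOWER = "tower_erection"
STRINGING = "line_stringing"

Probabilities = Dict[str, float]


def identify_stage(
    image: Any,
    coarse_model: Callable[[Any], Probabilities],
    separate_background: Callable[[Any], Any],
    fine_model: Callable[[Any], Probabilities],
    fuse: Callable[[Probabilities, Probabilities], str],
) -> str:
    """Coarse classification, optional background separation, fine
    classification, then fusion, in the order given in the text."""
    coarse = coarse_model(image)
    coarse_stage = max(coarse, key=coarse.get)

    # Foundation construction skips separation and fine classification.
    if coarse_stage == FOUNDATION:
        return coarse_stage

    # Tower erection or line stringing: remove the background first.
    target_image = separate_background(image)
    fine = fine_model(target_image)
    return fuse(coarse, fine)
```

Passing the models as callables keeps the sketch independent of any particular framework; real YOLOv26 or BiRefNet inference code would be wrapped behind these callables.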
[0008] Furthermore, the step of inputting the image data into a preset first YOLOv26 classification model, so that the first YOLOv26 classification model outputs a coarse classification result for the construction stage, specifically involves: The image data is converted into a uniform format and resolution to obtain a standardized image; The standardized image is subjected to noise suppression and adaptive brightness / contrast adjustment to obtain a preprocessed image. The preprocessed image is input into the first YOLOv26 classification model, so that the first YOLOv26 classification model can perform multi-scale feature extraction on the preprocessed image through the built-in CSPDarknet-26 architecture to obtain a deep feature map. The deep feature map is input into the built-in feature aggregation layer for global average pooling to obtain a one-dimensional feature vector. The one-dimensional feature vector is input into the built-in classification head for category mapping, and the probability value of the foundation construction stage, tower erection stage, or line erection stage is output. The stage with the highest probability value is taken as the coarse classification result.
[0009] This application first converts image data to a unified format and resolution, and performs preprocessing operations such as noise suppression and adaptive adjustment of brightness and contrast. This eliminates image differences and interference caused by different acquisition devices and shooting environments, providing standardized input for subsequent model processing and effectively improving the robustness and accuracy of feature extraction. Then, the preprocessed image is input into the first YOLOv26 classification model. Relying on the built-in CSPDarknet-26 architecture, multi-scale deep feature extraction is achieved, which can comprehensively and accurately capture the core features of different construction stages of the line project and avoid missing key features. Subsequently, a feature aggregation layer performs global average pooling on the deep feature map, which can simplify the network structure, reduce computational redundancy, and effectively aggregate global semantic information of the whole image, avoiding the impact of local feature deviations on the classification results. Finally, the classification head completes the mapping of features to construction stage categories and outputs the corresponding probability values. The stage with the highest probability is used as the coarse classification result, which can quickly and stably complete the preliminary determination of construction stages. This not only ensures the recognition accuracy and efficiency of coarse classification, but also provides a reliable judgment basis for subsequent path separation and fine classification.
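The last three steps of the coarse classifier (global average pooling, category mapping, and selection of the most probable stage) can be illustrated with a small NumPy sketch. It assumes the classification head is a single linear layer followed by softmax, which the text does not specify; the function name, the stage labels, and the weight shapes are likewise assumptions made for the example.

```python
import numpy as np

# Hypothetical stage labels in the order of the head's outputs.
STAGES = ("foundation", "tower_erection", "line_stringing")


def classify_from_feature_map(feature_map, weight, bias):
    """feature_map: (C, H, W) deep feature map.
    weight: (num_stages, C) and bias: (num_stages,) of an assumed linear head.
    Returns the most probable stage and the probability vector."""
    # Global average pooling: one value per channel -> vector of length C.
    vector = feature_map.mean(axis=(1, 2))

    # Assumed linear category mapping followed by a numerically stable softmax.
    logits = weight @ vector + bias
    exp = np.exp(logits - logits.max())
    probs = exp / exp.sum()

    return STAGES[int(np.argmax(probs))], probs
```

The same pooling and argmax logic would apply to the second (fine) classifier, with only two output stages.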
[0010] Furthermore, when the coarse classification result indicates the tower erection stage or the line erection stage, the image data is input into a preset BiRefNet background separation model trained based on historical tower erection and line erection samples for pixel-level foreground and background segmentation to obtain the target image after removing background interference. Specifically: The BiRefNet background separation model includes a localization module and a reconstruction module; The image data is input into the localization module of the BiRefNet background separation model, so that the localization module extracts multi-stage multi-scale feature maps of the image data through the SwinTransformer backbone network, and performs context feature fusion on the multi-stage multi-scale feature maps through the ASPP module to output compressed features containing global semantic information. The compressed features are input into the reconstruction module of the BiRefNet background separation model, so that the reconstruction module processes the compressed features using BiRefBlock as the basic unit and generates reconstruction features for the corresponding stage. Then, the reconstruction features are upsampled and reconstructed using multiple BiRefBlocks. 
During the upsampling reconstruction process, the BiRefNet background separation model uses inward reference to crop image data into image patches according to the feature map size of the decoding stage, and stacks and fuses the image patches with the reconstruction features of the corresponding stage to obtain high-resolution fused features; at the same time, it uses outward reference to extract the gradient map of the preset real label as a supervision signal, uses the reconstruction module to generate a gradient attention map, and uses the gradient attention map, combined with a preset morphological dilation mask to filter the background noise gradient to obtain edge enhancement features; The high-resolution fusion feature is integrated with the edge enhancement feature to obtain the optimized reconstruction feature; the optimized reconstruction feature is then reconstructed through progressive upsampling to generate a binary segmentation map with the same resolution as the input image data. The binary segmentation image is superimposed on the image data to shield the existing towers and background interference areas. For the tower erection stage, the towers to be erected and the hoisting equipment are retained. For the line construction stage, the area where the conductors connect to the towers is retained, thus obtaining the target image after removing background interference.
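The inward-reference step, in which the image is cropped into patches matching the feature map size of a decoding stage and stacked with that stage's features, can be sketched in NumPy as below. The sketch assumes the image dimensions are exact multiples of the feature map dimensions and that "stacking and fusing" begins with channel-wise concatenation; any learned fusion layer that follows is omitted, and the function name is hypothetical.

```python
import numpy as np


def inward_reference(image, features):
    """image: (C, H, W); features: (F, h, w) with H % h == 0 and W % w == 0.
    Splits the image into a grid of h-by-w patches, stacks the patches along
    the channel axis, and concatenates them with the features."""
    c, big_h, big_w = image.shape
    _, h, w = features.shape
    if big_h % h or big_w % w:
        raise ValueError("image size must be a multiple of the feature size")

    grid_h, grid_w = big_h // h, big_w // w
    # (C, grid_h, h, grid_w, w) -> (C, grid_h, grid_w, h, w) -> stacked patches
    patches = (
        image.reshape(c, grid_h, h, grid_w, w)
        .transpose(0, 1, 3, 2, 4)
        .reshape(c * grid_h * grid_w, h, w)
    )
    # Assumed fusion: concatenate along the channel axis.
    return np.concatenate([features, patches], axis=0)
```

Because every patch keeps the original pixel resolution, no detail is lost to downscaling, which is the motivation the text gives for this step.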
[0011] This application employs a BiRefNet background separation model trained on historical tower erection and line erection samples for pixel-level foreground and background segmentation. First, the localization module's SwinTransformer backbone network extracts multi-stage, multi-scale feature maps, and the ASPP module performs multi-context feature fusion to output compressed features containing global semantic information. This accurately captures the global semantics and multi-scale details of core targets in construction scenarios, avoiding the omission of key features. Then, the reconstruction module processes the compressed features using BiRefBlocks as basic units to generate reconstructed features. During the upsampling reconstruction process, inward referencing crops the image data into image patches of the corresponding size, which are stacked and fused with the reconstructed features to obtain high-resolution fused features. This effectively solves the problem of fine-grained detail loss caused by traditional image scaling. At the same time, outward referencing takes the gradient map of the ground-truth label as a supervision signal for generating gradient attention maps, and a mask optimized by morphological dilation filters out background noise gradients to obtain edge enhancement features. This significantly improves the segmentation accuracy of fine-structured regions such as conductors and tower edges, effectively suppressing background interference. Subsequently, the high-resolution fused features and the edge enhancement features are integrated to obtain optimized reconstruction features. After progressive upsampling, a binary segmentation map with the same resolution as the input is generated. 
The segmentation map is then superimposed on the original image to mask redundant background areas such as existing towers and vegetation, retaining only core construction targets such as towers to be erected, hoisting equipment, conductors, and tower connection areas. Finally, a target image without background interference is obtained. This not only greatly improves the segmentation accuracy of foreground and background in complex construction scenarios but also provides clean and focused feature inputs for subsequent fine classification in the construction stage. It fundamentally solves the technical problem that existing pure image recognition schemes are easily misclassified due to background interference.
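Superimposing the binary segmentation map on the original image, as described above, amounts to keeping foreground pixels and blanking the rest. The following NumPy sketch assumes an H×W×3 image, a mask in which nonzero means foreground, and a black fill value; these conventions and the function name are assumptions for illustration.

```python
import numpy as np


def apply_foreground_mask(image, mask, fill=0):
    """image: (H, W, 3) array; mask: (H, W) array, nonzero = foreground.
    Returns a copy of the image with background pixels set to `fill`."""
    if image.shape[:2] != mask.shape:
        raise ValueError("mask must have the same height and width as the image")

    keep = mask.astype(bool)[..., None]  # (H, W, 1), broadcast over channels
    return np.where(keep, image, np.asarray(fill, dtype=image.dtype))
```

The resulting array plays the role of the target image that is passed to the second classification model.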
[0012] Furthermore, the step of inputting the target image into a preset second YOLOv26 classification model, so that the second YOLOv26 classification model outputs a fine classification result, specifically involves: The target image is converted to a uniform format and resolution to obtain a standardized target image; The standardized target image is subjected to noise suppression and adaptive brightness and contrast adjustment to obtain a preprocessed target image. The preprocessed target image is input into the second YOLOv26 classification model, so that the second YOLOv26 classification model performs multi-scale feature extraction on the preprocessed target image through the CSPDarknet-26 architecture to obtain a deep target feature map. The deep target feature map is input into the built-in feature aggregation layer for global average pooling to obtain a one-dimensional target feature vector. The one-dimensional target feature vector is input into the built-in classification head for category mapping, and the probability values of the tower erection stage and the line erection stage are output. The stage with the highest probability value is taken as the fine classification result.
[0013] This application preprocesses the target image after removing background interference by sequentially performing unified format resolution conversion, noise suppression, and adaptive brightness and contrast adjustment. This further eliminates image interference and standardizes input criteria, providing a high-quality feature extraction foundation for the fine classification model. After inputting the preprocessed target image into the second YOLOv26 classification model, multi-scale feature extraction is performed based on the CSPDarknet-26 architecture. This accurately focuses on the core features of the tower to be erected, the hoisting equipment, or the area where the conductor connects to the tower, completely avoiding the interference of redundant background information on feature recognition and effectively enhancing the feature differentiation between the tower erection stage and the line erection stage. Then, a feature aggregation layer performs global average pooling on the deep target feature map, which can simplify the feature dimensions, reduce the amount of computation, and comprehensively aggregate the global semantic information of the core target, avoiding the influence of local feature deviations on classification. Finally, the classification head completes the category mapping and outputs the corresponding probability value. The stage with the highest probability is determined as the fine classification result, which can significantly improve the classification accuracy and stability of the tower erection stage and the line erection stage, providing a reliable and accurate fine classification basis for the subsequent weighted fusion of coarse and fine classification results.
[0014] Furthermore, the weighted fusion of the coarse and fine classification results to obtain the identification result for the line engineering construction stage is specifically as follows: When the coarse classification result is the foundation construction stage, the foundation construction stage is output as the identification result of the line engineering construction stage. When the coarse classification result is the tower erection stage or the line stringing stage, the first confidence score corresponding to the coarse classification result and the second confidence score corresponding to the fine classification result are obtained; The first confidence score and the second confidence score are multiplied by preset weighting coefficients to obtain the weighted first confidence score and the weighted second confidence score; The weighted first confidence score is summed with the weighted second confidence score to obtain the final confidence score. The final confidence scores of the tower erection stage and the line stringing stage are compared, and the stage with the higher score is output as the identification result of the line engineering construction stage.
[0015] This application employs a weighted fusion approach to determine the construction phases of a power line project. First, a coarse classification result of the foundation construction phase is output directly, which simplifies the identification process by skipping the subsequent background separation and fine classification steps and significantly improves the identification efficiency and response speed for the foundation construction phase. For results coarsely classified as either tower erection or line stringing, the first confidence score corresponding to the coarse classification and the second confidence score corresponding to the fine classification are obtained, weighted by preset weighting coefficients, and then summed to obtain the final confidence score. This approach balances the global image feature discrimination advantages of the coarse classification with the core target feature discrimination advantages of the fine classification, ensuring both comprehensiveness and accuracy in identification, and effectively avoiding misjudgments caused by the limited features of a single classification result. Then, by comparing the final confidence scores of the tower erection and line stringing phases, the phase with the higher score is output as the final identification result, further enhancing the differentiation accuracy between the two easily confused construction phases. Ultimately, this achieves stable, accurate, and reliable output of identification results for all construction phases of the power line project.
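The weighted fusion rule can be illustrated with a short sketch. The weighting coefficients are described only as preset, so the values 0.4 and 0.6 below are placeholders, and the stage labels and function name are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical stage labels used only for this sketch.
FOUNDATION = "foundation"
TOWER = "tower_erection"
STRINGING = "line_stringing"


def fuse_results(
    coarse: Dict[str, float],
    fine: Dict[str, float],
    w_coarse: float = 0.4,  # placeholder weight, not from the source
    w_fine: float = 0.6,    # placeholder weight, not from the source
) -> Tuple[str, float]:
    """Return the identified stage and its final confidence score."""
    top = max(coarse, key=coarse.get)
    if top == FOUNDATION:
        # Foundation construction is output directly from the coarse result.
        return top, coarse[top]

    # Weighted sum of coarse and fine confidence for each remaining stage.
    final = {
        stage: w_coarse * coarse[stage] + w_fine * fine[stage]
        for stage in (TOWER, STRINGING)
    }
    best = max(final, key=final.get)
    return best, final[best]
```

For example, with coarse scores of 0.5 (tower erection) and 0.4 (line stringing), fine scores of 0.3 and 0.7, and the placeholder weights, the final scores are 0.4 × 0.5 + 0.6 × 0.3 = 0.38 and 0.4 × 0.4 + 0.6 × 0.7 = 0.58, so line stringing is output even though the coarse classifier preferred tower erection.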
[0016] Secondly, this application provides a device for identifying the construction stage of a power line project. The device includes: The acquisition module is used to acquire image data of the construction site of the line project; The coarse classification module is used to input the image data into a preset first YOLOv26 classification model so that the first YOLOv26 classification model outputs a coarse classification result for the construction stage. The separation module is used to input the image data into a preset BiRefNet background separation model trained based on historical tower construction samples and historical line construction samples when the coarse classification result is the tower erection stage or the line erection stage, to perform pixel-level foreground and background segmentation, so as to obtain the target image after removing background interference. The fine classification module is used to input the target image into a preset second YOLOv26 classification model so that the second YOLOv26 classification model outputs fine classification results. The first YOLOv26 classification model is trained based on historical construction site images, and the second YOLOv26 classification model is trained based on historical target images after removing background interference. The identification module is used to perform weighted fusion of coarse and fine classification results to obtain the identification results for the construction phase of the line project.
[0017] This application constructs a complete line engineering construction stage identification device consisting of an acquisition module, a coarse classification module, a separation module, a fine classification module, and an identification module. It achieves end-to-end automated identification from image acquisition to stage determination. Compared with traditional manual identification or single automated solutions, it has significant technical effects and progress.
[0018] First, the acquisition module is responsible for acquiring image data from the construction site, providing a basic data source for the entire recognition process. This ensures that the recognition process relies entirely on image information, independent of the electronic engineering archive text, thus solving the recognition failure problem caused by delayed or missing text data updates in existing technologies and improving the universality and stability of the solution. Second, the coarse classification module uses a preset first YOLOv26 classification model to perform a coarse judgment of the construction stage, quickly distinguishing between the three stages of foundation, tower erection, and line stringing. This not only improves the overall processing efficiency but also provides a reliable preliminary judgment basis for subsequent refined processing, enabling rapid location of tasks to be processed. Next, based on the judgment results of the tower erection or line stringing stage, the separation module calls a BiRefNet background separation model trained on dedicated samples to perform pixel-level foreground and background segmentation of the image. This accurately removes background interference such as existing towers and vegetation, retaining only the core construction target area. This effectively solves the technical problem of redundant backgrounds in complex construction sites obscuring core features, clearing away interference for subsequent refined classification. Subsequently, the fine-classification module inputs the separated target image back into the second YOLOv26 classification model. Utilizing the model's CSPDarknet-26 architecture, it focuses on core target features to perform refined classification of the easily confused construction stages of tower erection and line stringing. This further improves the accuracy of distinguishing key construction stages and ensures the depth and accuracy of the classification results. 
Finally, the recognition module performs a weighted fusion judgment of the coarse and fine classification results. This comprehensively utilizes the dual discriminative advantages of global image features and core target features, balancing the comprehensiveness and accuracy of recognition. It effectively avoids the risk of misjudgment that may arise from a single classification dimension, ultimately outputting stable and reliable identification results for the construction stages of the power line project. In summary, this application, through the collaborative cooperation of various modules and a progressive processing logic, fundamentally improves the efficiency, accuracy, and anti-interference capability of power line project construction stage identification, achieving automated, intelligent, and precise control of the entire construction stage. It has significant practical value and promising industrial application prospects.
[0019] Thirdly, this application provides a computer-readable storage medium comprising a stored computer program, wherein, when the computer program is executed, it controls the device containing the computer-readable storage medium to perform the described method for identifying the construction stage of a power line project. Its beneficial effects are the same as those of the method for identifying the construction stage of a power line project provided in the first aspect of this application.
[0020] Fourthly, this application provides a terminal device including a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, wherein the processor executes the computer program to implement any of the line engineering construction stage identification methods described in the first aspect.
Attached Figure Description
[0021] Figure 1: A schematic flowchart of an embodiment of the method for identifying the construction stage of a line project provided in this application;
Figure 2: A schematic diagram of the structure of an embodiment of the model classification result provided in this application;
Figure 3: A schematic diagram of the structure of one embodiment of classification result 1 provided in this application;
Figure 4: A schematic diagram of the structure of one embodiment of classification result 2 provided in this application;
Figure 5: A schematic diagram of an embodiment of the line engineering construction stage identification device provided in this application.
Detailed Implementation
[0022] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0023] Example 1. Referring to Figure 1, in order to solve the problem that existing technologies cannot accurately and efficiently identify the construction stage of a power line project, this invention provides a method for identifying the construction stage of a power line project, including steps S01-S05.
[0024] S01: Acquire image data of the construction site of the line project.
[0025] In a preferred embodiment of this invention, acquiring image data of the construction site of the power line project specifically involves the following. In this embodiment, image data of the line construction site is acquired through high-definition cameras and drone aerial photography equipment deployed on-site. The acquired images are visible light images, and the acquisition process must meet strict shooting environment requirements: shooting on sunny days with sufficient natural light is preferred; direct sunlight (such as midday backlight) and low-light scenes are to be avoided, and nighttime shooting is prohibited; severe weather such as heavy rain, heavy fog, and strong sandstorms is to be avoided. In scenes with slight dust or overcast skies, the image must have no obvious noise and the outline of the core target must be clear. Obstruction of the core construction area by trees, buildings, and debris must also be avoided to ensure the quality of image acquisition. Different acquisition angles are used for different construction scenarios: for foundation construction scenarios, a top-down angle is preferred to clearly show the shape and size of the foundation pit; for tower erection scenarios, a side-view angle is preferred to fully present the tower structure; for stringing construction scenarios, a combination of top-down and side-view angles is used to capture the distribution and sag characteristics of the conductors. This ensures that the acquired images fully cover the core construction areas such as the foundation pit, the towers to be erected, the hoisting equipment, the conductors, and the tower connection area, without missing key construction areas or image offset. The final acquired image data must meet strict quality requirements: no blurring, no ghosting, no motion blur; sharp edges on core targets; and clearly identifiable detailed features (such as foundation pits and conductors). 
Color images must be captured, with accurate color reproduction and no significant color cast; image formats must be mainstream such as JPG and PNG to avoid image quality loss caused by special compression formats. The image resolution must meet the preset standard for subsequent model processing, i.e., 1280×1280 pixels, consistent with the input size for subsequent model training and processing, and free from overexposure, underexposure, or excessive noise that could affect feature extraction. This image acquisition method relies solely on visual data from the construction site, without requiring supplementary data such as engineering documents or construction logs. It is adaptable to data acquisition needs under different construction environments and different acquisition devices, providing a stable, complete, and high-quality raw data foundation for subsequent image preprocessing, coarse classification during the construction phase, background separation, and fine classification.
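The format, resolution, and color requirements above can be expressed as a simple acceptance check. The following is a minimal sketch only; the `ImageInfo` structure, the function name, and the idea of checking metadata before model input are illustrative assumptions, not part of the described method.

```python
from dataclasses import dataclass

# Formats and resolution named in the text; the set spelling is an assumption.
ALLOWED_FORMATS = {"JPG", "JPEG", "PNG"}
TARGET_SIZE = (1280, 1280)


@dataclass
class ImageInfo:
    """Hypothetical metadata record for one acquired image."""
    width: int
    height: int
    fmt: str       # container format, e.g. "JPG" or "PNG"
    channels: int  # 3 for a color image, 1 for grayscale


def acceptance_problems(info: ImageInfo) -> list:
    """Return a list of human-readable problems; an empty list means accepted."""
    problems = []
    if info.fmt.upper() not in ALLOWED_FORMATS:
        problems.append(f"unsupported format: {info.fmt}")
    if (info.width, info.height) != TARGET_SIZE:
        problems.append(
            f"resolution {info.width}x{info.height} differs from "
            f"{TARGET_SIZE[0]}x{TARGET_SIZE[1]}"
        )
    if info.channels != 3:
        problems.append("not a color image")
    return problems


if __name__ == "__main__":
    print(acceptance_problems(ImageInfo(1280, 1280, "jpg", 3)))
    print(acceptance_problems(ImageInfo(640, 480, "bmp", 1)))
```

Subjective criteria such as blur, ghosting, or color cast are not covered by this sketch, since the text gives no numeric thresholds for them.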
[0026] S02: Input the image data into a preset first YOLOv26 classification model so that the first YOLOv26 classification model outputs a coarse classification result for the construction stage.
[0027] In a preferred embodiment of this invention, the step of inputting the image data into a preset first YOLOv26 classification model, so that the first YOLOv26 classification model outputs a coarse classification result for the construction stage, specifically involves: In this embodiment, the preset first YOLOv26 classification model is trained and constructed through the following steps to ensure that it has accurate coarse classification capabilities for construction phases: Model training dataset construction: A large amount of historical image data from power line construction sites was collected, covering typical scenes from three stages: foundation construction, tower erection, and line stringing. The dataset includes 1625 images from the foundation construction stage, 1500 images from the tower erection stage, and 2260 images from the line stringing stage. The image data incorporates feature variations under different acquisition devices, lighting conditions, and shooting angles to improve the dataset's generalization ability. All sample images were manually labeled with three categories: foundation construction, tower erection, and line stringing, forming a labeled dataset containing both image data and corresponding labels. This labeled dataset was then divided into training and test sets in an 8:2 ratio. To further improve the model's robustness, multi-dimensional dataset augmentation operations were performed on the training set. Geometric transformation: The image is randomly flipped horizontally with a probability of 0.5, randomly flipped vertically with a probability of 0.5, randomly rotated within the range of -15° to 15° with a probability of 0.3, and randomly scaled within the range of 0.7 to 1.3 times with a probability of 0.6, to simulate the sample features under different shooting angles and shooting distances. 
Color perturbation: Image brightness is adjusted within ±15% with a probability of 0.4, contrast within ±20% with a probability of 0.4, and saturation within ±15% with a probability of 0.3, and color images are converted to grayscale with a probability of 0.1, suppressing interference from changes in lighting and ambient light color. Noise addition: Gaussian noise with a variance of 0.005 is added to the image with a probability of 0.2, and salt-and-pepper noise with a density of 0.003 is added with a probability of 0.1, to simulate image noise in a construction scene. Mosaic enhancement: With a probability of 0.8, four images from different stages are randomly selected within a single training batch for mosaic stitching; boundary smoothing is used during stitching to preserve the integrity of the core target. Cropping enhancement: The core target region of the image is randomly cropped with a probability of 0.5, with the cropping ratio controlled between 0.6 and 1.0, so that the model focuses on stage-specific features.
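The augmentation probabilities listed above can be collected into a single table and sampled per training image. This is a sketch under assumptions: the operation names are hypothetical labels, each operation is treated as an independent random draw (the text does not say whether they are independent), and the image transforms themselves are not implemented.

```python
import random

# (label, probability) pairs taken from the description above.
# The labels are illustrative names, not identifiers from any library.
AUGMENTATIONS = [
    ("horizontal_flip", 0.5),
    ("vertical_flip", 0.5),
    ("rotate_within_15_degrees", 0.3),
    ("scale_0.7_to_1.3", 0.6),
    ("brightness_within_15_percent", 0.4),
    ("contrast_within_20_percent", 0.4),
    ("saturation_within_15_percent", 0.3),
    ("to_grayscale", 0.1),
    ("gaussian_noise_variance_0.005", 0.2),
    ("salt_pepper_noise_density_0.003", 0.1),
    ("mosaic_of_four_images", 0.8),   # applied per batch in the text
    ("core_target_crop_0.6_to_1.0", 0.5),
]


def sample_augmentations(draw=random.random):
    """Select operations for one sample; `draw` returns a float in [0, 1)."""
    return [name for name, prob in AUGMENTATIONS if draw() < prob]


if __name__ == "__main__":
    random.seed(0)
    for _ in range(3):
        print(sample_augmentations())
```

Passing a custom `draw` callable makes the selection reproducible in tests without touching the global random state.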
[0028] Model initialization and parameter settings: The first YOLOv26 classification network adopts a simple architecture of "backbone feature extraction - global feature aggregation - lightweight classification head", which is suitable for classification needs in multiple scenarios and balances accuracy and speed. Backbone network: The CSPDarknet-26 architecture is adopted, which is built on the basis of C3k2 modules. The C3k2 modules reduce the amount of computation by "3×3 depthwise convolution + 1×1 point convolution", integrate residual connections to avoid gradient vanishing, and embed PSABlock pyramid spatial attention module to enhance local features and suppress background noise through multi-scale spatial attention weighting. The top layer of the backbone network embeds an optimized SPPF-Nano module, which flexibly adjusts the number of pooling and combines grouping calculation strategy to capture global semantic features while reducing computational overhead. Channel attention mechanism is retained only in the top feature map, which works with SPPF-Nano module to balance memory usage and feature representation effect.
[0029] CSPDarknet-26 is a lightweight backbone feature extraction network used in the YOLOv26 series models. It is an improved version of the CSP (Cross Stage Partial) Net and Darknet architectures, specifically designed for real-time object detection and classification tasks. Its core advantage lies in significantly reducing computational complexity and improving inference speed while maintaining feature extraction accuracy, making it suitable for feature extraction in complex scenarios such as power line construction images. Its overall architecture adopts a hierarchical design of "input layer - multi-stage feature extraction - global feature aggregation." Core modules include the C3k2 basic module, the PSABlock pyramid spatial attention module, and an optimized SPPF-Nano module, and the network enhances its feature representation capability through a cross-stage feature fusion strategy. Specifically, the C3k2 module, as the basic building block of the network, uses a combination of "3×3 depthwise convolution + 1×1 pointwise convolution" to reduce computational cost. It embeds residual connections internally to avoid the vanishing gradient problem during deep network training. At the same time, it integrates the PSABlock module to adaptively enhance local key features (such as towers and conductors in the construction area) and suppress background noise interference through a multi-scale spatial attention weighting mechanism. The top layer of the backbone network embeds an optimized SPPF-Nano module, which can flexibly adjust the number of pooling operations and combines this with a grouped computation strategy to efficiently capture global semantic features while reducing computational overhead, ensuring effective representation of the overall scene at different construction stages. The channel attention mechanism is retained only in the top feature map and works in conjunction with the SPPF-Nano module to balance model memory usage and feature representation quality. 
Furthermore, this architecture employs a cross-stage feature segmentation and fusion strategy, dividing the input feature map into two parts for separate processing before fusion. This approach preserves rich gradient information while reducing redundant computation, enabling the model to maintain efficient inference even with a high-resolution input of 1280×1280. It is particularly suitable for deep feature extraction of multi-scale targets (such as foundation pits, towers, and conductors) in construction images of power line projects, providing a robust feature foundation for subsequent classification tasks. The SPPF-Nano module is a lightweight, improved spatial pyramid pooling module deployed at the top layer of the CSPDarknet-26 backbone network; it is a simplified and optimized version of traditional SPPF spatial pyramid pooling. It employs grouped computation and a flexible, adjustable multi-level pooling structure, enabling global aggregation and fusion of deep features at different receptive fields and scales. This effectively extracts global semantic information from multi-scale construction targets such as foundation pits, towers, and conductors. At the same time, it significantly reduces the computational overhead and memory usage of traditional SPPF modules, adapts to 1280×1280 high-resolution image input, and works in conjunction with the top-level channel attention mechanism to enhance global feature representation while achieving lightweight inference.
[0030] Feature aggregation layer: A global average pooling design is adopted to transform the deep feature map output by the backbone network into a one-dimensional feature vector, which simplifies the structure while aggregating the semantic information of the whole image and avoids local feature bias.
[0031] Classification Head: It adopts a lightweight design, containing only one fully connected layer, which maps one-dimensional feature vectors to the category space, removes redundant convolutional layers and activation functions, and optimizes the output results with the h-sigmoid activation function, reducing inference latency while improving judgment stability.
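The feature aggregation layer and classification head described above reduce to three small operations. The sketch below uses plain Python lists in place of tensors; the function names and the toy weights are illustrative assumptions. The h-sigmoid shown is the commonly used hard-sigmoid definition, ReLU6(x + 3) / 6; the text does not state which variant is intended.

```python
def global_average_pool(feature_map):
    """Collapse a C x H x W feature map (nested lists) to a length-C vector."""
    pooled = []
    for channel in feature_map:
        values = [value for row in channel for value in row]
        pooled.append(sum(values) / len(values))
    return pooled


def fully_connected(vector, weights, bias):
    """Single linear layer: one output per row of `weights`."""
    return [
        sum(w * x for w, x in zip(row, vector)) + b
        for row, b in zip(weights, bias)
    ]


def h_sigmoid(x):
    """Hard sigmoid, assumed here to be ReLU6(x + 3) / 6."""
    return min(max(x + 3.0, 0.0), 6.0) / 6.0


if __name__ == "__main__":
    # Toy 2-channel, 2x2 feature map and a 3-class head (hypothetical values).
    features = [[[1, 3], [5, 7]], [[0, 0], [0, 4]]]
    vector = global_average_pool(features)
    logits = fully_connected(vector, [[1, 0], [0, 1], [1, 1]], [0, 0, -5])
    print(vector, logits, [h_sigmoid(v) for v in logits])
```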
[0032] The model loss function adopts a weighted cross-entropy loss combined with the ProgLoss staged weighting strategy: the weighted cross-entropy loss assigns class weights to the three construction stages to enhance the boundary differentiation of similar stages; the ProgLoss strategy dynamically adjusts the loss weights according to the model training stage, taking into account the balance between basic feature learning, class differentiation enhancement and generalization ability.
[0033] The ProgLoss staged approach (progressive loss balancing strategy) is a dynamic loss weight adjustment mechanism adopted during the YOLOv26 model training phase. Its core is to progressively adjust, in stages according to the training progress, the weight distribution among different loss components such as classification loss, localization loss, and confidence loss. Following a "coarse-to-fine" learning logic, it adapts to the training requirements of mixed detection of small targets (such as foundation pits and conductor joints) and large targets (such as towers and hoisting equipment) in power line construction images. Through preset weight decay / enhancement curves, this mechanism prioritizes basic feature learning (such as distinguishing between background and foreground) in the early stage of training, balances multi-scale target feature representation in the middle stage, and focuses on accurate localization of small targets and key areas in the later stage. This prevents the model from overfitting to the dominant target category and from under-learning the features of rare categories or small targets, achieving stable convergence during training, improving detection accuracy, reducing late-stage training instability, and enhancing the model's generalization ability.
[0034] The first YOLOv26 network pre-trained on the COCO dataset was selected as the base model, and the pre-trained weights were loaded to initialize the parameters. Core training hyperparameters were set as follows: input image size 1280×1280; pixel values normalized by dividing by 255, with the mean set to [0.485, 0.456, 0.406] and the standard deviation set to [0.229, 0.224, 0.225]. The optimizer used was MuSGD with momentum 0.937 and weight decay coefficient 0.0005. A gradient clipping mechanism with a maximum gradient norm of 1.0 was set to avoid gradient explosion. The initial learning rate was set to 0.01, and the first 10 epochs were used for warm-up, during which the learning rate is increased linearly from 0.001 to the initial learning rate. The decay strategy uses cosine annealing. The learning rate is multiplied by 0.1 and 0.01 in the 100th and 200th epochs, respectively, and the final learning rate is no less than 1e-6. The class weights of the weighted cross-entropy loss are uniformly set to 1.0, that is, the class weights of the three categories of foundation construction, tower erection, and line stringing are all 1.0, to ensure that the training weights of the categories are balanced. The ProgLoss stage weights are configured according to the training epoch: the weighted cross-entropy loss weight is 0.7 in the first 100 epochs, adjusted to 0.6 in epochs 101-200, and set to 0.5 in epochs 201-300.
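The warm-up, decay, and staged loss weighting above can be written as two small functions of the epoch number. This is a sketch under stated assumptions: epochs are counted from 1 to 300, the cosine curve runs from the end of warm-up to the last epoch, and the additional multipliers at epochs 100 and 200 are left out because the text does not say how they combine with cosine annealing. The function names are illustrative.

```python
import math

TOTAL_EPOCHS = 300       # from the 201-300 range named in the text
WARMUP_EPOCHS = 10
LR_WARMUP_START = 0.001
LR_INITIAL = 0.01
LR_FLOOR = 1e-6


def learning_rate(epoch: int) -> float:
    """Learning rate for a 1-based epoch: linear warm-up, then cosine decay."""
    if epoch <= WARMUP_EPOCHS:
        fraction = (epoch - 1) / (WARMUP_EPOCHS - 1)
        return LR_WARMUP_START + fraction * (LR_INITIAL - LR_WARMUP_START)
    progress = (epoch - WARMUP_EPOCHS) / (TOTAL_EPOCHS - WARMUP_EPOCHS)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return max(LR_FLOOR, LR_INITIAL * cosine)


def cross_entropy_stage_weight(epoch: int) -> float:
    """ProgLoss stage weight of the weighted cross-entropy term."""
    if epoch <= 100:
        return 0.7
    if epoch <= 200:
        return 0.6
    return 0.5


if __name__ == "__main__":
    for epoch in (1, 10, 100, 200, 300):
        print(epoch, learning_rate(epoch), cross_entropy_stage_weight(epoch))
```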
[0035] Model training and convergence validation: The training set is input into the initial first YOLOv26 network for iterative training, for a total of 300 training rounds (epochs). After each training round, the model's classification accuracy is evaluated using the test set. When the classification accuracy on the test set has not improved for 10 consecutive rounds and the loss value has stabilized below a preset threshold (e.g., 0.01), the model is considered to have converged and training is complete. The model weights at this point are saved as the final preset first YOLOv26 classification model. The model is validated using the test set to ensure that its coarse classification accuracy on unseen construction image data is no less than 95%.
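The convergence rule above amounts to an early-stopping test over the per-round history. The sketch below is one possible reading: "stabilizes below the threshold" is interpreted as every loss value in the last 10 rounds being below 0.01, which is an assumption, and the function name is illustrative.

```python
def has_converged(accuracies, losses, patience=10, loss_threshold=0.01):
    """Check the stopping rule on per-round test accuracy and loss histories.

    Converged when the best accuracy of the last `patience` rounds does not
    exceed the best accuracy seen before them, and every loss in those
    rounds is below `loss_threshold`.
    """
    if len(accuracies) <= patience or len(losses) < patience:
        return False
    best_before = max(accuracies[:-patience])
    no_improvement = max(accuracies[-patience:]) <= best_before
    loss_stable = all(loss < loss_threshold for loss in losses[-patience:])
    return no_improvement and loss_stable


if __name__ == "__main__":
    acc = [0.90, 0.95] + [0.95] * 10
    loss = [0.50, 0.02] + [0.005] * 10
    print(has_converged(acc, loss))
```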
[0036] After image data of the line project construction site is acquired, the process of inputting the image data into the preset first YOLOv26 classification model to output the coarse classification result for the construction stage includes the following steps: Image preprocessing: First, the original image data is unified in format and resolution and converted into an RGB image of 1280×1280 pixels; then, a Gaussian filter (3×3 kernel) combined with median filtering is used to remove Gaussian noise and salt-and-pepper noise; finally, the brightness and contrast are adaptively adjusted through a histogram equalization algorithm to enhance the distinction between the core construction area and the background, yielding the preprocessed image.
[0037] Multi-scale deep feature extraction: The preprocessed image is input into the CSPDarknet-26 backbone network to extract deep feature maps containing multi-scale information and output high-dimensional, robust feature representations.
[0038] Global semantic feature aggregation: Input the deep feature map into the feature aggregation layer, and convert it into a one-dimensional global semantic feature vector through global average pooling.
[0039] Construction Stage Coarse Classification and Result Output: A one-dimensional feature vector is input into the classification head. After linear transformation by a fully connected layer and normalization using the Softmax activation function, confidence scores for three construction stages are output. The category with the highest confidence score is selected as the coarse classification result. In practical applications, if the coarse classification result is "foundation construction stage," the result and its corresponding confidence score are directly pushed to the result output module. If it is "tower erection stage" or "line erection stage," the image data and coarse classification result are temporarily stored before proceeding to the subsequent background separation process.
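The Softmax normalization and the routing rule described above can be sketched as follows. The stage labels, function names, and the two route identifiers are illustrative assumptions; the logits stand in for the output of the fully connected layer.

```python
import math

STAGES = ("foundation construction", "tower erection", "line stringing")


def softmax(logits):
    """Numerically stable Softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def coarse_classify(logits):
    """Return the top stage and the confidence score of every stage."""
    scores = softmax(logits)
    best = max(range(len(scores)), key=scores.__getitem__)
    return STAGES[best], dict(zip(STAGES, scores))


def route(stage):
    """Foundation results go straight to output; the rest are segmented first."""
    return "result_output" if stage == STAGES[0] else "background_separation"


if __name__ == "__main__":
    stage, scores = coarse_classify([0.1, 0.2, 3.0])
    print(stage, scores, route(stage))
```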
[0040] Through the aforementioned pre-trained first YOLOv26 classification model, combined with the entire process of image preprocessing, multi-scale feature extraction, global semantic aggregation, and classification judgment, this embodiment can efficiently and accurately complete the coarse classification of construction stage image data from power line construction sites. It not only eliminates image interference caused by acquisition equipment and environmental factors, improving the robustness and accuracy of feature extraction, but also quickly completes the initial differentiation of the three major construction stages: foundation construction, tower erection, and line stringing. This provides a reliable preliminary judgment basis for the subsequent background separation and fine classification of the tower erection and line stringing stages, ensuring the efficiency and accuracy of the overall construction stage identification process. The model classification results of this embodiment are shown in Figure 2.
[0041] S03: When the coarse classification result is the tower erection stage or the line erection stage, the image data is input into a preset BiRefNet background separation model trained based on historical tower erection samples and historical line erection samples to perform pixel-level foreground and background segmentation in order to obtain the target image after removing background interference.
[0042] In a preferred embodiment of this invention, when the coarse classification result is the tower erection stage or the line erection stage, the image data is input into a preset BiRefNet background separation model, trained on historical tower erection samples and historical line erection samples, for pixel-level foreground and background segmentation to obtain the target image after removing background interference. Specifically: In this embodiment, the BiRefNet background separation model is a dedicated model trained to convergence for the tower erection and line stringing construction scenarios of power line engineering. Its core advantage lies in achieving fine-grained feature extraction and high-precision target segmentation through a dual-branch module design and a bilateral reference mechanism. The overall architecture includes a core module layer, a bilateral reference mechanism, and a loss function layer. The specific training and segmentation process is as follows: Model training: Historical image data of tower erection and line stringing construction scenes were collected as training samples, including 800 images of the tower erection stage and 800 images of the line stringing stage. All samples were finely annotated at the pixel level, covering scenes under different lighting, terrain, and construction progress, and including background areas such as existing towers and natural vegetation as well as core construction targets. The annotated dataset was divided into a training set and a test set in an 8:2 ratio.
[0043] The BiRefNet model was initialized by loading weights pre-trained on a general high-resolution segmentation dataset; distributed training was performed using 8 GPUs with a batch size of 4, and the input image resolution was uniformly set to 1280×1280; the AdamW optimizer was selected, with an initial learning rate of 1×10⁻⁴ decayed by a cosine annealing strategy and a weight decay coefficient of 5×10⁻⁵. The training consists of 100 rounds, with the first 10 rounds serving as warm-up training. The model loss function uses a weighted combination of binary cross-entropy loss, intersection-over-union (IoU) loss, and structural similarity loss. The sum of the weights of the three loss terms is always 1.0, and the weight allocation is optimized in real time through gradient backpropagation: if the gradient descent rate of a loss term is less than 1×10⁻⁵, the weight of that loss term is increased by 0.05, ensuring that the model focuses on the core learning objective. After each training round, the segmentation accuracy is evaluated using the test set. When the intersection-over-union (IoU) on the test set is not lower than 96% for 8 consecutive rounds and the loss value is stable below 0.02, the model is considered to have converged, and the weights are saved to obtain the preset BiRefNet background separation model.
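The weight adjustment and convergence rules for the segmentation loss can be sketched as below. Several points are assumptions: the starting weights are hypothetical (the text gives none), the rate threshold is a parameter whose default is only a reading of the text, and renormalizing after the 0.05 boost is one way to keep the weights summing to 1.0 that the text does not spell out.

```python
def rebalance_loss_weights(weights, descent_rates,
                           rate_threshold=1e-5, boost=0.05):
    """Boost the weight of slow-improving loss terms, then renormalize to 1.0."""
    boosted = {
        name: weight + (boost if descent_rates[name] < rate_threshold else 0.0)
        for name, weight in weights.items()
    }
    total = sum(boosted.values())
    return {name: weight / total for name, weight in boosted.items()}


def segmentation_converged(ious, losses, rounds=8, min_iou=0.96, max_loss=0.02):
    """IoU at or above `min_iou` and loss below `max_loss` for `rounds` rounds."""
    if len(ious) < rounds or len(losses) < rounds:
        return False
    recent_ok = all(iou >= min_iou for iou in ious[-rounds:])
    loss_ok = all(loss < max_loss for loss in losses[-rounds:])
    return recent_ok and loss_ok


if __name__ == "__main__":
    # Hypothetical starting weights for the three loss terms.
    weights = {"bce": 0.4, "iou": 0.3, "ssim": 0.3}
    rates = {"bce": 1e-3, "iou": 1e-6, "ssim": 1e-3}
    print(rebalance_loss_weights(weights, rates))
```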
[0044] Pixel-level foreground / background segmentation process: When the coarse classification result is the tower erection stage or the line erection stage, the image data is input into the preset BiRefNet background separation model. This model has a built-in localization module and a reconstruction module, and the two modules exchange features through lateral connections. Localization module processing: The input image data is fed into the localization module, which extracts multi-scale feature maps (at 1/4, 1/8, 1/16, etc. of the input image size) in four stages through the Swin Transformer backbone network. After the channel dimensions are unified by 1×1 convolution and multi-context fusion is performed by the ASPP module (dilation rates 1, 6, 12, 18), the module completes category semantic learning through global average pooling and fully connected layers, and outputs 256-dimensional compressed features containing global semantic information.
[0045] Reconstruction module processing: The compressed features are input into the reconstruction module, which uses BiRefBlocks as its basic units. Reconstruction is performed by progressive upsampling through three cascaded BiRefBlocks. The output features of each layer are added to the laterally connected features to restore resolution, and the bilateral reference mechanism is activated during upsampling. The BiRefBlock, the core feature fusion unit of the BiRefNet architecture, is specifically designed for high-resolution images. It achieves efficient complementary fusion of shallow details and deep semantic features through a dual-branch bidirectional reference mechanism. Its core employs an "encoder-decoder bidirectional collaborative" structure: the forward reference path feeds features from each encoder layer laterally into the corresponding decoder layer, while the reverse reference path is guided by the decoder for feature optimization. Combined with deformable convolution and a hierarchical receptive field strategy, it accurately captures global and local features of multi-scale construction targets such as towers and conductors. This module effectively balances the contextual information and detailed representation of high-resolution images, reduces feature loss, and significantly improves the model's robustness and inference efficiency in complex power line construction scenarios, providing a high-quality feature foundation for subsequent object detection and segmentation tasks.
[0046] Inward reference: The original high-resolution image is adaptively cropped into image patches according to the feature map size of the decoding stage, and the patches are then stacked and fused with the reconstruction features of the corresponding stage (F_fuse1 = Concat (F_rec, F_crop)) to supplement lossless high-resolution information and avoid loss of detail. Outward reference: The gradient map of the ground-truth label is extracted (G = Sobel (B)) as a supervision signal to generate a gradient attention map, and the background noise gradient is then filtered out using a mask optimized by morphological dilation (M_opt = Dilate (M_raw, kernel)) to obtain the edge enhancement feature F_fuse2.
[0047] Feature integration and segmentation map generation: The high-resolution fusion feature F_fuse1 and the edge enhancement feature F_fuse2 are integrated according to the formula F_opt = λ×F_fuse1 + (1-λ)×F_fuse2 (λ=0.7). After the last BiRefBlock upsampling, a binary segmentation map with the same resolution as the input image is generated by 1×1 convolution and Sigmoid activation (pixel value 1 is the foreground core target, and 0 is the background interference).
[0048] Target image generation: The binary segmentation map is overlaid on the original image data, and element-wise multiplication is used to mask background interference areas (existing towers, vegetation, buildings, etc.) while retaining the core construction target area (the towers to be erected and the hoisting equipment, and the connection area between conductors and towers). An enhanced image of the core target is output, preserving the color and texture information of the original image. This target image is free of redundant background interference, providing a feature input basis for the subsequent fine classification to focus on the core target. The images after background segmentation in this embodiment are shown in Figure 3 and Figure 4.
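The feature integration formula and the masking step can be illustrated on small nested lists. This is a sketch only: the 0.5 binarization threshold is an assumption (the text states only that Sigmoid activation yields a binary map), the function names are illustrative, and real feature maps would be tensors rather than lists.

```python
def fuse_features(f_fuse1, f_fuse2, lam=0.7):
    """Element-wise F_opt = lam * F_fuse1 + (1 - lam) * F_fuse2 on 2-D lists."""
    return [
        [lam * a + (1.0 - lam) * b for a, b in zip(row1, row2)]
        for row1, row2 in zip(f_fuse1, f_fuse2)
    ]


def binarize(probabilities, threshold=0.5):
    """Turn a Sigmoid probability map into a 0/1 mask (threshold assumed)."""
    return [[1 if p >= threshold else 0 for p in row] for row in probabilities]


def apply_mask(image, mask):
    """Multiply each RGB pixel by its mask value: 1 keeps it, 0 blanks it."""
    return [
        [[channel * keep for channel in pixel]
         for pixel, keep in zip(image_row, mask_row)]
        for image_row, mask_row in zip(image, mask)
    ]


if __name__ == "__main__":
    image = [[[10, 20, 30], [40, 50, 60]]]   # 1 x 2 RGB image
    mask = binarize([[0.9, 0.2]])            # foreground, background
    print(apply_mask(image, mask))
```

Because the mask only zeroes background pixels, foreground pixels keep their original color values, matching the statement that color and texture information is preserved.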
[0049] S04: Input the target image into a preset second YOLOv26 classification model so that the second YOLOv26 classification model outputs a fine classification result.
[0050] In a preferred embodiment of this invention, the step of inputting the target image into a preset second YOLOv26 classification model so that the second YOLOv26 classification model outputs a fine classification result is specifically as follows: In this embodiment, the target image is a core target enhancement image processed by the BiRefNet background separation model. The target image is input into a preset second YOLOv26 classification model to output fine classification results. The specific process is as follows: Target image preprocessing: The target image is converted into a 1280×1280 pixel RGB format image to maintain the same standard as the coarse classification stage; a 3×3 Gaussian filter combined with median filtering is used to remove residual slight noise; the histogram equalization algorithm is used to enhance the detail contrast of the core construction target area, highlighting the features of the towers to be erected and hoisting equipment in the tower erection stage, as well as the conductor routing and tower connection node features in the line stringing stage, to obtain the preprocessed target image.
[0051] Model training adaptation: The first YOLOv26 classification model and the second YOLOv26 classification model have the same network structure, and their training datasets, model parameters, and weights are independent of each other. The training dataset of the second YOLOv26 classification model includes 1,400 images from the tower erection stage and 2,086 images from the line erection stage, which are divided into training and testing sets in an 8:2 ratio. The training hyperparameter configuration is consistent with that of the first YOLOv26 classification model to ensure the consistency and accuracy of the model in recognizing the core target features.
[0052] Fine classification inference process: The preprocessed target image is input into the second YOLOv26 classification model. The CSPDarknet-26 backbone network focuses on extracting core target features and outputs a deep target feature map. This map is converted into a one-dimensional target feature vector by global average pooling and input into the classification head, where it is processed by a fully connected layer and a Softmax activation function, with an h-sigmoid activation function used to optimize the output. Only the confidence scores of the tower erection stage and the line erection stage are output. The category with the highest confidence score is selected as the fine classification result, and this result and its corresponding confidence score are pushed to the result output module.
[0053] This detailed classification process focuses on the core target features after removing the background, effectively avoiding background interference and accurately distinguishing between the two easily confused construction stages of tower erection and line stringing, providing a reliable basis for subsequent weighted fusion judgment.
[0054] S05: The coarse classification results and the fine classification results are weighted and fused to obtain the identification results for the construction phase of the line project.
[0055] In a preferred embodiment of this invention, the step of performing weighted fusion on the coarse classification results and the fine classification results to obtain the identification results for the line engineering construction stage is as follows: In this embodiment, the weighted fusion judgment of the coarse classification and fine classification results is completed by the result output module, which outputs the final identification result of the line engineering construction stage. The specific implementation process is as follows: Fusion logic and calculation rules: The core logic of weighted fusion is as follows: the coarse classification result is based on the global features of the complete image and reflects the macroscopic information of the overall construction scene; the fine classification result is based on the enhanced image of the core target and focuses on the features of the core construction area. The two complement each other to avoid the limitations of a single classification. For the tower erection / line stringing construction stages, an equal-weight fusion strategy (coarse classification weight 0.5, fine classification weight 0.5) is adopted because: firstly, the coarse and fine classification modules are built on the same YOLOv26 network, with consistent model structure and training strategy and equivalent confidence levels; secondly, weight coefficients of 0.5 ensure that the final score remains within the [0,1] interval. The fusion calculation formula is: S_final = 0.5×S_coarse + 0.5×S_fine, where S_final is the final confidence score, S_coarse is the confidence score of the corresponding stage from the coarse classification (value range [0,1]), and S_fine is the confidence score of the corresponding stage from the fine classification (value range [0,1]).
[0056] Judgment rules: If the coarse classification result is the foundation construction stage, this stage is output directly as the final identification result without weighted fusion, simplifying the process and ensuring identification efficiency. If the coarse classification result is tower erection or line stringing, the final confidence score is calculated for each of the two stages, and the stage with the higher score is taken as the final identification result; if the scores of the two categories are close, a secondary verification is performed by combining the feature correlation between the coarse and fine classifications to ensure the reliability of the judgment.
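The fusion formula and judgment rules can be sketched as follows, with equal weights of 0.5 for the coarse and fine scores. The stage labels and function names are illustrative assumptions, and the secondary verification for close scores is omitted because the text does not define how close is "close" or how the verification is performed.

```python
FOUNDATION = "foundation construction"


def fuse_scores(coarse, fine, w_coarse=0.5, w_fine=0.5):
    """S_final = w_coarse * S_coarse + w_fine * S_fine for each fine stage."""
    return {
        stage: w_coarse * coarse[stage] + w_fine * fine[stage]
        for stage in fine
    }


def identify_stage(coarse_scores, fine_scores=None):
    """Return (stage, confidence) following the judgment rules above."""
    coarse_stage = max(coarse_scores, key=coarse_scores.get)
    if coarse_stage == FOUNDATION:
        # Foundation results are output directly, without fusion.
        return coarse_stage, coarse_scores[coarse_stage]
    if fine_scores is None:
        raise ValueError("fine classification scores are required for fusion")
    fused = fuse_scores(coarse_scores, fine_scores)
    final_stage = max(fused, key=fused.get)
    return final_stage, fused[final_stage]


if __name__ == "__main__":
    coarse = {FOUNDATION: 0.1, "tower erection": 0.5, "line stringing": 0.4}
    fine = {"tower erection": 0.2, "line stringing": 0.8}
    print(identify_stage(coarse, fine))
```

In the usage example the coarse model slightly prefers tower erection, but the fine score shifts the fused result to line stringing, which is the kind of correction the two-stage design is meant to allow.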
[0057] Output and Interface Adaptation: The final identification result comprises two parts: first, a unique determination of the construction stage (foundation construction stage, tower erection stage, or line stringing stage); and second, the final confidence score for the corresponding stage. It also provides a standardized RESTful API interface, supporting HTTP / HTTPS protocols. The interface output data uses a standardized JSON format with clearly defined fields and semantics, exhibiting good compatibility and scalability. It can adapt to the integration requirements of intelligent management and control systems for power line projects, achieving seamless collaboration between the identification results and various management and control systems, and supporting engineering-level application.
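A JSON body carrying the two parts of the identification result might be built as shown below. The field names `construction_stage` and `confidence`, the rounding to four decimals, and the validation of stage names are all hypothetical; the text states only that the interface returns standardized JSON with clearly defined fields.

```python
import json

VALID_STAGES = {"foundation construction", "tower erection", "line stringing"}


def build_response(stage: str, confidence: float) -> str:
    """Serialize an identification result as a JSON string (field names assumed)."""
    if stage not in VALID_STAGES:
        raise ValueError(f"unknown construction stage: {stage}")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    payload = {
        "construction_stage": stage,
        "confidence": round(confidence, 4),
    }
    return json.dumps(payload, ensure_ascii=False)


if __name__ == "__main__":
    print(build_response("tower erection", 0.87654))
```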
[0058] This weighted fusion judgment method fully integrates the advantages of global features and core target features, reduces misjudgments caused by background interference and feature omissions, ensures that the final output of construction stage identification results is accurate and stable, improves the closed loop of the entire construction stage identification process, and provides a reliable basis for the control of construction progress and safety management of line engineering.
[0059] In summary, this application can complete construction stage identification from image data of the line construction site alone, without relying on the text data of electronic project archives. This avoids the identification failures caused by lagging text updates, inconsistent formats, and missing content, and significantly improves the adaptability and stability of the solution. The image data is first input into the first YOLOv26 classification model to output a coarse classification result, so the preliminary determination of the construction stage is completed quickly and overall recognition efficiency improves. For images coarsely classified as the tower erection stage or the line stringing stage, a specially trained BiRefNet background separation model performs pixel-level foreground and background segmentation, which accurately removes background interference such as existing towers and vegetation. This preserves the core construction target features and keeps redundant background from obscuring them, overcoming the poor anti-interference capability of existing pure image recognition schemes. The target image after background removal is then input into the second YOLOv26 classification model to obtain a fine classification result focused on the core target features, further improving the accuracy of distinguishing the tower erection stage from the line stringing stage. Finally, the coarse and fine classification results are weighted and fused, combining the discriminative advantages of global image features and core target features, to achieve efficient, accurate, and stable automated identification of the line engineering construction stage. This application thereby solves the problem that existing technologies cannot accurately and efficiently identify the line engineering construction stage.
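The overall flow summarized above can be sketched as a small Python function. This is only an outline under stated assumptions: the three models and the fusion rule are passed in as plain callables, and the names `identify_stage`, `coarse_model`, `separator`, `fine_model`, and `fuse` are hypothetical placeholders rather than APIs of YOLOv26 or BiRefNet.

```python
from typing import Any, Callable, Dict

Probs = Dict[str, float]

# Hypothetical labels for the stages that go through the refinement branch.
REFINED_STAGES = ("tower_erection", "line_stringing")


def identify_stage(
    image: Any,
    coarse_model: Callable[[Any], Probs],  # stands in for the first classifier
    separator: Callable[[Any], Any],       # stands in for background separation
    fine_model: Callable[[Any], Probs],    # stands in for the second classifier
    fuse: Callable[[Probs, Probs], str],   # stands in for weighted fusion
) -> str:
    """Coarse classification, optional background removal, fine
    classification, then fusion of the two results."""
    coarse = coarse_model(image)
    coarse_label = max(coarse, key=coarse.get)
    # Stages outside the refinement branch are output directly.
    if coarse_label not in REFINED_STAGES:
        return coarse_label
    target = separator(image)  # target image with background removed
    fine = fine_model(target)
    return fuse(coarse, fine)
```

Passing the models as callables keeps the sketch independent of any particular framework; in practice each callable would wrap a trained model.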
[0060] Example 2 Please refer to Figure 4, which shows a line engineering construction stage identification device provided in the embodiments of this application.
[0061] In this embodiment, the line engineering construction stage identification device includes an acquisition module 10, a coarse classification module 20, a separation module 30, a fine classification module 40, and an identification module 50.
[0062] The acquisition module 10 is used to acquire image data of the construction site of the line project. The coarse classification module 20 is used to input the image data into a preset first YOLOv26 classification model so that the first YOLOv26 classification model outputs the coarse classification result of the construction stage. The separation module 30 is used, when the coarse classification result is the tower erection stage or the line stringing stage, to input the image data into a preset BiRefNet background separation model, trained on historical tower erection samples and historical line stringing samples, to perform pixel-level foreground and background segmentation and obtain the target image with background interference removed. The fine classification module 40 is used to input the target image into a preset second YOLOv26 classification model so that the second YOLOv26 classification model outputs the fine classification result. The identification module 50 is used to perform weighted fusion judgment on the coarse classification result and the fine classification result to obtain the identification result of the line engineering construction stage.
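The last step of the separation module 30, turning a segmentation result into a target image, can be illustrated with a minimal NumPy sketch. It assumes the background separation model has already produced a binary segmentation map with the same height and width as the image; the function name `apply_foreground_mask` and the zero fill value are illustrative assumptions.

```python
import numpy as np


def apply_foreground_mask(
    image: np.ndarray, mask: np.ndarray, fill: int = 0
) -> np.ndarray:
    """Keep foreground pixels and replace background pixels with `fill`.

    image: H x W x C array; mask: H x W array where nonzero marks foreground.
    """
    if image.shape[:2] != mask.shape:
        raise ValueError("mask must match the image height and width")
    keep = mask.astype(bool)
    target = np.full_like(image, fill)  # start from an all-background image
    target[keep] = image[keep]          # copy only the foreground pixels
    return target
```

Filling the background with a constant is one simple way to shield interference areas; other fill strategies would work with the same mask.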
[0063] For ease of description and brevity, the embodiments of the device of the present invention include all the implementation methods in the above embodiments of the method for identifying the construction stage of line engineering, and will not be repeated here.
[0064] Example 3 This application provides a computer-readable storage medium, which includes a stored computer program, wherein the computer program, when executed, controls the device where the computer-readable storage medium is located to perform the aforementioned method for identifying the construction stage of a line project. If the method is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the above method embodiments can also be implemented by a computer program instructing related hardware. The computer program can be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the method embodiments described above. The computer program includes computer program code, which can be in the form of source code, object code, an executable file, or certain intermediate forms. The computer-readable medium can include any entity or device capable of carrying the computer program code, such as a recording medium, a USB flash drive, a portable hard drive, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, or a software distribution medium.
[0065] Example 4 This embodiment provides a terminal device, including a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor. When the processor executes the computer program, it implements any of the line engineering construction stage identification methods described in Embodiment 1.
[0066] The specific embodiments described above further illustrate the purpose, technical solution, and beneficial effects of the present invention. It should be understood that the above descriptions are merely specific embodiments of the present invention and are not intended to limit its scope of protection. In particular, any modifications, equivalent substitutions, or improvements made by those skilled in the art within the spirit and principles of the present invention should be included within the scope of protection of the present invention.
Claims
1. A method for identifying the construction stage of a line project, characterized in that it comprises: acquiring image data of the construction site of the line project; inputting the image data into a preset first YOLOv26 classification model so that the first YOLOv26 classification model outputs a coarse classification result for the construction stage; when the coarse classification result is the tower erection stage or the line stringing stage, inputting the image data into a preset BiRefNet background separation model, trained on historical tower erection samples and historical line stringing samples, to perform pixel-level foreground and background segmentation and obtain a target image with background interference removed; inputting the target image into a preset second YOLOv26 classification model so that the second YOLOv26 classification model outputs a fine classification result, wherein the first YOLOv26 classification model is trained on historical construction site images and the second YOLOv26 classification model is trained on historical target images with background interference removed; and performing weighted fusion judgment on the coarse classification result and the fine classification result to obtain the identification result of the construction stage of the line project.
2. The method for identifying the construction stage of a line project according to claim 1, characterized in that the step of inputting the image data into a preset first YOLOv26 classification model so that the first YOLOv26 classification model outputs a coarse classification result for the construction stage specifically comprises: converting the image data into a uniform format and resolution to obtain a standardized image; performing noise suppression and adaptive brightness and contrast adjustment on the standardized image to obtain a preprocessed image; inputting the preprocessed image into the first YOLOv26 classification model so that the first YOLOv26 classification model performs multi-scale feature extraction on the preprocessed image through its built-in CSPDarknet-26 architecture to obtain a deep feature map; inputting the deep feature map into the built-in feature aggregation layer for global average pooling to obtain a one-dimensional feature vector; inputting the one-dimensional feature vector into the built-in classification head for category mapping to output the probability values of the foundation construction stage, the tower erection stage, and the line stringing stage; and taking the stage with the highest probability value as the coarse classification result.
3. The method for identifying the construction stage of a line project according to claim 1, characterized in that the step of, when the coarse classification result is the tower erection stage or the line stringing stage, inputting the image data into a preset BiRefNet background separation model, trained on historical tower erection samples and historical line stringing samples, to perform pixel-level foreground and background segmentation and obtain a target image with background interference removed specifically comprises: the BiRefNet background separation model includes a localization module and a reconstruction module; the image data is input into the localization module of the BiRefNet background separation model, so that the localization module extracts multi-stage, multi-scale feature maps of the image data through the Swin Transformer backbone network and performs context feature fusion on the multi-stage, multi-scale feature maps through the ASPP module to output compressed features containing global semantic information; the compressed features are input into the reconstruction module of the BiRefNet background separation model, so that the reconstruction module processes the compressed features using the BiRefBlock as the basic unit and generates reconstruction features for the corresponding stage, and the reconstruction features are then upsampled and reconstructed using multiple BiRefBlocks; during the upsampling reconstruction process, the BiRefNet background separation model uses inward reference to crop the image data into image blocks according to the feature map size of the decoding stage, and stacks and fuses the image blocks with the reconstruction features of the corresponding stage to obtain high-resolution fused features; simultaneously, the gradient map of the preset ground-truth label is extracted by outward reference as a supervision signal, and a gradient attention map is generated by the reconstruction module; based on the gradient attention map, the background noise gradient is filtered by a preset morphological dilation mask to obtain edge enhancement features; the high-resolution fused features are integrated with the edge enhancement features to obtain optimized reconstruction features; the optimized reconstruction features are reconstructed through progressive upsampling to generate a binary segmentation map with the same resolution as the input image data; and the binary segmentation map is superimposed on the image data to shield the existing towers and background interference areas, wherein for the tower erection stage the towers to be erected and the hoisting equipment are retained, and for the line stringing stage the area where the conductors connect to the towers is retained, thereby obtaining the target image with background interference removed.
4. The method for identifying the construction stage of a line project according to claim 1, characterized in that the step of inputting the target image into a preset second YOLOv26 classification model so that the second YOLOv26 classification model outputs a fine classification result specifically comprises: converting the target image into a uniform format and resolution to obtain a standardized target image; performing noise suppression and adaptive brightness and contrast adjustment on the standardized target image to obtain a preprocessed target image; inputting the preprocessed target image into the second YOLOv26 classification model so that the second YOLOv26 classification model performs multi-scale feature extraction on the preprocessed target image through the CSPDarknet-26 architecture to obtain a deep target feature map; inputting the deep target feature map into the built-in feature aggregation layer for global average pooling to obtain a one-dimensional target feature vector; inputting the one-dimensional target feature vector into the built-in classification head for category mapping to output the probability values of the tower erection stage and the line stringing stage; and taking the stage with the highest probability value as the fine classification result.
5. The method for identifying the construction stage of a line project according to claim 1, characterized in that the step of performing weighted fusion judgment on the coarse classification result and the fine classification result to obtain the identification result of the construction stage of the line project specifically comprises: when the coarse classification result is the foundation construction stage, outputting the foundation construction stage as the identification result of the line engineering construction stage; when the coarse classification result is the tower erection stage or the line stringing stage, obtaining the first confidence score corresponding to the coarse classification result and the second confidence score corresponding to the fine classification result; multiplying the first confidence score and the second confidence score by preset weighting coefficients to obtain a weighted first confidence score and a weighted second confidence score; summing the weighted first confidence score and the weighted second confidence score to obtain a final confidence score; and comparing the final confidence scores of the tower erection stage and the line stringing stage, and outputting the stage with the higher score as the identification result of the line engineering construction stage.
6. A device for identifying the construction stage of a line project, characterized in that it comprises: an acquisition module, used to acquire image data of the construction site of the line project; a coarse classification module, used to input the image data into a preset first YOLOv26 classification model so that the first YOLOv26 classification model outputs a coarse classification result for the construction stage; a separation module, used, when the coarse classification result is the tower erection stage or the line stringing stage, to input the image data into a preset BiRefNet background separation model, trained on historical tower erection samples and historical line stringing samples, to perform pixel-level foreground and background segmentation and obtain a target image with background interference removed; a fine classification module, used to input the target image into a preset second YOLOv26 classification model so that the second YOLOv26 classification model outputs a fine classification result, wherein the first YOLOv26 classification model is trained on historical construction site images and the second YOLOv26 classification model is trained on historical target images with background interference removed; and an identification module, used to perform weighted fusion judgment on the coarse classification result and the fine classification result to obtain the identification result of the construction stage of the line project.
7. The line engineering construction stage identification device according to claim 6, characterized in that the inputting, by the coarse classification module, of the image data into a preset first YOLOv26 classification model so that the first YOLOv26 classification model outputs a coarse classification result for the construction stage specifically comprises: converting the image data into a uniform format and resolution to obtain a standardized image; performing noise suppression and adaptive brightness and contrast adjustment on the standardized image to obtain a preprocessed image; inputting the preprocessed image into the first YOLOv26 classification model so that the first YOLOv26 classification model performs multi-scale feature extraction on the preprocessed image through its built-in CSPDarknet-26 architecture to obtain a deep feature map; inputting the deep feature map into the built-in feature aggregation layer for global average pooling to obtain a one-dimensional feature vector; inputting the one-dimensional feature vector into the built-in classification head for category mapping to output the probability values of the foundation construction stage, the tower erection stage, and the line stringing stage; and taking the stage with the highest probability value as the coarse classification result.
8. The line engineering construction stage identification device according to claim 6, characterized in that the inputting, by the separation module when the coarse classification result is the tower erection stage or the line stringing stage, of the image data into a preset BiRefNet background separation model, trained on historical tower erection samples and historical line stringing samples, to perform pixel-level foreground and background segmentation and obtain a target image with background interference removed specifically comprises: the BiRefNet background separation model includes a localization module and a reconstruction module; the image data is input into the localization module of the BiRefNet background separation model, so that the localization module extracts multi-stage, multi-scale feature maps of the image data through the Swin Transformer backbone network and performs context feature fusion on the multi-stage, multi-scale feature maps through the ASPP module to output compressed features containing global semantic information; the compressed features are input into the reconstruction module of the BiRefNet background separation model, so that the reconstruction module processes the compressed features using the BiRefBlock as the basic unit and generates reconstruction features for the corresponding stage, and the reconstruction features are then upsampled and reconstructed using multiple BiRefBlocks; during the upsampling reconstruction process, the BiRefNet background separation model uses inward reference to crop the image data into image blocks according to the feature map size of the decoding stage, and stacks and fuses the image blocks with the reconstruction features of the corresponding stage to obtain high-resolution fused features; simultaneously, the gradient map of the preset ground-truth label is extracted by outward reference as a supervision signal, and a gradient attention map is generated by the reconstruction module; based on the gradient attention map, the background noise gradient is filtered by a preset morphological dilation mask to obtain edge enhancement features; the high-resolution fused features are integrated with the edge enhancement features to obtain optimized reconstruction features; the optimized reconstruction features are reconstructed through progressive upsampling to generate a binary segmentation map with the same resolution as the input image data; and the binary segmentation map is superimposed on the image data to shield the existing towers and background interference areas, wherein for the tower erection stage the towers to be erected and the hoisting equipment are retained, and for the line stringing stage the area where the conductors connect to the towers is retained, thereby obtaining the target image with background interference removed.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium includes a stored computer program, wherein the computer program, when executed, controls the device where the computer-readable storage medium is located to perform the line engineering construction stage identification method as described in any one of claims 1 to 5.
10. A terminal device, characterized in that it includes a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, wherein the processor executes the computer program to implement the line engineering construction stage identification method as described in any one of claims 1 to 5.
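For illustration only, the feature aggregation and classification head recited in claims 2 and 4 (global average pooling, category mapping, highest probability) can be sketched in NumPy. The single linear layer, the softmax normalization, the stage labels, and the function name `classify_from_feature_map` are assumptions made for this sketch; the claims do not specify how the category mapping is computed.

```python
from typing import Dict, Sequence, Tuple

import numpy as np

# Hypothetical stage labels for the coarse classifier.
STAGES = ("foundation_construction", "tower_erection", "line_stringing")


def classify_from_feature_map(
    feature_map: np.ndarray,  # deep feature map, shape (C, H, W)
    weights: np.ndarray,      # assumed linear head weights, shape (K, C)
    bias: np.ndarray,         # assumed linear head bias, shape (K,)
    labels: Sequence[str] = STAGES,
) -> Tuple[str, Dict[str, float]]:
    """Pool the feature map, map it to class probabilities, pick the best."""
    # Global average pooling: one value per channel, a 1-D feature vector.
    pooled = feature_map.mean(axis=(1, 2))
    # Category mapping, assumed here to be a linear layer followed by softmax.
    logits = weights @ pooled + bias
    exp = np.exp(logits - logits.max())  # subtract the max for stability
    probs = exp / exp.sum()
    best = int(np.argmax(probs))
    return labels[best], dict(zip(labels, probs.tolist()))
```

The returned probability dictionary supplies the confidence scores that the weighted fusion step consumes.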