A method, device and medium for detecting surface defects of unmanned overhead crane rails

By constructing feature maps through multi-path nonlinear transformation branches and attention mechanisms, the robustness and real-time performance issues of unmanned overhead crane rail surface defect detection in complex environments are solved, achieving efficient defect detection.

CN122199408A · Pending · Publication Date: 2026-06-12 · HANGZHOU HUAXIN MECHANICAL & ELECTRICAL ENGINEERING CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
HANGZHOU HUAXIN MECHANICAL & ELECTRICAL ENGINEERING CO LTD
Filing Date
2026-02-13
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Existing unmanned overhead crane rail surface defect detection technologies lack robustness in complex industrial environments, making it difficult to simultaneously achieve both real-time detection and high precision. Traditional methods are inefficient and highly susceptible to subjective factors.

Method used

A feature map is constructed using a multi-path nonlinear transformation branch. Combined with channel and spatial attention mechanisms, a phantom feature map is generated through convolution operations and nonlinear activation processing. Element-wise weighted operations are then performed to generate a comprehensive weight map for rail surface defect detection.

Benefits of technology

It significantly improves the robustness and accuracy of surface defect detection on unmanned overhead crane rails, enabling real-time and efficient defect identification in complex environments.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122199408A_ABST
Patent Text Reader

Abstract

The present application relates to the technical field of unmanned overhead crane rail inspection and discloses a method, device and medium for detecting surface defects of unmanned overhead crane rails. The method includes: performing a convolution operation on an acquired image of the rail surface to obtain an intrinsic feature map; inputting the intrinsic feature map into multiple parallel nonlinear transformation branches to obtain corresponding sub-feature maps, and concatenating the sub-feature maps to obtain a phantom (ghost) feature map; concatenating the intrinsic feature map and the phantom feature map along the channel dimension to obtain an initial feature map; adding the channel attention vector of the initial feature map to its spatial attention map element by element and applying an activation operation to obtain a comprehensive weight map; weighting the initial feature map element by element with the comprehensive weight map to obtain an enhanced feature map; and generating candidate regions from the enhanced feature map, performing classification and regression on the candidate regions, and outputting the rail surface defect detection result. The present application improves the robustness of rail surface defect detection for unmanned overhead cranes while balancing real-time performance and high precision.

Description

Technical Field

[0001] This invention relates to the field of unmanned overhead crane rail inspection technology, and more specifically, to a method, equipment and medium for detecting surface defects in unmanned overhead crane rails.

Background Technology

[0002] Unmanned overhead cranes, as key equipment in intelligent manufacturing, are widely used in construction, ports, and docks. Their operational safety is highly dependent on the health of the rail surface. Because unmanned overhead cranes endure heavy loads and frequent start-stop friction at high altitudes for extended periods, the rail surface is prone to defects such as cracks, indentations, and peeling. If not addressed promptly, these defects can lead to decreased positioning accuracy and even derailment accidents. Traditional rail inspection relies primarily on periodic manual inspections. This method is not only inefficient and requires downtime, but is also heavily influenced by subjective factors such as the inspector's experience, fatigue, and ambient lighting, making it difficult to detect new defects in real time during inspection intervals and creating significant safety blind spots.

[0003] In recent years, deep learning-based object detection algorithms have gradually replaced traditional methods, mainly divided into single-stage algorithms represented by YOLO and two-stage algorithms represented by Faster R-CNN. Faster R-CNN, as a classic two-stage algorithm, achieves high-precision detection through a region proposal network, but its computational cost is high. To improve detection performance, existing technologies often employ image enhancement or introduce lightweight networks to address issues such as blurred and shaky images. However, these existing algorithms still have significant shortcomings in practical industrial applications: on the one hand, facing complex and harsh industrial environments such as uneven lighting, camera shake, and oil contamination, the robustness of existing algorithms remains insufficient, easily leading to false positives or false negatives; on the other hand, existing algorithms struggle to simultaneously achieve both real-time detection and high accuracy, failing to meet the stringent requirements of online safety monitoring for unmanned overhead cranes.

Summary of the Invention

[0004] To overcome the shortcomings of existing unmanned overhead crane rail surface defect detection technologies, which lack robustness and struggle to simultaneously achieve both real-time performance and high precision, this invention proposes the following technical solution:

[0005] Firstly, this invention proposes a method for detecting surface defects in the rails of unmanned overhead cranes, comprising: acquiring an image of the rail surface of the unmanned overhead crane; performing a convolution operation on the rail surface image to obtain an intrinsic feature map; inputting the intrinsic feature map into multiple parallel nonlinear transformation branches, where in each branch the intrinsic feature map is subjected to convolution and nonlinear activation to obtain a corresponding sub-feature map, and concatenating the sub-feature maps output from the branches to obtain a phantom feature map; concatenating the intrinsic feature map and the phantom feature map along the channel dimension to obtain an initial feature map; processing the initial feature map in the channel dimension and the spatial dimension simultaneously to obtain a channel attention vector and a spatial attention map; adding the channel attention vector element-wise to the spatial attention map and applying an activation operation to the sum to obtain a comprehensive weight map; weighting the initial feature map element-wise with the comprehensive weight map to obtain an enhanced feature map; and generating candidate regions based on the enhanced feature map, then classifying and regressing the candidate regions to output the rail surface defect detection results.

[0006] As a preferred technical solution, in each of the nonlinear transformation branches, the intrinsic feature map is subjected to convolution and nonlinear activation to obtain a corresponding sub-feature map, the expression of which is as follows:

$$Y_i = \Phi_i(Y') = \sigma\left(W_i * Y' + b_i\right), \quad i = 1, 2, \dots, K$$

[0007] where $Y_i$ denotes the sub-feature map output by the $i$-th nonlinear transformation branch, $Y'$ denotes the intrinsic feature map, $\Phi_i$ denotes the nonlinear transformation function of the $i$-th branch, $\sigma$ is the activation function, $W_i$ denotes the convolution kernel of the $i$-th branch, $*$ denotes the convolution operation, and $b_i$ denotes the bias term. The index $i$ is an integer ranging from 1 to $K$, where $K$ is the total number of nonlinear transformation branches.

[0008] As a preferred technical solution, the step of concatenating the sub-feature maps output from multiple nonlinear transformation branches to obtain a phantom feature map includes: The sub-feature maps output from the K nonlinear transformation branches are concatenated along the channel dimension. A channel shuffling convolution operation is then performed on the concatenated feature map to obtain the phantom feature map, the expression of which is shown below:

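$$Y_{\mathrm{ghost}} = \mathrm{Shuffle}\big(\mathrm{Concat}(Y_1, Y_2, \dots, Y_K)\big)$$

[0009] where $Y_{\mathrm{ghost}}$ denotes the phantom feature map, $Y_1, \dots, Y_K$ are the sub-feature maps output by the $K$ branches, $\mathrm{Shuffle}(\cdot)$ denotes the channel shuffling convolution operation, and $\mathrm{Concat}(\cdot)$ denotes the concatenation (splicing) operation.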

[0010] As a preferred technical solution, the generation process of the phantom feature map satisfies the following relationship:

$$\mathcal{F}_{\mathrm{nonlinear}} = \left\{\, Y \;\middle|\; Y = \mathrm{Concat}_{i=1}^{K}\, \sigma\left(W_i * Y'\right) \right\}$$

[0011] where $Y$ corresponds to the phantom feature map and represents the output vector in the feature space; $Y'$ denotes the input intrinsic feature map; $\mathrm{Concat}_{i=1}^{K}$ denotes concatenating the processing results of the $K$ parallel branches; $\sigma$ is a nonlinear activation function; and $W_i$ denotes the convolution kernel.

[0012] As a preferred technical solution, the initial feature map is weighted element-wise using the comprehensive weight map to obtain an enhanced feature map, the expression of which is as follows:

$$F' = F \otimes M(F)$$

[0013] where $F'$ denotes the enhanced feature map, $F$ denotes the initial feature map, $\otimes$ denotes element-wise multiplication, and $M(F)$ denotes the comprehensive weight map. The comprehensive weight map $M(F)$ is calculated as follows:

$$M(F) = \sigma\big(M_c(F) + M_s(F)\big)$$

[0014] where $\sigma$ here denotes the Sigmoid activation function, $M_c(F)$ denotes the channel attention vector, and $M_s(F)$ denotes the spatial attention map.

[0015] As a preferred technical solution, the step of processing the initial feature map in terms of channel dimensions to obtain a channel attention vector includes: The initial feature map is then subjected to global average pooling and input into a multilayer perceptron network. Dimensionality is reduced through a first fully connected layer and increased through a second fully connected layer, followed by batch normalization to obtain the channel attention vector, the expression of which is shown below:

$$M_c(F) = \mathrm{BN}\big(W_2\,(W_1 \cdot \mathrm{GAP}(F) + b_1) + b_2\big)$$

[0016] where $\mathrm{BN}(\cdot)$ denotes the batch normalization operation, $\mathrm{GAP}(\cdot)$ denotes the global average pooling operation, $W_1$ and $b_1$ are the weight matrix and bias vector of the first fully connected layer, respectively, and $W_2$ and $b_2$ are the weight matrix and bias vector of the second fully connected layer, respectively.

[0017] As a preferred technical solution, the step of generating candidate regions based on the enhanced feature map and classifying and regressing the candidate regions includes: The enhanced feature map is convolved and traversed using a preset region proposal network. Based on the preset defect-sensitive anchor boxes, the foreground confidence of the target and the preliminary regression offset of the bounding box are calculated. The non-maximum suppression algorithm is used to remove redundant boxes with an overlap of more than a threshold and to select suggested candidate boxes. Based on the spatial coordinate information of the proposed candidate boxes, feature alignment and indexing are performed on the enhanced feature map, and corresponding local enhanced feature blocks are extracted. The local enhanced feature blocks contain texture and edge information weighted by the attention mechanism. The local enhanced feature blocks are subjected to gridded max pooling using region of interest pooling operations, and the local enhanced feature blocks are uniformly mapped to defect feature vectors of fixed dimensions. The defect feature vector is input into a classification and regression network, and the specific category probability of the rail surface defect and the fine position correction parameters relative to the proposed candidate box are predicted through a fully connected layer.
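The redundancy-removal step in the paragraph above can be illustrated with a minimal, pure-Python sketch of greedy non-maximum suppression. The box format `(x1, y1, x2, y2)`, the function names, and the 0.7 overlap threshold are assumptions made for illustration; the source only says that boxes overlapping by more than "a threshold" are removed.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.7):
    """Greedy NMS: keep a box only if it does not overlap an already kept,
    higher-scoring box by more than iou_threshold. Returns kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical proposals: two near-duplicate long boxes and one separate box.
boxes = [(0, 0, 100, 10), (2, 0, 102, 10), (200, 0, 300, 10)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores, iou_threshold=0.7)
print(kept)  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.96, so it is dropped, while the third box does not overlap and is retained.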

[0018] As a preferred technical solution, after obtaining the fine position correction parameters of the proposed candidate boxes, the method further includes: the center coordinates and the length and width dimensions of the proposed candidate box are offset and compensated using the fine position correction parameters, and the proposed candidate box is adjusted to a refined bounding box; the spatial coordinates of the refined bounding box are projected onto the comprehensive weight map, and the local salient region within the coverage area of the refined bounding box is extracted; the average energy density of all pixels within the local salient region is calculated as the visual attention factor; the maximum value among the specific category probabilities is extracted as the original classification confidence score; and the original classification confidence score is then weighted and corrected using the visual attention factor to generate the final discrimination confidence score for rail surface defects.

[0019] In a second aspect, the present invention also proposes an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to perform the operations performed by the unmanned overhead crane rail surface defect detection method as described in any of the embodiments of the first aspect.

[0020] Thirdly, the present invention also proposes a computer-readable storage medium storing a program which, when executed by a processor, performs the unmanned overhead crane rail surface defect detection method described in any of the embodiments of the first aspect.

[0021] The beneficial effects of the present invention include at least the following: This invention introduces multiple parallel nonlinear transformation branches into the intrinsic feature map. Within each branch, convolution operations and nonlinear activation processing are used to generate sub-feature maps, which are then concatenated to obtain a phantom feature map. This multi-path nonlinear feature construction method breaks the limitations of traditional linear feature generation. While significantly reducing the number of model parameters and computational complexity, it greatly enhances the richness and diversity of the feature space, thus fundamentally solving the contradiction of insufficient feature representation capability in lightweight models. Furthermore, this invention performs parallel processing of the channel and spatial dimensions, and performs element-wise addition and activation operations on the generated channel attention vector and spatial attention map to construct a comprehensive weight map that integrates both dimensions. This mechanism simultaneously strengthens the focusing ability on subtle defects such as cracks and wear from both the importance of feature channels and the sensitivity of spatial location, effectively suppressing interference from environmental noise such as uneven illumination and complex background textures. In summary, this invention can significantly improve the robustness of unmanned overhead crane rail surface defect detection in complex industrial environments, while simultaneously ensuring real-time detection and high accuracy.

Attached Figure Description

[0022] Figure 1 is a flowchart illustrating the method for detecting surface defects on the rails of an unmanned overhead crane provided in an embodiment of the present invention.

[0023] Figure 2 is a schematic diagram of the multi-path nonlinear phantom structure provided in an embodiment of the present invention.

[0024] In Figure 3, panels (a), (b), (c), (d), (e), and (f) show different rail surface defect detection results provided in the embodiments of the present invention.

[0025] Figure 4 is a schematic diagram of the structure of the electronic device provided in an embodiment of the present invention.

Detailed Implementation

[0026] The embodiments of the present invention will be described below with reference to the accompanying drawings and preferred technical solutions. Those skilled in the art can easily understand other advantages and effects of the present invention from the content disclosed in this specification. The present invention can also be implemented or applied through other different specific embodiments, and various details in this specification can also be modified or changed based on different viewpoints and applications without departing from the spirit of the present invention. It should be understood that the preferred technical solutions are only for illustrating the present invention and not for limiting the scope of protection of the present invention.

[0027] It should be noted that the illustrations provided in the following embodiments are only schematic representations of the basic concept of the present invention. Therefore, the drawings only show the components related to the present invention and are not drawn according to the actual number, shape and size of the components in the actual implementation. In the actual implementation, the form, quantity and proportion of each component can be arbitrarily changed, and the layout of the components may also be more complex.

[0028] In the following description, numerous details are set forth to provide a more thorough explanation of embodiments of the invention. However, it will be apparent to those skilled in the art that embodiments of the invention may be practiced without these specific details. In other embodiments, well-known structures and devices are shown in block diagram form rather than in detail to avoid obscuring embodiments of the invention.

[0029] Example 1

This embodiment proposes a method for detecting surface defects in the rails of unmanned overhead cranes. As shown in Figure 1, which is a flowchart of the method provided in this embodiment, the method includes the following steps:

S1: Acquire an image of the surface of the unmanned overhead crane rails;
S2: Perform a convolution operation on the rail surface image to obtain the intrinsic feature map;
S3: Input the intrinsic feature map into multiple parallel nonlinear transformation branches, perform convolution and nonlinear activation on the intrinsic feature map in each branch to obtain the corresponding sub-feature map, and concatenate the sub-feature maps output by the branches to obtain the phantom feature map;
S4: Concatenate the intrinsic feature map and the phantom feature map along the channel dimension to obtain the initial feature map;
S5: Process the initial feature map in the channel dimension and the spatial dimension simultaneously to obtain the channel attention vector and the spatial attention map;
S6: Add the channel attention vector to the spatial attention map element by element, and apply an activation operation to the sum to obtain the comprehensive weight map;
S7: Weight the initial feature map element-wise using the comprehensive weight map to obtain the enhanced feature map;
S8: Generate candidate regions based on the enhanced feature map, classify and regress the candidate regions, and output the rail surface defect detection results.

[0030] Understandably, by introducing multiple parallel nonlinear transformation branches into the intrinsic feature map, and generating sub-feature maps in each branch using convolution operations and nonlinear activation processing, and then concatenating them to obtain a phantom feature map, this multi-path nonlinear feature construction method breaks the limitations of traditional linear feature generation. While significantly reducing the number of model parameters and computational complexity, it greatly enhances the richness and diversity of the feature space, thus fundamentally solving the contradiction of insufficient feature representation capability in lightweight models. Furthermore, this invention performs parallel processing of the channel and spatial dimensions, and performs element-wise addition and activation operations on the generated channel attention vector and spatial attention map to construct a comprehensive weight map that integrates both dimensions, weighting the features element-wise. This mechanism can simultaneously enhance the focusing ability on subtle defects such as cracks and wear from both the importance of the feature channel and the sensitivity of spatial location, effectively suppressing interference from environmental noise such as uneven illumination and complex background textures. In summary, this invention can significantly improve the robustness of unmanned overhead crane rail surface defect detection in complex industrial environments, while simultaneously ensuring real-time detection and high accuracy.

[0031] Example 2

This embodiment is an improvement on the unmanned overhead crane rail surface defect detection method proposed in Example 1.

[0032] It should be noted that, because linear phantom generation in GhostNet leads to insufficient feature representation capability, this embodiment proposes a multi-path nonlinear phantom structure to further improve the diversity and richness of model features while increasing computational cost as little as possible. In the standard Ghost module, the input feature map can be represented as:

$$X \in \mathbb{R}^{c \times h \times w}$$

[0033] The standard Ghost module first uses a convolution with filters $f'$ to produce $m$ intrinsic feature maps:

$$Y' = X * f'$$

[0034] where $*$ denotes the convolution operation. For each intrinsic feature map $y'_i$ in $Y'$, $s$ phantom feature maps are generated through a simple linear convolution:

$$y_{ij} = \Phi_{i,j}(y'_i), \quad i = 1, \dots, m, \quad j = 1, \dots, s$$

[0035] where $\Phi_{i,j}$ denotes a fixed linear convolution, and the final output of the module is $Y = [y_{11}, y_{12}, \dots, y_{ms}]$. This process has an obvious limitation: $Y$ is essentially a linear combination of $Y'$, whose feature space can be represented as:

$$\mathcal{F}_{\mathrm{linear}} = \left\{\, Y \in \mathbb{R}^{n \times h' \times w'} \;\middle|\; Y = W * Y' \right\}$$

[0036] where $\mathbb{R}^{n \times h' \times w'}$ denotes the tensor space of the output (with $n = m \cdot s$ channels) and $W$ denotes the convolution. This space effectively limits the diversity and richness of the model's feature representations.
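The limitation described above, that linearly generated phantom features add no nonlinearity of their own, can be checked numerically. The sketch below is an illustration under stated assumptions: a small matrix product stands in for a linear convolution (both are linear maps), and ReLU is used as a stand-in activation; the matrix `w_cheap` and both function names are hypothetical.

```python
import numpy as np

# Hypothetical 2-channel example; a matrix product stands in for a linear convolution.
w_cheap = np.array([[1.0, -1.0],
                    [0.5, 2.0]])


def linear_ghost(y):
    """Phantom features produced by a purely linear map of intrinsic features."""
    return w_cheap @ y


def nonlinear_ghost(y):
    """Same map followed by ReLU, standing in for one nonlinear branch."""
    return np.maximum(w_cheap @ y, 0.0)


a = np.array([[1.0], [0.0]])
b = np.array([[0.0], [1.0]])

# A linear map satisfies superposition: f(a + b) == f(a) + f(b).
linear_additive = np.allclose(linear_ghost(a + b),
                              linear_ghost(a) + linear_ghost(b))
# The nonlinear branch does not, so it can reach outputs a linear map cannot.
nonlinear_additive = np.allclose(nonlinear_ghost(a + b),
                                 nonlinear_ghost(a) + nonlinear_ghost(b))
print(linear_additive, nonlinear_additive)  # True False
```

Superposition holding for the linear variant means its outputs stay within a linear image of the intrinsic features, which is the restriction the multi-path nonlinear structure is meant to lift.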

[0037] Given the limitations of the standard Ghost module, this embodiment proposes a multi-path nonlinear phantom structure, as shown in Figure 2, which is a schematic diagram of the multi-path nonlinear phantom structure provided in an embodiment of the present invention. Its core is to replace the single linear transformation function $\Phi$ with multi-branch, parallel nonlinear transformation functions $\Phi_i$. In this embodiment, the intrinsic feature map is subjected to convolution and nonlinear activation processing in each of the nonlinear transformation branches to obtain the corresponding sub-feature map, the expression of which is as follows:

$$Y_i = \Phi_i(Y') = \sigma\left(W_i * Y' + b_i\right), \quad i = 1, 2, \dots, K$$

[0038] where $Y_i$ denotes the sub-feature map output by the $i$-th nonlinear transformation branch, $Y'$ denotes the intrinsic feature map, $\Phi_i$ denotes the nonlinear transformation function of the $i$-th branch, $\sigma$ is the activation function, $W_i$ denotes the convolution kernel of the $i$-th branch, $*$ denotes the convolution operation, and $b_i$ denotes the bias term. The index $i$ is an integer ranging from 1 to $K$, where $K$ is the total number of nonlinear transformation branches.

[0039] In this embodiment, the step of concatenating the sub-feature maps output from multiple nonlinear transformation branches to obtain a phantom feature map includes: The sub-feature maps output from the K nonlinear transformation branches are concatenated along the channel dimension. A channel shuffling convolution operation is then performed on the concatenated feature map to obtain the phantom feature map, the expression of which is shown below:

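[0040]

$$Y_{\mathrm{ghost}} = \mathrm{Shuffle}\big(\mathrm{Concat}(Y_1, Y_2, \dots, Y_K)\big)$$

[0041] where $Y_{\mathrm{ghost}}$ denotes the phantom feature map, $Y_s$ ($s = 1, \dots, K$) denotes the $s$-th phantom sub-feature map being concatenated, $\mathrm{Shuffle}(\cdot)$ denotes the channel shuffling convolution operation, and $\mathrm{Concat}(\cdot)$ denotes the concatenation (splicing) operation.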

[0042] This process can generate a more powerful and richer feature space:

$$\mathcal{F}_{\mathrm{nonlinear}} = \left\{\, Y \;\middle|\; Y = \mathrm{Concat}_{i=1}^{K}\, \sigma\left(W_i * Y'\right) \right\}$$

[0043] where $Y$ corresponds to the phantom feature map and represents the output vector in the feature space; $Y'$ denotes the input intrinsic feature map; $\mathrm{Concat}_{i=1}^{K}$ denotes concatenating the processing results of the $K$ parallel branches; $\sigma$ is a nonlinear activation function; and $W_i$ denotes the convolution kernel.
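As a rough illustration of the multi-path nonlinear phantom structure described in this example, the following NumPy sketch runs K parallel branches, concatenates and shuffles their outputs, and then appends the result to the intrinsic feature map. Several choices are assumptions not stated in the source: 1×1 (pointwise) kernels, ReLU as the activation, and reading "channel shuffling convolution" as a channel shuffle followed by a 1×1 mixing convolution. All function and variable names are hypothetical.

```python
import numpy as np


def conv1x1(x, w, b=None):
    """Pointwise convolution. x: (C_in, H, W), w: (C_out, C_in), b: (C_out,)."""
    y = np.einsum("oc,chw->ohw", w, x)
    if b is not None:
        y = y + b[:, None, None]
    return y


def relu(x):
    return np.maximum(x, 0.0)


def channel_shuffle(x, groups):
    """Interleave channels across `groups` so that branches are mixed."""
    c, h, w = x.shape
    assert c % groups == 0, "channel count must be divisible by groups"
    return x.reshape(groups, c // groups, h, w).transpose(1, 0, 2, 3).reshape(c, h, w)


def multipath_phantom(y_int, kernels, biases, w_mix):
    """Build the initial feature map from an intrinsic feature map.

    Each branch applies convolution + activation; branch outputs are
    concatenated, shuffled, mixed by a 1x1 convolution (assumed), and
    finally concatenated with the intrinsic map along the channel axis.
    """
    subs = [relu(conv1x1(y_int, w, b)) for w, b in zip(kernels, biases)]
    cat = np.concatenate(subs, axis=0)
    shuffled = channel_shuffle(cat, groups=len(subs))
    y_ghost = conv1x1(shuffled, w_mix)
    return np.concatenate([y_int, y_ghost], axis=0)


rng = np.random.default_rng(0)
y_int = rng.standard_normal((4, 5, 6))                     # intrinsic map, 4 channels
kernels = [rng.standard_normal((2, 4)) for _ in range(3)]  # K = 3 branches, 2 channels each
biases = [np.zeros(2) for _ in range(3)]
w_mix = rng.standard_normal((4, 6))                        # 6 shuffled -> 4 phantom channels
initial = multipath_phantom(y_int, kernels, biases, w_mix)
print(initial.shape)  # (8, 5, 6)
```

The first four output channels are the untouched intrinsic features and the last four are the phantom features, so the channel count doubles while only small pointwise kernels are added.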

[0044] It should be noted that traditional convolutional neural networks struggle to address the imbalance of importance across different channels and locations. To address this, a bottleneck attention module (BAM) is introduced. This module uses a dual-branch structure to generate attention maps in the channel and spatial dimensions, which are then merged into a unified attention weight. It strengthens the feature representation of key information while suppressing irrelevant information, thus improving the model's focusing ability.

[0045] For a given feature map $F \in \mathbb{R}^{C \times H \times W}$, BAM infers an attention map $M(F) \in \mathbb{R}^{C \times H \times W}$.

[0046] In this embodiment, the initial feature map is weighted element-wise using the comprehensive weight map to obtain the enhanced feature map, the expression of which is as follows:

$$F' = F \otimes M(F)$$

[0047] where $F'$ denotes the enhanced feature map, $F$ denotes the initial feature map, $\otimes$ denotes element-wise multiplication, and $M(F)$ denotes the comprehensive weight map. The comprehensive weight map $M(F)$ is calculated as follows:

$$M(F) = \sigma\big(M_c(F) + M_s(F)\big)$$

[0048] where $\sigma$ here denotes the Sigmoid activation function, $M_c(F)$ denotes the channel attention vector, and $M_s(F)$ denotes the spatial attention map.

[0049] To aggregate the feature map in each channel, global average pooling is performed on $F$ to generate a channel vector $F_c \in \mathbb{R}^{C \times 1 \times 1}$. To estimate channel attention from the channel vector $F_c$, a multilayer perceptron (MLP) with a hidden layer is used. To save parameter overhead, the hidden activation size is set to $\mathbb{R}^{C/r \times 1 \times 1}$, where $r$ is the reduction ratio. A batch normalization (BN) layer is added after the MLP to adjust its scale to that of the spatial branch output. In this embodiment, the channel-dimensional processing of the initial feature map to obtain the channel attention vector includes: the initial feature map is subjected to global average pooling and input into a multilayer perceptron network; dimensionality is reduced through a first fully connected layer and increased through a second fully connected layer, followed by batch normalization to obtain the channel attention vector, the expression of which is shown below:

$$M_c(F) = \mathrm{BN}\big(W_2\,(W_1 \cdot \mathrm{GAP}(F) + b_1) + b_2\big)$$

[0050] where $\mathrm{BN}(\cdot)$ denotes the batch normalization operation, $\mathrm{GAP}(\cdot)$ denotes the global average pooling operation, $W_1$ and $b_1$ are the weight matrix and bias vector of the first fully connected layer, respectively, and $W_2$ and $b_2$ are the weight matrix and bias vector of the second fully connected layer, respectively, with $W_1 \in \mathbb{R}^{C/r \times C}$, $b_1 \in \mathbb{R}^{C/r}$, $W_2 \in \mathbb{R}^{C \times C/r}$, and $b_2 \in \mathbb{R}^{C}$.
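The attention weighting described in this part of the example can be sketched in NumPy as follows. This is a sketch under assumptions: batch normalization is folded into a per-channel scale and shift (its inference-time form), no activation is placed between the two fully connected layers because the text does not mention one, and the spatial branch is a hypothetical 1×1 convolution that collapses channels, since the source does not describe how the spatial attention map is computed. All names are illustrative.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(f, w1, b1, w2, b2, bn_scale, bn_shift):
    """Channel attention vector with shape (C, 1, 1).

    f: (C, H, W); w1: (C/r, C); w2: (C, C/r).
    """
    gap = f.mean(axis=(1, 2))           # global average pooling -> (C,)
    hidden = w1 @ gap + b1              # first FC layer: reduce C -> C/r
    out = w2 @ hidden + b2              # second FC layer: restore C/r -> C
    out = bn_scale * out + bn_shift     # BN folded into an affine transform (assumed)
    return out[:, None, None]


def spatial_attention(f, w_s):
    """Hypothetical spatial branch: 1x1 conv collapsing channels -> (1, H, W)."""
    return np.einsum("c,chw->hw", w_s, f)[None, :, :]


def enhance(f, m_c, m_s):
    """Comprehensive weight map by broadcast addition + Sigmoid, then weighting."""
    m = sigmoid(m_c + m_s)              # (C,1,1) + (1,H,W) -> (C,H,W)
    return f * m, m


rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 5, 4
f = rng.standard_normal((C, H, W))
w1, b1 = rng.standard_normal((C // r, C)), np.zeros(C // r)
w2, b2 = rng.standard_normal((C, C // r)), np.zeros(C)
m_c = channel_attention(f, w1, b1, w2, b2, bn_scale=np.ones(C), bn_shift=np.zeros(C))
m_s = spatial_attention(f, rng.standard_normal(C))
f_enh, m = enhance(f, m_c, m_s)
print(m_c.shape, m_s.shape, m.shape)  # (8, 1, 1) (1, 4, 5) (8, 4, 5)
```

Broadcasting is what makes the "element-wise addition" of a per-channel vector and a per-location map well defined: every channel weight is combined with every spatial weight before the Sigmoid is applied.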

[0051] In this embodiment, the step of generating candidate regions based on the enhanced feature map and performing classification and regression processing on the candidate regions includes: The enhanced feature map is convolved and traversed using a preset region proposal network. Based on the preset defect-sensitive anchor boxes, the foreground confidence of the target and the preliminary regression offset of the bounding box are calculated. The non-maximum suppression algorithm is used to remove redundant boxes with an overlap of more than a threshold and to select suggested candidate boxes. Based on the spatial coordinate information of the proposed candidate boxes, feature alignment and indexing are performed on the enhanced feature map, and corresponding local enhanced feature blocks are extracted. The local enhanced feature blocks contain texture and edge information weighted by the attention mechanism. The local enhanced feature blocks are subjected to gridded max pooling using region of interest pooling operations, and the local enhanced feature blocks are uniformly mapped to defect feature vectors of fixed dimensions. The defect feature vector is input into a classification and regression network, and the specific category probability of the rail surface defect and the fine position correction parameters relative to the proposed candidate box are predicted through a fully connected layer.
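The gridded max pooling that maps each local feature block to a fixed-dimensional vector can be sketched as below. The integer box coordinates, the 2×2 output grid, and the function name `roi_max_pool` are assumptions chosen to keep the example small; the source does not state a grid size.

```python
import numpy as np


def roi_max_pool(feature, box, out_size=(2, 2)):
    """Max-pool the region box=(x1, y1, x2, y2) of a (C, H, W) map into a
    fixed oh x ow grid and flatten it to a vector of length C * oh * ow."""
    x1, y1, x2, y2 = box
    region = feature[:, y1:y2, x1:x2]
    c, h, w = region.shape
    oh, ow = out_size
    # Integer bin edges; bins may differ by one pixel when sizes do not divide evenly.
    ys = [(i * h) // oh for i in range(oh + 1)]
    xs = [(j * w) // ow for j in range(ow + 1)]
    out = np.empty((c, oh, ow), dtype=feature.dtype)
    for i in range(oh):
        for j in range(ow):
            # Guarantee at least one pixel per bin for very small regions.
            y_hi = max(ys[i + 1], ys[i] + 1)
            x_hi = max(xs[j + 1], xs[j] + 1)
            out[:, i, j] = region[:, ys[i]:y_hi, xs[j]:x_hi].max(axis=(1, 2))
    return out.reshape(-1)


feature = np.arange(2 * 4 * 6, dtype=float).reshape(2, 4, 6)
full = roi_max_pool(feature, (0, 0, 6, 4))    # 6 x 4 region
small = roi_max_pool(feature, (1, 1, 4, 3))   # 3 x 2 region
print(full.shape, small.shape)  # (8,) (8,)
```

Both regions yield a vector of the same length even though their sizes differ, which is what allows the subsequent fully connected layers to accept proposals of arbitrary shape.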

[0052] It should be noted that, in a specific implementation, the detection head designed in this embodiment takes into account the particular characteristics and diversity of rail surface defect morphology. In the region proposal stage, since rail cracks typically appear as extremely long, thin longitudinal shapes while spalling defects appear as irregular blocks, this embodiment improves detection coverage by presetting defect-sensitive anchor boxes. These anchor boxes do not use a generic square ratio; instead, they are configured with extreme aspect ratios derived from statistical analysis of prior geometric knowledge of rail defects. This allows the proposed candidate boxes to enclose long, thin, longitudinally extending cracks more accurately, addressing the missed detections caused by the low intersection over union (IoU) of conventional anchor boxes. In the feature extraction and mapping stage, the extracted local enhanced feature blocks have already been filtered by the aforementioned attention mechanism, which removes most of the texture noise caused by uneven lighting or oil stains while retaining the edge information crucial for defect discrimination. Gridded max pooling removes the influence of input image scale variations and proposal box size differences, uniformly mapping all local features into fixed-dimensional feature vectors. This step ensures that the data input to the fully connected layer is structurally consistent, which enables the classification branch to predict the specific defect category more robustly, while the regression branch outputs fine position correction parameters based on the boundary information implied in the feature vector, so that the final bounding box closely fits the physical edge of the rail defect.
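The effect of extreme aspect ratios on anchor-to-defect overlap can be made concrete with a small worked example. All numbers here are hypothetical: a crack-like ground-truth box of 160×10 pixels, anchors of equal area (1600 px²), and aspect ratios of 1, 4 and 16 (width divided by height); the source does not give actual anchor settings.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def make_anchors(cx, cy, base_area, aspect_ratios):
    """Equal-area anchors centred at (cx, cy); ratio = width / height."""
    anchors = []
    for r in aspect_ratios:
        h = (base_area / r) ** 0.5
        w = r * h
        anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


crack = (20.0, 45.0, 180.0, 55.0)          # hypothetical long, thin crack box
ratios = (1.0, 4.0, 16.0)
anchors = make_anchors(100.0, 50.0, 1600.0, ratios)
ious = [iou(anchor, crack) for anchor in anchors]
for r, v in zip(ratios, ious):
    print(f"aspect ratio {r:>4}: IoU = {v:.3f}")
```

With these numbers the square anchor overlaps the crack with an IoU of roughly 0.14, well below typical positive-match thresholds, while the most elongated anchor matches it closely. This is the mechanism by which elongated anchors reduce missed detections of thin cracks.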

[0053] In this embodiment, after obtaining the fine position correction parameters of the proposed candidate boxes, the method further includes: the center coordinates and the length and width dimensions of the proposed candidate box are offset and compensated using the fine position correction parameters, and the proposed candidate box is adjusted to a refined bounding box; the spatial coordinates of the refined bounding box are projected onto the comprehensive weight map, and the local salient region within the coverage area of the refined bounding box is extracted; the average energy density of all pixels within the local salient region is calculated as the visual attention factor; the maximum value among the specific category probabilities is extracted as the original classification confidence score; and the original classification confidence score is then weighted and corrected using the visual attention factor to generate the final discrimination confidence score for rail surface defects.
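The box refinement and confidence correction steps above might look like the following sketch. It rests on several assumptions: the correction parameters follow the common Faster R-CNN parameterization `(dx, dy, dw, dh)`, the weight map has the same resolution as the box coordinates, "average energy density" is read as the mean weight inside the box (averaged over channels), and the correction is a plain product. None of these details are fixed by the source, and all names are hypothetical.

```python
import math

import numpy as np


def decode_box(proposal, deltas):
    """Shift and scale a proposal (x1, y1, x2, y2) by (dx, dy, dw, dh)."""
    x1, y1, x2, y2 = proposal
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    dx, dy, dw, dh = deltas
    cx, cy = cx + dx * w, cy + dy * h          # offset the centre
    w, h = w * math.exp(dw), h * math.exp(dh)  # rescale width and height
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)


def attention_factor(weight_map, box):
    """Mean weight inside the box; weight_map has shape (C, H, W)."""
    saliency = weight_map.mean(axis=0)         # collapse channels -> (H, W)
    height, width = saliency.shape
    x1, y1, x2, y2 = box
    c0, c1 = max(0, math.floor(x1)), min(width, math.ceil(x2))
    r0, r1 = max(0, math.floor(y1)), min(height, math.ceil(y2))
    if c1 <= c0 or r1 <= r0:
        return 0.0                             # box lies outside the map
    return float(saliency[r0:r1, c0:c1].mean())


def corrected_confidence(class_probs, factor):
    """Original confidence (max class probability) weighted by the factor."""
    return max(class_probs) * factor


# Hypothetical weight map: a strongly attended patch on a weak background.
weight_map = np.full((2, 10, 10), 0.2)
weight_map[:, 2:6, 2:8] = 0.9

defect_box = decode_box((2, 2, 8, 6), (0.0, 0.0, 0.0, 0.0))
background_box = (0, 6, 10, 10)
probs = [0.1, 0.8, 0.1]
print(corrected_confidence(probs, attention_factor(weight_map, defect_box)))
print(corrected_confidence(probs, attention_factor(weight_map, background_box)))
```

In this toy case the same classifier output is kept near 0.72 for the box on the attended patch and drops to about 0.16 for the background box, which is the secondary filtering behaviour the method aims for. A plain product can only lower scores, so an implementation that also boosts strongly attended regions would need a different weighting rule.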

[0054] It should be noted that, in a specific implementation, during the offset compensation calculation, the originally coarse proposed candidate boxes are translated and scaled based on the residual values of the regression predictions, so that the resulting refined bounding boxes more accurately delimit the edges of minute cracks on the rail surface. Subsequently, the geometric coordinates of these refined bounding boxes are projected back onto the comprehensive weight map generated by the attention mechanism. The core logic lies in using the saliency information of the feature layer to assist the decision layer in its judgment. The calculated mean energy density is the visual attention factor, representing how important the deep network considered this region during the feature extraction stage. By multiplying the original classification confidence score output by the classification network with the visual attention factor in a weighted manner, a secondary filtering of the detection results can be achieved: if a candidate region has a high classification score but a weak response in the attention map, it is likely to be background noise, and its discrimination confidence will be significantly reduced; conversely, for regions with strong attention responses, their scores will be maintained or enhanced. Combined with a soft non-maximum suppression algorithm that can preserve highly overlapping continuous defects, this improves the robustness of defect identification for the unmanned overhead crane under high-speed movement and drastic lighting changes, supporting the accuracy and reliability of the final output results.
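The soft non-maximum suppression mentioned above can be sketched with the Gaussian score decay commonly used for this algorithm. The decay form, `sigma = 0.5`, and the score cut-off are assumptions; the source only names the algorithm and its purpose of preserving overlapping continuous defects.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian soft-NMS: overlapping boxes have their scores decayed
    instead of being discarded. Returns (index, score) pairs in pick order."""
    scores = list(scores)
    remaining = list(range(len(boxes)))
    kept = []
    while remaining:
        best = max(remaining, key=lambda i: scores[i])
        remaining.remove(best)
        if scores[best] < score_threshold:
            break  # everything left scores even lower
        kept.append((best, scores[best]))
        for i in remaining:
            overlap = iou(boxes[best], boxes[i])
            scores[i] *= math.exp(-(overlap ** 2) / sigma)
    return kept


# Hypothetical detections: two heavily overlapping boxes and one separate box.
boxes = [(0, 0, 100, 10), (2, 0, 102, 10), (200, 0, 300, 10)]
scores = [0.9, 0.8, 0.7]
result = soft_nms(boxes, scores)
print([index for index, _ in result])  # [0, 2, 1]
```

Unlike hard suppression, the heavily overlapping second box survives with a reduced score (about 0.13 here), so a downstream score threshold decides whether it is reported as part of a continuous defect.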

[0055] Example 3. This embodiment deploys the unmanned overhead crane rail surface defect detection method of the preceding embodiments as a complete system in an industrial environment. The system adopts a layered architecture composed of four core modules: data acquisition, edge computing, data transmission, and monitoring and management. Specifically, the data acquisition module uses industrial cameras and lighting equipment installed in front of or to the side of the crane's traveling mechanism for real-time image capture; the edge computing module hosts the aforementioned lightweight Faster R-CNN model and is responsible for real-time image processing and preliminary defect analysis; the data transmission module ensures efficient flow of detection data between control nodes; and the monitoring and management module provides maintenance personnel with defect visualization, alarm management, and historical data query services through a human-machine interface.

[0056] In the hardware deployment scheme, the image acquisition system uses a global shutter industrial camera with a resolution of 20 megapixels or higher. The camera supports HDR, which allows it to cope with complex industrial lighting environments. It has a protection rating of at least IP67 and is dustproof, waterproof, and shock-resistant, ensuring long-term stability under harsh working conditions. The edge computing platform is the NVIDIA Jetson AGX Orin, which features 64GB of memory and 2048 CUDA cores; its operating temperature range of -25℃ to 80℃ suits the drastic temperature changes in the operating environment of the unmanned overhead crane.

[0057] At the software system implementation level, unified management of multiple camera devices is achieved through camera driver middleware, enabling adaptive adjustment of acquisition frequency and automatic detection of abnormal states such as defocusing. The defect detection service is built on a microservice architecture, with each detection service running independently and supporting multi-computing node collaboration, achieving effective load balancing. Furthermore, the human-computer interaction interface developed based on web technology supports cross-platform access, can mark the crane position in real time on a 3D dock or port map, and generate a track health heatmap, visually displaying the defect density of each track segment with varying color intensity. Through real-time operational indicator dashboards, managers can monitor today's inspection mileage, defect discovery, and repair progress at any time. Combined with the equipment status panel, this enables real-time control of the entire system's hardware operating status.
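The track health heatmap mentioned above needs a defect density value for each track segment. A minimal sketch of that binning step follows; the function name `defect_density`, the use of metres along the track as the position unit, and the fixed segment length are all assumptions made for illustration, since the source does not specify how segments are defined.

```python
import math


def defect_density(positions_m, track_length_m, segment_m):
    """Return defects per metre for each consecutive segment of the track."""
    n_segments = math.ceil(track_length_m / segment_m)
    counts = [0] * n_segments
    for p in positions_m:
        if not 0 <= p <= track_length_m:
            raise ValueError(f"position {p} lies outside the track")
        # A defect exactly at the end of the track belongs to the last segment.
        counts[min(int(p // segment_m), n_segments - 1)] += 1
    densities = []
    for i, count in enumerate(counts):
        # The last segment may be shorter than segment_m.
        length = min(segment_m, track_length_m - i * segment_m)
        densities.append(count / length)
    return densities


# Four detected defects on a 30 m track, binned into 10 m segments.
densities = defect_density([1.0, 2.5, 12.0, 29.9], track_length_m=30.0, segment_m=10.0)
```

Each density value can then be mapped to a color intensity by the front end; that mapping is left out here because the source gives no details about it.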

[0058] In practice, when the unmanned overhead crane triggers a data acquisition command, the industrial camera immediately acquires high-definition images or videos of the track and transmits them to an edge computing platform for real-time defect detection. The detection results are fed back to the monitoring center via the communication network. The system automatically takes response measures based on the defect level and prompts maintenance personnel to conduct targeted checks and repairs. This integrated deployment scheme greatly improves the automation level of rail inspection, ensures the consistency of detection accuracy and real-time performance, and provides solid data support for the safe operation of the unmanned overhead crane.

[0059] Example 4. This embodiment presents further experiments and analysis.

[0060] The rail surface defect dataset in this embodiment is RSDDs, which has only 195 samples. Through data augmentation, the sample size was expanded to 800 samples. After screening and processing, the rail surface defects were divided into three categories: cracks, wear, and peeling. The specific location of the defect was marked for each category. The marked information was encapsulated in VOC format to form the dataset of this embodiment.
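As an illustration of the VOC-format encapsulation mentioned above, the sketch below builds one Pascal VOC style annotation with the Python standard library. The helper name `voc_annotation`, the English class names, the file name, the image size, and the single-channel depth are assumptions chosen for the example; only the three defect categories come from the text.

```python
import xml.etree.ElementTree as ET

# The three categories named in the text; the English labels are assumed.
CLASSES = ("crack", "wear", "peeling")


def voc_annotation(filename, width, height, objects, depth=1):
    """Build a Pascal VOC style XML string.

    objects: iterable of (class_name, xmin, ymin, xmax, ymax) in pixels.
    depth=1 assumes grayscale images.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    for tag, value in (("width", width), ("height", height), ("depth", depth)):
        ET.SubElement(size, tag).text = str(value)
    for name, xmin, ymin, xmax, ymax in objects:
        if name not in CLASSES:
            raise ValueError(f"unknown defect class: {name}")
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = name
        ET.SubElement(obj, "difficult").text = "0"
        box = ET.SubElement(obj, "bndbox")
        for tag, value in (("xmin", xmin), ("ymin", ymin), ("xmax", xmax), ("ymax", ymax)):
            ET.SubElement(box, tag).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample: one crack in a 160 x 1000 pixel image.
xml_text = voc_annotation("rail_0001.jpg", 160, 1000, [("crack", 40, 210, 95, 260)])
```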

[0061] The experimental configuration is as follows: CPU: Xeon(R) Platinum 8362 @ 2.8GHz; RAM: 64GB; GPU: NVIDIA GeForce RTX 3090; operating system: Windows 10; deep learning framework: PyTorch; CUDA version: 12.4; training epochs: 300. The specific experimental environment is shown in Table 1.

Table 1 Experimental Environment Configuration

[0062] To evaluate the detection performance of the improved Faster R-CNN model, this embodiment uses the following metrics: precision (P), recall (R), and mean average precision (mAP), where mAP is reported as both mAP50 and mAP50-95. The relevant calculation formulas are as follows:

[0063]

[0064]

[0065]
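Assuming the metrics follow their standard object-detection definitions, that is, P = TP / (TP + FP), R = TP / (TP + FN), and mAP as the mean over classes of the area under each class's precision-recall curve, a minimal sketch of the computation is given below. The function names and the all-point interpolation of the curve are assumptions; mAP50 would use matches at an IoU threshold of 0.5, and mAP50-95 would average the result over IoU thresholds from 0.50 to 0.95 in steps of 0.05.

```python
def precision(tp, fp):
    """Fraction of predicted defects that are correct."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """Fraction of ground-truth defects that were found."""
    return tp / (tp + fn) if tp + fn else 0.0


def average_precision(scored_hits, num_gt):
    """Area under the precision-recall curve for one class at one IoU threshold.

    scored_hits: list of (confidence, is_true_positive) pairs, one per detection.
    num_gt: number of ground-truth boxes of this class.
    """
    if num_gt == 0:
        return 0.0
    ranked = sorted(scored_hits, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    points = []  # (recall, precision) after each detection, in rank order
    for _, hit in ranked:
        if hit:
            tp += 1
        else:
            fp += 1
        points.append((tp / num_gt, tp / (tp + fp)))
    # Replace each precision by the maximum precision at any higher recall.
    envelope, best = [], 0.0
    for r, p in reversed(points):
        best = max(best, p)
        envelope.append((r, best))
    # Sum the rectangles under the envelope from low to high recall.
    ap, prev_recall = 0.0, 0.0
    for r, p in reversed(envelope):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(ap_per_class):
    """Mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class) if ap_per_class else 0.0


# Toy example: three detections of one class, two ground-truth boxes.
ap_crack = average_precision([(0.9, True), (0.8, False), (0.7, True)], num_gt=2)
```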

[0066] Table 2. Results of backbone network ablation experiments

[0067] As shown in Table 2, experiments were conducted on four commonly used backbone networks while keeping other variables constant. The experimental results show that GhostNet performs best in terms of both parameter count (23.6M) and computational complexity (28.5 GFLOPs). Compared to the traditional ResNet-50, it reduces the parameter count by 17.9M and the computational complexity by 16.7 GFLOPs. This indicates that GhostNet can significantly reduce the number of model parameters, making it more suitable for practical industrial deployment scenarios.

[0068] To improve the model's focusing ability, experiments were conducted on five mainstream attention mechanisms within the Faster R-CNN model framework.

[0069] Table 3 Results of the Attention Ablation Experiment

[0070] The results are shown in Table 3. The experiments used Faster R-CNN as the baseline model and introduced the attention modules SENet, ECA-Net, SAM, CBAM, and BAM in turn. The results show that BAM attention achieved the best performance across multiple metrics: although it slightly increased the number of parameters and the computational complexity, mAP@50-95 and mAP@50 reached 56.1% and 80.7% respectively, improvements of 1.9 and 2.2 percentage points over the baseline, demonstrating its efficient feature enhancement capability.

[0071] To evaluate the combined effect of GhostNet and the BAM mechanism, an ablation experiment was designed.

[0072] Table 4 Results of the overall ablation experiment

[0073] The results are shown in Table 4. Replacing only the backbone network with GhostNet reduced the number of parameters by 17.93M and the computational complexity by 80.9G compared to the baseline model. However, mAP@50-95 and other metrics declined, indicating that simply shrinking the backbone network may compromise the model's feature extraction capability. Adding the BAM mechanism slightly increased the computational complexity and the number of parameters, but enhanced the model's feature extraction ability. The model with both GhostNet and the BAM mechanism achieved an mAP@50 of 78.8% with only 22.93M parameters, striking the best balance between accuracy and efficiency. Compared to the original Faster R-CNN, the number of parameters decreased by 44.8% and mAP@50 improved by 0.3 percentage points, demonstrating the effectiveness of the lightweight improvement strategy proposed in this embodiment.

[0074] To demonstrate the advantages of the method proposed in this embodiment, comparative experiments were conducted against mainstream object detection algorithms: R-CNN, Fast R-CNN, Faster R-CNN (baseline), YOLOv8n, and SSD. All comparative experiments were performed in the same experimental environment.

[0075] Table 5 Comparison of experimental results

[0076] The results are shown in Table 5. The method in this embodiment reaches an mAP@50-95 of 54.0%, slightly lower than the original Faster R-CNN, and an mAP@50 of 78.8%, the highest among the compared models. Its parameter count is only 22.93M, about 55% of that of the original Faster R-CNN and considerably smaller than those of the other two-stage algorithms. YOLOv8n, a single-stage algorithm, has the lowest parameter count, but its detection accuracy is significantly lower than that of the method in this embodiment. In summary, the method in this embodiment maintains high detection accuracy while achieving a lightweight model design, making it more suitable for embedded deployment in unmanned overhead cranes.

[0077] As shown in Figure 3, panels (a), (b), (c), (d), (e), and (f) present different rail surface defect detection results provided in the embodiments of the present invention. In each panel, the red prediction boxes indicate the location, category, and confidence score of the defects. The experimental results show that the method in this embodiment detects small defects such as cracks, wear, and peeling more stably, with low false negative and false positive rates.

[0078] This is mainly due to the multi-path nonlinear aggregation model constructed in the backbone network of this invention. Through nonlinear manifold mapping guided by the SiLU activation function, the ability to extract high-frequency edge features of slender cracks is enhanced. Simultaneously, by combining the BAM bottleneck attention mechanism, environmental noise such as ballast interference and metal reflection is effectively suppressed through parallel refinement of the channel and spatial dimensions. Especially in industrial scenarios with uneven lighting and complex backgrounds for unmanned overhead cranes, the system can still ensure the high reliability of the final judgment result and maintain excellent detection robustness through the confidence re-evaluation mechanism of visual attention factors.
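The parallel channel and spatial refinement described above, in the form stated in claim 5 (the channel attention vector and the spatial attention map are added element-wise, passed through a Sigmoid, and used to weight the initial feature map), can be sketched with nested lists. This is a toy illustration only: the function names are hypothetical, the attention inputs are given directly rather than computed by the pooling and convolution branches, and a real implementation would use a tensor library.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def comprehensive_weight_map(channel_vec, spatial_map):
    """M[c][y][x] = sigmoid(channel_vec[c] + spatial_map[y][x]).

    The channel vector (length C) is broadcast over all H x W positions and
    the spatial map (H x W) is broadcast over all C channels.
    """
    return [[[sigmoid(c + s) for s in row] for row in spatial_map]
            for c in channel_vec]


def enhance(features, weights):
    """Element-wise product of a C x H x W feature map and its weight map."""
    return [[[f * w for f, w in zip(f_row, w_row)]
             for f_row, w_row in zip(f_ch, w_ch)]
            for f_ch, w_ch in zip(features, weights)]


# Toy input: 2 channels, 2 x 2 spatial size, all feature values equal to 1.
features = [[[1.0, 1.0], [1.0, 1.0]],
            [[1.0, 1.0], [1.0, 1.0]]]
channel_vec = [0.0, 2.0]                # channel 1 is judged more informative
spatial_map = [[0.0, -2.0], [2.0, 0.0]] # bottom-left position is most salient

weights = comprehensive_weight_map(channel_vec, spatial_map)
enhanced = enhance(features, weights)
```

In this toy case the largest weight falls where an informative channel meets a salient position (channel 1, bottom-left), while responses at positions with a negative spatial score are attenuated.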

[0079] It should be noted that while significantly reducing the number of model parameters (from 41.53M to 22.93M), the method in this embodiment achieves real-time monitoring on edge devices, ensuring that high-precision defect capture can still be maintained in high-speed operating environments.
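The reduction figures quoted in this embodiment follow directly from the two parameter counts above. A short arithmetic check (the variable names are illustrative only):

```python
baseline_params_m = 41.53  # original Faster R-CNN, millions of parameters
proposed_params_m = 22.93  # GhostNet + BAM variant, millions of parameters

saved_m = baseline_params_m - proposed_params_m
reduction_pct = 100.0 * saved_m / baseline_params_m
remaining_pct = 100.0 * proposed_params_m / baseline_params_m

print(f"saved {saved_m:.2f}M parameters")
print(f"reduction {reduction_pct:.1f}%, remaining {remaining_pct:.1f}%")
```

The reduction evaluates to roughly 44.8%, so the lightweight model retains a little over 55% of the baseline parameters.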

[0080] Example 5. Figure 4 is a schematic diagram of the structure of the electronic device 100 provided in this embodiment. The electronic device 100 includes: a memory 101, a processor 102, and a computer program stored in the memory 101 and executable on the processor 102.

[0081] When the processor 102 executes the program, it implements the unmanned crane rail surface defect detection method provided in the above embodiments.

[0082] Furthermore, the electronic device 100 also includes a communication interface 103 for communication between the memory 101 and the processor 102.

[0083] The memory 101 may include high-speed RAM (Random Access Memory) memory, and may also include non-volatile memory, such as at least one disk storage.

[0084] If the memory 101, processor 102, and communication interface 103 are implemented independently, then the communication interface 103, memory 101, and processor 102 can be interconnected via a bus to communicate with one another. The bus can be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, or an EISA (Extended Industry Standard Architecture) bus, etc. The bus can be divided into an address bus, a data bus, a control bus, etc. For ease of representation, the bus is shown in Figure 4 as a single thick line, but this does not mean that there is only one bus or one type of bus.

[0085] Optionally, in a specific implementation, if the memory 101, processor 102, and communication interface 103 are integrated on a single chip, then the memory 101, processor 102, and communication interface 103 can communicate with each other through an internal interface.

[0086] Processor 102 may be a CPU (Central Processing Unit), an ASIC (Application Specific Integrated Circuit), or one or more integrated circuits configured to implement embodiments of the present invention.

[0087] This invention also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the above-described method for detecting surface defects on the rails of an unmanned overhead crane.

[0088] In the description of this specification, the references to terms such as "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., indicate that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples. Moreover, without contradiction, those skilled in the art can combine and integrate the different embodiments or examples described in this specification, as well as the features of different embodiments or examples.

[0089] Furthermore, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of indicated technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one of that feature. In the description of this invention, "N" means at least two, such as two, three, etc., unless otherwise explicitly specified.

[0090] Any process or method description in a flowchart or otherwise described herein can be understood as representing a module, segment, or portion of code comprising one or more executable instructions for implementing custom logic functions or steps of a process, and the scope of the preferred embodiments of the invention includes additional implementations in which functions may be performed out of the order shown or discussed, including substantially simultaneously or in reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the invention pertain.

[0091] It should be understood that various parts of the present invention can be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the N steps or methods can be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they can be implemented using any of the following techniques known in the art, or a combination thereof: discrete logic circuits having logic gates for implementing logical functions on data signals, application-specific integrated circuits (ASICs) having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), etc.

[0092] Those skilled in the art will understand that all or part of the steps of the methods described in the above embodiments can be implemented by a program instructing related hardware, and the program can be stored in a computer-readable storage medium. When executed, the program performs one of the steps of the method embodiments or a combination thereof.

[0093] Obviously, the above embodiments of the present invention are merely examples for clearly illustrating the present invention, and are not intended to limit the implementation of the present invention. Those skilled in the art can make other variations or modifications based on the above description. It is neither necessary nor possible to exhaustively describe all embodiments here. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of the present invention should be included within the scope of protection of the claims of the present invention.

Claims

1. A method for detecting surface defects in the rails of an unmanned overhead crane, characterized in that the method comprises: acquiring an image of the rail surface of the unmanned overhead crane; performing a convolution operation on the image to obtain an intrinsic feature map; inputting the intrinsic feature map into multiple parallel nonlinear transformation branches, performing convolution and nonlinear activation on the intrinsic feature map in each nonlinear transformation branch to obtain a corresponding sub-feature map, and concatenating the sub-feature maps output by the multiple nonlinear transformation branches to obtain a phantom feature map; concatenating the intrinsic feature map and the phantom feature map along the channel dimension to obtain an initial feature map; processing the initial feature map in the channel dimension and the spatial dimension in parallel to obtain a channel attention vector and a spatial attention map, adding the channel attention vector element-wise to the spatial attention map, and applying an activation to the sum to obtain a comprehensive weight map; weighting the initial feature map element-wise using the comprehensive weight map to obtain an enhanced feature map; and generating candidate regions based on the enhanced feature map, and performing classification and regression on the candidate regions to output the rail surface defect detection result.

2. The method for detecting surface defects in the rails of an unmanned overhead crane according to claim 1, characterized in that, in each of the nonlinear transformation branches, convolution and nonlinear activation are performed on the intrinsic feature map to obtain the corresponding sub-feature map according to the following expression: $Y_i = \Phi_i(X) = \delta(W_i * X + b_i)$, where $Y_i$ denotes the sub-feature map output by the $i$-th nonlinear transformation branch, $X$ denotes the intrinsic feature map, $\Phi_i$ denotes the nonlinear transformation function of the $i$-th branch, $\delta$ is the activation function, $W_i$ denotes the convolution kernel of the $i$-th branch, $*$ denotes the convolution operation, $b_i$ denotes the bias term, and $i$ is an integer ranging from 1 to $K$, where $K$ is the total number of nonlinear transformation branches.

3. The method for detecting surface defects in the rails of an unmanned overhead crane according to claim 2, characterized in that the step of concatenating the sub-feature maps output by the multiple nonlinear transformation branches to obtain a phantom feature map includes: concatenating the sub-feature maps output by the $K$ nonlinear transformation branches along the channel dimension, and performing a channel shuffling convolution operation on the concatenated feature map to obtain the phantom feature map according to the following expression: $Y_{\mathrm{ghost}} = \mathrm{Shuffle}(\mathrm{Concat}(Y_1, Y_2, \ldots, Y_K))$, where $Y_{\mathrm{ghost}}$ denotes the phantom feature map, $\mathrm{Shuffle}(\cdot)$ denotes the channel shuffling convolution operation, and $\mathrm{Concat}(\cdot)$ denotes the concatenation operation.

4. The method for detecting surface defects in the rails of an unmanned overhead crane according to claim 3, characterized in that the generation process of the phantom feature map satisfies the following relationship: $y = \mathrm{Concat}_{i=1}^{K}\left[S(W_i * x)\right]$, where $y$ corresponds to the phantom feature map and represents the output vector in the feature space; $x$ represents the input intrinsic feature map; $\mathrm{Concat}_{i=1}^{K}$ denotes concatenating the processing results of the $K$ parallel branches; $S$ is a nonlinear activation function; and $W_i$ represents the convolution kernel.

5. The method for detecting surface defects in the rails of an unmanned overhead crane according to claim 1, characterized in that the initial feature map is weighted element-wise using the comprehensive weight map to obtain the enhanced feature map according to the following expression: $F' = F \otimes M$, where $F'$ represents the enhanced feature map, $F$ represents the initial feature map, $\otimes$ denotes element-wise multiplication, and $M$ represents the comprehensive weight map; the comprehensive weight map $M$ is calculated as follows: $M = \sigma(M_c + M_s)$, where $\sigma$ represents the Sigmoid activation function, $M_c$ represents the channel attention vector, and $M_s$ represents the spatial attention map.

6. The method for detecting surface defects in the rails of an unmanned overhead crane according to claim 5, characterized in that processing the initial feature map along the channel dimension to obtain the channel attention vector includes: subjecting the initial feature map to global average pooling and inputting the result into a multilayer perceptron network, in which the dimensionality is reduced by a first fully connected layer and restored by a second fully connected layer, followed by batch normalization to obtain the channel attention vector according to the following expression: $M_c = \mathrm{BN}\left(W_2\left(W_1 \cdot \mathrm{GAP}(F) + b_1\right) + b_2\right)$, where $\mathrm{BN}(\cdot)$ denotes the batch normalization operation, $\mathrm{GAP}(\cdot)$ denotes the global average pooling operation, $W_1$ and $b_1$ represent the weight matrix and bias vector of the first fully connected layer, respectively, and $W_2$ and $b_2$ represent the weight matrix and bias vector of the second fully connected layer, respectively.

7. The method for detecting surface defects in the rails of an unmanned overhead crane according to claim 1, characterized in that the step of generating candidate regions based on the enhanced feature map and performing classification and regression on the candidate regions includes: traversing the enhanced feature map by convolution using a preset region proposal network, and calculating, based on preset defect-sensitive anchor boxes, the foreground confidence of the target and the preliminary regression offset of the bounding box; removing redundant boxes whose overlap exceeds a threshold using a non-maximum suppression algorithm, and selecting suggested candidate boxes; performing feature alignment and indexing on the enhanced feature map based on the spatial coordinate information of the suggested candidate boxes, and extracting the corresponding local enhanced feature blocks, which contain texture and edge information weighted by the attention mechanism; applying gridded max pooling to the local enhanced feature blocks through a region of interest pooling operation, so that the local enhanced feature blocks are uniformly mapped to defect feature vectors of fixed dimensions; and inputting the defect feature vectors into a classification and regression network, and predicting, through fully connected layers, the specific category probabilities of the rail surface defect and the fine position correction parameters relative to the suggested candidate box.

8. The method for detecting surface defects in the rails of an unmanned overhead crane according to claim 7, characterized in that, after the fine position correction parameters of the suggested candidate boxes are obtained, the method further includes: offsetting and compensating the center coordinates and the length and width of each suggested candidate box using the fine position correction parameters, thereby adjusting the suggested candidate box into a refined bounding box; projecting the spatial coordinates of the refined bounding box onto the comprehensive weight map and extracting the local salient region covered by the refined bounding box; calculating the average energy density of all pixels within the local salient region as the visual attention factor; extracting the maximum value among the specific category probabilities as the original classification confidence score; and weighting and correcting the original classification confidence score using the visual attention factor to generate the final discrimination confidence score for the rail surface defect.

9. An electronic device, characterized in that, The electronic device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to perform the operations performed by the unmanned overhead crane rail surface defect detection method as described in any one of claims 1 to 8.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a program which, when executed by a processor, implements the method for detecting surface defects on the rails of an unmanned overhead crane as described in any one of claims 1 to 8.