A fine-grained identification method for rail damage B-mode images based on a convolutional network
By using a rail damage identification method based on convolutional neural networks, fine-grained features of rail damage are automatically extracted, solving the problem that the identification effect in existing technologies depends on manual features, and achieving efficient and accurate rail damage detection.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- CHINA RAILWAY RAILWAY TECH SERVICE GRP CO LTD
- Filing Date
- 2026-03-25
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies cannot automatically and accurately extract fine-grained features of rail damage. The identification effect depends on the effectiveness of manually selected features and is prone to omissions and misidentifications. The level of intelligence in data processing is low.
A rail damage identification model is constructed using a method based on convolutional neural networks and convolutional attention mechanisms. Through a feature extraction network, an attention enhancement module, and a classification and recognition network, fine-grained features of rail damage are automatically extracted, and the model is optimized using five-fold cross-validation and cross-entropy loss.
It enables automatic and accurate identification of rail damage, improves identification efficiency and accuracy, reduces the rate of missed and false detections, supports rapid and efficient intelligent detection, and reduces reliance on professional personnel.
Figure CN122244531A_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of ultrasonic damage detection technology, and particularly relates to a fine-grained recognition method for rail damage B-mode images based on convolutional networks.
Background Technology
[0002] Rail damage has a significant impact on the operational efficiency and safety of railway maintenance. Because damage can lead to changes in rail strength and structure, potentially causing serious track accidents, effective detection and monitoring are crucial. Rail flaw detection, as a key inspection technology, can identify potential rail defects at an early stage and provide a scientific basis for preventative maintenance.
[0003] Currently, rail flaw detection equipment based on ultrasonic technology is widely used on China's railways because of its strong environmental adaptability and multi-dimensional detection capability. The extent of rail damage can be determined by analyzing the ultrasonic B-mode characteristic maps of the rails. However, because the methods for analyzing damage data are limited and the level of intelligent data processing is low, rail damage identification often relies on professionals who detect damage by replaying the recorded data. This makes damage identification highly subjective, inefficient, and limited in many respects.
[0004] In recent years, with the development of artificial intelligence technology, some studies have begun to utilize machine learning models to identify rail damage data. This involves obtaining features from the ultrasonic B-mode characteristic maps of rails using statistical or signal processing methods, such as wavelet transform and principal component analysis; then employing machine learning methods for rail damage identification, such as support vector machines and Bayesian models. However, the effectiveness of these methods in identifying rail damage depends on the effectiveness of manually selected features. Furthermore, some damage is small in size, resulting in fewer extractable features, which can also lead to misidentification and missed identification of rail damage.
[0005] In summary, existing technologies cannot automatically and accurately extract fine-grained features of rail damage. Their identification performance depends on the quality of manually selected features, and small-sized damage is easily missed or misidentified. Furthermore, the level of intelligence in data processing is low, which carries a risk of missed detections. In other words, a rail damage identification method that can automatically and accurately extract fine-grained features of rail damage is lacking. Therefore, developing a fine-grained identification method for rail damage B-mode images based on convolutional networks has considerable practical significance and application value.
Summary of the Invention
[0006] The purpose of this invention is to provide a fine-grained method for identifying rail damage in B-mode images based on convolutional networks, aiming to solve the aforementioned technical problems.
[0007] This invention is implemented as follows: a fine-grained method for identifying rail damage in B-mode images based on convolutional networks, comprising the following steps:
[0008] Dataset establishment: Collect B-images of rails based on ultrasonic flaw detection to construct a dataset of rail damage samples;
[0009] Network model construction: A rail damage identification model is constructed based on convolutional neural networks combined with convolutional attention mechanisms;
[0010] Network training and optimization: Based on the dataset of rail damage samples, the rail damage identification model is trained and optimized to obtain a trained rail damage identification model;
[0011] Model evaluation: Fine-grained identification of rail damage in B-images is performed based on the trained rail damage identification model, and the identification results are evaluated.
[0012] Furthermore, the B-mode image includes an abnormal reflection wave image and a fixed reflection wave image.
[0013] Furthermore, the rail damage identification model includes:
[0014] Feature extraction network: It consists of multiple convolutional modules, each of which contains a convolutional layer, an activation layer and a pooling layer, and is used to extract multi-scale features from the input B-mode image and output a feature map;
[0015] Attention enhancement module: Each attention enhancement module corresponds to one convolutional module and enhances the fine-grained features of the feature map output by that module along both the channel and spatial dimensions, outputting an enhanced feature map;
[0016] Classification and recognition network: used to predict the type of rail damage based on the enhanced feature map output by the attention enhancement module.
[0017] Furthermore, the attention enhancement module includes a channel attention module and a spatial attention module; for a given input feature map, the attention enhancement module infers the attention map sequentially along the two separate dimensions of channel and space, and then multiplies the attention map with the input feature map to perform adaptive feature refinement.
[0018] Furthermore, the implementation method of the channel attention module includes the following steps:
[0019] For the input C-dimensional feature map $F$, max pooling and average pooling are performed to aggregate spatial information, resulting in two C-dimensional pooled features $F_{max}$ and $F_{avg}$, where C represents the number of channels in the feature map after convolution;
[0020] $F_{max}$ and $F_{avg}$ are fed into a multilayer perceptron containing one hidden layer, resulting in two C×1×1 dimensional channel attention maps; the number of neurons in the hidden layer is $C/r$, where $r$ is the compression ratio;
[0021] The corresponding elements of the two channel attention maps obtained from the multilayer perceptron are added together and then activated by the sigmoid function to obtain the final channel attention map $M_c$.
[0022] Furthermore, the calculation process of the channel attention module is as follows:
[0023] $$M_c(F) = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big) = \sigma\big(W_1(W_0(F_{avg})) + W_1(W_0(F_{max}))\big)$$;
[0024] $$F' = M_c(F) \otimes F$$;
[0025] Where $\sigma$ represents the sigmoid function, which normalizes the values of the channel attention map to the range [0, 1]; MLP represents a multilayer perceptron consisting of two fully connected layers, which form the hidden layer and the output layer respectively, with a ReLU activation in between; $F$ is the input feature map; $F_{avg}$ and $F_{max}$ are the pooled features output by average pooling (AvgPool) and max pooling (MaxPool), respectively; $W_0$ and $W_1$ are the parameters that the module needs to learn, namely the weights of the hidden layer and the output layer of the multilayer perceptron; $F'$ is the output feature obtained by integrating the input feature map with the channel attention map; and $\otimes$ denotes the Hadamard (element-wise) product.
[0026] Furthermore, the implementation method of the spatial attention module includes the following steps:
[0027] For the output feature $F'$ of the channel attention module, max pooling (MaxPool) and average pooling (AvgPool) are first performed along the channel direction to obtain two two-dimensional pooled features $F'_{max}$ and $F'_{avg}$;
[0028] The two pooled features $F'_{max}$ and $F'_{avg}$ are concatenated along the channel dimension to obtain the concatenated feature;
[0029] The concatenated feature is convolved using a 7×7 kernel and then activated with a sigmoid function to generate the spatial attention map $M_s$.
[0030] Furthermore, the calculation process of the spatial attention module is as follows:
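[0031] $$M_s(F') = \sigma\big(f^{7\times 7}([\mathrm{AvgPool}(F'); \mathrm{MaxPool}(F')])\big) = \sigma\big(f^{7\times 7}([F'_{avg}; F'_{max}])\big)$$;
[0032] $$F'' = M_s(F') \otimes F'$$;
[0033] Where $F'$ is the output feature of the channel attention module; $f^{7\times 7}$ is a convolution with a kernel size of 7×7; $\otimes$ denotes the Hadamard (element-wise) product; and $F''$ is the output feature obtained by integrating the spatial attention map with the output feature of the channel attention module.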
[0034] Furthermore, the network training and optimization method specifically includes: using five-fold cross-validation, in which the dataset is divided into five equal parts, four parts are selected in turn as the training set and the remaining part is used as the test set, and the average accuracy of the five tests is taken as the final result and used as the model performance index; the loss function used during training is the cross-entropy loss, and the optimizer uses adaptive moment estimation.
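The five-fold procedure described here can be illustrated with a minimal sketch. It is an illustration under stated assumptions rather than the implementation of the invention: the shuffling step, the fixed seed, and the names `five_fold_splits`, `cross_validated_accuracy` and `evaluate_fold` are hypothetical, and the per-fold training itself is left to the caller.

```python
import random

def five_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each fold serves as the test set once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # shuffling is an assumption
    folds = [indices[i::k] for i in range(k)]  # k near-equal parts
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

def cross_validated_accuracy(n_samples, evaluate_fold):
    """Average the test accuracy returned by evaluate_fold(train_idx, test_idx)."""
    scores = [evaluate_fold(tr, te) for tr, te in five_fold_splits(n_samples)]
    return sum(scores) / len(scores)

# Example with 100 hypothetical samples: five 80/20 splits.
splits = list(five_fold_splits(100))
```

Each sample appears in exactly one test fold, so the averaged accuracy uses every sample for testing once.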
[0035] Another objective of this invention is to provide a fine-grained recognition system for rail damage B-mode images based on convolutional networks, used to implement the aforementioned fine-grained recognition method for rail damage B-mode images based on convolutional networks, specifically including:
[0036] The dataset building unit is used to collect B-images of rails based on ultrasonic flaw detection and to build a dataset of rail damage samples.
[0037] The network model building unit is used to build a rail damage identification model based on convolutional neural networks combined with convolutional attention mechanisms.
[0038] The network training and optimization unit is used to train and optimize the rail damage identification model based on the dataset of rail damage samples to obtain the trained rail damage identification model.
[0039] The model evaluation unit is used to perform fine-grained identification of rail damage B-images based on the trained rail damage identification model, and to evaluate the identification results.
[0040] The present invention provides a fine-grained recognition method for rail damage B-mode images based on convolutional networks. For B-mode images detected by ultrasonic flaw detection, a convolutional neural network based on the convolutional attention mechanism is designed. It can automatically extract rail damage features while effectively focusing on small-sized damage, providing rail ultrasonic flaw detection equipment with fast and efficient intelligent damage detection function.
Attached Figure Description
[0041] Figure 1 is a flowchart illustrating the fine-grained recognition method for rail damage B-mode images based on convolutional networks provided in an embodiment of the present invention.
[0042] Figure 2 is a schematic diagram of the structure of the rail damage identification model provided in an embodiment of the present invention.
[0043] Figure 3 is a schematic diagram of the attention enhancement module provided in an embodiment of the present invention.
Detailed Implementation
[0044] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the invention.
[0045] As shown in Figure 1, in one embodiment of the present invention, a fine-grained method for identifying rail damage in B-mode images based on convolutional networks is provided, comprising the following steps:
[0046] S1. Dataset Establishment: Collect B-images of rails based on ultrasonic flaw detection and construct a dataset of rail damage samples.
[0047] S2. Network Model Construction: Based on Convolutional Neural Network (CNN) combined with Convolutional Attention Mechanism (CBAM), a rail damage identification model is constructed.
[0048] S3. Network Training and Optimization: Based on the dataset of rail damage samples, the rail damage identification model is trained and optimized to obtain a trained rail damage identification model.
[0049] S4. Model Evaluation: Based on the trained rail damage recognition model, perform fine-grained recognition of rail damage B-images and evaluate the recognition results.
[0050] In this embodiment of the invention, a rail damage database is established, and based on the B-image of the ultrasonic rail flaw detector, a convolutional neural network is used to automatically extract image features. A convolutional attention mechanism is designed to extract fine-grained features of the damage, thereby avoiding the omission or misjudgment of minor damage.
[0051] In practical applications, the Flaw Detector Smart Terminal can be used to upload the detection data from the flaw detection instrument to the flaw detection management system. The Flaw Detector Smart Terminal is the mobile terminal of the flaw detection management system, enabling on-site work for the flaw detection team and real-time uploading of damage data. The flaw detection management system can store, query, and analyze the collected data, manage the playback of raw data files from rail flaw detection operations, and view the B-mode images from ultrasonic flaw detection. B-mode images can be divided into abnormal reflection wave images (rail damage) and fixed reflection wave images (equipment such as turnouts and welds).
[0052] In a preferred embodiment of the present invention, a rail damage recognition model is constructed based on the acquired rail B-image using a convolutional neural network. By training the model, deeper and more discriminative rail damage features are extracted, and combined with the convolutional attention mechanism CBAM, the model focuses on image channels and local space, improving the feature extraction capability for small-sized damages and avoiding missed or false detections of damage.
[0053] Specifically, as shown in Figure 2, the rail damage identification model includes:
[0054] Feature extraction network: It consists of 4 convolutional modules, each of which contains a convolutional layer, an activation layer and a pooling layer, and is used to extract multi-scale features from the input B-mode image and output a feature map;
[0055] Attention enhancement module (CBAM): Each attention enhancement module corresponds to one convolutional module and enhances the fine-grained features of the feature map output by that module along both the channel and spatial dimensions, outputting an enhanced feature map; an attention enhancement module is added to each convolutional module so that feature information is attended to effectively at different feature scales;
[0056] Classification and recognition network: used to predict rail damage categories based on the enhanced feature maps output by the attention enhancement module; the classification and recognition network includes a Flatten layer, a fully connected layer, and a Softmax activation layer.
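The classification and recognition network (Flatten layer, fully connected layer, Softmax activation) can be sketched in NumPy as follows. This is only an illustration: the feature map size, the number of damage categories, and the random weights are hypothetical placeholders, since the source does not state them.

```python
import numpy as np

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    z = logits - logits.max()          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def classify(feature_map, weight, bias):
    """Flatten -> fully connected -> Softmax, for one enhanced feature map."""
    x = feature_map.reshape(-1)        # Flatten layer
    logits = weight @ x + bias         # fully connected layer
    return softmax(logits)             # Softmax activation

rng = np.random.default_rng(0)
C, H, W, num_classes = 8, 4, 4, 3      # hypothetical sizes
feature_map = rng.standard_normal((C, H, W))
weight = rng.standard_normal((num_classes, C * H * W)) * 0.01
bias = np.zeros(num_classes)

probs = classify(feature_map, weight, bias)
predicted_class = int(np.argmax(probs))   # index of the predicted damage category
```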
[0057] In a preferred embodiment of the present invention, as shown in Figure 3, the attention enhancement module can include a channel attention module and a spatial attention module to focus on more effective fine-grained features of rail damage during the feature extraction stage. For a given input feature map, the attention enhancement module infers the attention map sequentially along the two separate dimensions of channel and space, and then multiplies the attention map with the input feature map to perform adaptive feature refinement.
[0058] Specifically, with the feature map $F \in \mathbb{R}^{C\times H\times W}$ output by the convolution module as input, the attention enhancement module obtains a one-dimensional channel attention map $M_c \in \mathbb{R}^{C\times 1\times 1}$ and a two-dimensional spatial attention map $M_s \in \mathbb{R}^{1\times H\times W}$. The attention processing procedure is given by the following formulas:
[0059] $$F' = M_c(F) \otimes F$$;
[0060] $$F'' = M_s(F') \otimes F'$$;
[0061] Where $F'$ is the output feature obtained by integrating the input feature map with the channel attention map; $F''$ is the output feature obtained by integrating the spatial attention map with the output feature of the channel attention module; and $\otimes$ denotes the Hadamard product, which produces from two matrices of the same dimension another matrix of the same dimension, each element of which is the product of the corresponding elements of the original two matrices.
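The sequential refinement described here, in which the channel attention map first re-weights the input feature map and the spatial attention map then re-weights the result, can be illustrated with NumPy broadcasting. The attention values below are hypothetical constants chosen for readability; in the model they would be computed from the feature map itself. The sketch assumes that a C×1×1 map is broadcast over the spatial positions and a 1×H×W map over the channels before the element-wise product.

```python
import numpy as np

C, H, W = 2, 2, 3
F = np.ones((C, H, W))                               # toy input feature map

# Hypothetical attention maps (values in [0, 1]).
M_c = np.array([0.5, 1.0]).reshape(C, 1, 1)          # one weight per channel
M_s = np.array([[1.0, 0.0, 1.0],
                [0.5, 0.5, 0.5]]).reshape(1, H, W)   # one weight per position

F_prime = M_c * F          # channel refinement, broadcast over H and W
F_double = M_s * F_prime   # spatial refinement, broadcast over C
```

With these values, channel 0 is halved everywhere and the position (0, 1) is suppressed in every channel.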
[0062] In a preferred embodiment of the present invention, the input to the channel attention module is a C-dimensional feature map F, where C represents the number of channel dimensions of the feature map after convolution, and the output is a C×1×1 dimensional channel attention map. The specific implementation method includes the following steps:
[0063] For the input C-dimensional feature map $F$, max pooling and average pooling are performed to aggregate spatial information, resulting in two C-dimensional pooled features $F_{max}$ and $F_{avg}$;
[0064] $F_{max}$ and $F_{avg}$ are fed into a multilayer perceptron containing one hidden layer, resulting in two C×1×1 dimensional channel attention maps; to reduce the number of parameters, the number of neurons in the hidden layer is $C/r$, where $r$ is the compression ratio;
[0065] The corresponding elements of the two channel attention maps obtained from the multilayer perceptron are added together and then activated by the sigmoid function to obtain the final channel attention map $M_c$.
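The parameter saving obtained from a compressed hidden layer can be illustrated with a small calculation. It assumes that the hidden layer has C/r neurons for a compression ratio r, which is the convention of the published CBAM design; the values C = 64 and r = 16 and the helper `mlp_weight_count` are hypothetical, and biases are ignored.

```python
def mlp_weight_count(channels, hidden):
    """Weights in a two-layer perceptron channels -> hidden -> channels (no biases)."""
    return channels * hidden + hidden * channels

C, r = 64, 16                                # hypothetical channel count and ratio

compressed = mlp_weight_count(C, C // r)     # hidden layer of C/r = 4 neurons
uncompressed = mlp_weight_count(C, C)        # hidden layer as wide as the input
```

Under these assumptions the compressed perceptron has 512 weights instead of 8192, a reduction by the factor r.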
[0066] Specifically, the calculation process of the channel attention module is as follows:
[0067] $$M_c(F) = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big) = \sigma\big(W_1(W_0(F_{avg})) + W_1(W_0(F_{max}))\big)$$;
[0068] Where $\sigma$ represents the sigmoid function, which normalizes the values of the channel attention map to the range [0, 1]; MLP represents a multilayer perceptron consisting of two fully connected layers, which form the hidden layer and the output layer respectively, with a ReLU activation in between; $F$ is the input feature map; $F_{avg}$ and $F_{max}$ are the pooled features output by average pooling (AvgPool) and max pooling (MaxPool), respectively; and $W_0$ and $W_1$ are the parameters that the module needs to learn, namely the weights of the hidden layer and the output layer of the multilayer perceptron.
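The channel attention computation described in this section (spatial pooling, a shared two-layer perceptron with ReLU, element-wise addition, sigmoid) can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the patented implementation: the hidden width C // r, the omission of biases, and the sizes and random weights used in the example are all hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(F, W0, W1):
    """F: (C, H, W); W0: (C//r, C); W1: (C, C//r). Returns a (C, 1, 1) attention map."""
    f_avg = F.mean(axis=(1, 2))            # average pooling over space -> (C,)
    f_max = F.max(axis=(1, 2))             # max pooling over space -> (C,)

    def mlp(v):
        # Hidden layer, ReLU, output layer; the same weights serve both inputs.
        return W1 @ np.maximum(W0 @ v, 0.0)

    return sigmoid(mlp(f_avg) + mlp(f_max)).reshape(-1, 1, 1)

rng = np.random.default_rng(0)
C, H, W, r = 8, 5, 5, 4                    # hypothetical sizes
F = rng.standard_normal((C, H, W))
W0 = rng.standard_normal((C // r, C)) * 0.1
W1 = rng.standard_normal((C, C // r)) * 0.1

M_c = channel_attention(F, W0, W1)
F_refined = M_c * F                        # channel-refined feature map
```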
[0069] In a preferred embodiment of the present invention, the input to the spatial attention module is the output feature refined by the channel attention map. The output is a two-dimensional spatial attention map of size H×W. The specific implementation method includes the following steps:
[0070] For the output feature $F'$ of the channel attention module, max pooling (MaxPool) and average pooling (AvgPool) are first performed along the channel direction, resulting in two two-dimensional pooled features $F'_{max}$ and $F'_{avg}$, each of size 1×H×W;
[0071] The two pooled features $F'_{max}$ and $F'_{avg}$ are concatenated along the channel dimension to obtain the concatenated feature;
[0072] The concatenated feature is convolved using a 7×7 kernel and then activated with a sigmoid function to generate the spatial attention map $M_s$.
[0073] Specifically, the calculation process of the spatial attention module is as follows:
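[0074] $$M_s(F') = \sigma\big(f^{7\times 7}([\mathrm{AvgPool}(F'); \mathrm{MaxPool}(F')])\big) = \sigma\big(f^{7\times 7}([F'_{avg}; F'_{max}])\big)$$;
[0075] Where $F'$ is the output feature of the channel attention module and $f^{7\times 7}$ is a convolution with a kernel size of 7×7.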
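The spatial attention computation (pooling along the channel direction, concatenation, 7×7 convolution, sigmoid) can likewise be sketched in NumPy. Several details are assumptions of this sketch rather than statements of the source: zero padding of 3 pixels is used so that the output keeps the size H×W, the convolution is computed as a cross-correlation without bias as deep learning frameworks usually do, and the kernel values and sizes in the example are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_attention(F, kernel):
    """F: (C, H, W); kernel: (2, 7, 7). Returns a (1, H, W) attention map."""
    # Pool along the channel axis and stack the two maps -> (2, H, W).
    pooled = np.stack([F.mean(axis=0), F.max(axis=0)])
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    _, H, W = pooled.shape
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            # One output value per position: sum over both maps and the 7x7 window.
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)[np.newaxis]

rng = np.random.default_rng(0)
C, H, W = 4, 6, 6                              # hypothetical sizes
F = rng.standard_normal((C, H, W))
kernel = rng.standard_normal((2, 7, 7)) * 0.05

M_s = spatial_attention(F, kernel)
F_out = M_s * F                                # spatially refined feature map
```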
[0076] In a preferred embodiment of the present invention, the network training and optimization method specifically includes: employing five-fold cross-validation, in which the dataset is divided into five equal parts, four parts are selected in turn as the training set and the remaining part is used as the test set, and the average accuracy of the five tests is taken as the final result, so that the result is stable and reliable, and used as the model performance indicator; the loss function used during training is the cross-entropy loss, and the optimizer uses Adaptive Moment Estimation (Adam). In practical applications, the hyperparameters for training are shown in Table 1 below:
[0077] Table 1
[0078]
Parameter | Setting value
Initial learning rate | 0.0001
Learning rate decay | 0.8 / three iterations
Early stop mechanism | The process stops if the loss value does not decrease after 5 iterations
Loss function | Cross-entropy loss
Optimizer | Adaptive moment estimation
Batch size | 32
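The learning rate decay and early stop settings of Table 1 can be read as in the following sketch. Two interpretations are assumptions here: the decay is treated as a staircase that multiplies the rate by 0.8 once every three iterations, and "does not decrease" is taken to mean no improvement over the best loss seen so far. The loss values are invented for illustration only, and the names `learning_rate` and `EarlyStopping` are hypothetical.

```python
def learning_rate(iteration, initial_lr=1e-4, decay=0.8, step=3):
    """Step decay: multiply the rate by `decay` once every `step` iterations."""
    return initial_lr * decay ** (iteration // step)

class EarlyStopping:
    """Signal a stop once the loss has not improved for `patience` iterations."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best = float("inf")
        self.stale = 0

    def should_stop(self, loss):
        if loss < self.best:
            self.best = loss
            self.stale = 0          # improvement: reset the counter
        else:
            self.stale += 1         # no improvement this iteration
        return self.stale >= self.patience

# Invented loss curve for illustration; the best value 0.6 is reached at index 2.
losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.60, 0.63, 0.64, 0.5]
stopper = EarlyStopping()
stopped_at = None
for it, loss in enumerate(losses):
    if stopper.should_stop(loss):
        stopped_at = it
        break
```

In this toy run the loss never drops below 0.6 during iterations 3 to 7, so training stops at iteration 7 and the final value 0.5 is never reached.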
[0079] Based on the trained rail damage identification model, the system automatically identifies damage by intelligently replaying the original data files of rail flaw detection operations and comparing them with system data. This allows for verification of the accuracy of manual damage assessment and improves the accuracy of damage screening to over 98%.
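Comparing the model output with reference labels, as in the verification described here, can be summarised with a few counts. The sketch below is illustrative only: the labels are toy values rather than measured results, the binary coding (1 for damage, 0 for a fixed reflection such as a weld) is an assumption, and the definitions of missed rate and false detection rate are one common choice that the source does not spell out.

```python
def screening_metrics(reference, predicted, damage_label=1):
    """Accuracy, missed-detection rate and false-detection rate for one label set."""
    pairs = list(zip(reference, predicted))
    correct = sum(r == p for r, p in pairs)
    damage = [(r, p) for r, p in pairs if r == damage_label]
    normal = [(r, p) for r, p in pairs if r != damage_label]
    missed = sum(p != damage_label for _, p in damage)      # damage not flagged
    false_alarm = sum(p == damage_label for _, p in normal) # non-damage flagged
    return {
        "accuracy": correct / len(pairs),
        "missed_rate": missed / len(damage) if damage else 0.0,
        "false_rate": false_alarm / len(normal) if normal else 0.0,
    }

# Toy labels for illustration, not data from the source.
reference = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
metrics = screening_metrics(reference, predicted)
```

For a multi-class damage taxonomy the same counts would be computed per category.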
[0080] At the technical level, the method provided in this invention employs a convolutional neural network (CNN) to automatically identify damage in B-mode images generated by ultrasonic flaw detection of rails. Based on an improved attention mechanism, the CNN can quickly extract deep features from the image and accurately classify minute damages, significantly improving detection accuracy and efficiency, and overcoming the shortcomings of traditional manual interpretation which is easily affected by subjective factors. Furthermore, this technology can be integrated with existing flaw detection systems, and algorithm optimization can further reduce false alarm and false negative rates, promoting the development of rail flaw detection towards intelligence and standardization.
[0081] In terms of economic benefits, the method provided by this invention can reduce reliance on professional inspection personnel, saving labor costs; by shortening the inspection cycle through automated analysis, it can improve the efficiency of rail inspection and indirectly reduce line downtime losses; early and accurate identification of damage can prevent defect expansion, extend the service life of rails, and reduce replacement and maintenance costs. In the long term, this method has the potential for large-scale promotion, creating new profit growth points for the railway industry.
[0082] In terms of social benefits, the method provided in this invention safeguards railway transportation safety through intelligent means, reduces the risk of accidents caused by rail damage, and improves public travel safety. Furthermore, this technology aligns with the trend of industrial digital transformation, providing a model case for the intelligent upgrading of the rail transit sector, and possesses broad social value and industry influence.
[0083] In summary, the method proposed in this embodiment of the invention can effectively improve the scientificity and practicality of rail damage identification, bringing technological progress, economic savings, and good social benefits to rail maintenance.
[0084] In another embodiment of the present invention, a fine-grained recognition system for rail damage B-mode images based on convolutional networks is also provided to implement the above method, specifically including:
[0085] The dataset building unit is used to collect B-images of rails based on ultrasonic flaw detection and to build a dataset of rail damage samples.
[0086] The network model building unit is used to build a rail damage identification model based on convolutional neural networks combined with convolutional attention mechanisms.
[0087] The network training and optimization unit is used to train and optimize the rail damage identification model based on the dataset of rail damage samples to obtain the trained rail damage identification model.
[0088] The model evaluation unit is used to perform fine-grained identification of rail damage B-images based on the trained rail damage identification model, and to evaluate the identification results.
[0089] It should be noted that the above modules and units can be implemented as a computer program, which can run on a computer device. The computer device's memory can store the computer program that makes up the modules and units, enabling the processor to execute the various steps of the above method.
[0090] It should be understood that although the steps in the flowcharts of the embodiments of the present invention are shown sequentially according to the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some steps in each embodiment may include multiple sub-steps or multiple stages. These sub-steps or stages are not necessarily completed at the same time, but can be executed at different times. The execution order of these sub-steps or stages is not necessarily sequential, but can be performed alternately or in turn with other steps or at least a portion of the sub-steps or stages of other steps.
[0091] Those skilled in the art will understand that all or part of the processes in the above embodiments can be implemented by a computer program instructing related hardware. The program can be stored in a non-volatile computer-readable storage medium, and when executed, it can include the processes of the embodiments of the above methods.
[0092] The above embodiments merely illustrate several implementation methods of the present invention, and their descriptions are relatively specific and detailed, but they should not be construed as limiting the scope of the present invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of the present invention, and these all fall within the protection scope of the present invention. Therefore, the protection scope of this patent should be determined by the appended claims.
Claims
1. A fine-grained method for identifying rail damage in B-mode images based on convolutional networks, characterized in that it includes the following steps: Dataset establishment: B-mode images of rails are collected based on ultrasonic flaw detection to construct a dataset of rail damage samples; Network model construction: a rail damage identification model is constructed based on convolutional neural networks combined with convolutional attention mechanisms; Network training and optimization: based on the dataset of rail damage samples, the rail damage identification model is trained and optimized to obtain a trained rail damage identification model; Model evaluation: fine-grained identification of rail damage in B-mode images is performed based on the trained rail damage identification model, and the identification results are evaluated.
2. The fine-grained recognition method for rail damage B-mode images based on convolutional networks according to claim 1, characterized in that the B-mode image includes an abnormal reflection wave image and a fixed reflection wave image.
3. The fine-grained recognition method for rail damage B-mode images based on convolutional networks according to claim 2, characterized in that the rail damage identification model includes: Feature extraction network: it consists of multiple convolutional modules, each of which contains a convolutional layer, an activation layer and a pooling layer, and is used to extract multi-scale features from the input B-mode image and output a feature map; Attention enhancement module: each attention enhancement module corresponds to one convolutional module and enhances the fine-grained features of the feature map output by that module along both the channel and spatial dimensions, outputting an enhanced feature map; Classification and recognition network: used to predict the type of rail damage based on the enhanced feature map output by the attention enhancement module.
4. The fine-grained recognition method for rail damage B-mode images based on convolutional networks according to claim 3, characterized in that the attention enhancement module includes a channel attention module and a spatial attention module; for a given input feature map, the attention enhancement module infers the attention map sequentially along the two separate dimensions of channel and space, and then multiplies the attention map with the input feature map to perform adaptive feature refinement.
5. The fine-grained recognition method for rail damage B-mode images based on convolutional networks according to claim 4, characterized in that the implementation method of the channel attention module includes the following steps: For the input C-dimensional feature map $F$, max pooling and average pooling are performed to aggregate spatial information, resulting in two C-dimensional pooled features $F_{max}$ and $F_{avg}$, where C represents the number of channels in the feature map after convolution; $F_{max}$ and $F_{avg}$ are fed into a multilayer perceptron containing one hidden layer, resulting in two C×1×1 dimensional channel attention maps; the number of neurons in the hidden layer is $C/r$, where $r$ is the compression ratio; The corresponding elements of the two channel attention maps obtained from the multilayer perceptron are added together and then activated by the sigmoid function to obtain the final channel attention map $M_c$.
6. The fine-grained recognition method for rail damage B-mode images based on convolutional networks according to claim 5, characterized in that the calculation process of the channel attention module is as follows: $$M_c(F) = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big) = \sigma\big(W_1(W_0(F_{avg})) + W_1(W_0(F_{max}))\big)$$; $$F' = M_c(F) \otimes F$$; where $\sigma$ represents the sigmoid function, which normalizes the values of the channel attention map to the range [0, 1]; MLP represents a multilayer perceptron consisting of two fully connected layers, which form the hidden layer and the output layer respectively, with a ReLU activation in between; $F$ is the input feature map; $F_{avg}$ and $F_{max}$ are the pooled features output by average pooling (AvgPool) and max pooling (MaxPool), respectively; $W_0$ and $W_1$ are the parameters that the module needs to learn, namely the weights of the hidden layer and the output layer of the multilayer perceptron; $F'$ is the output feature obtained by integrating the input feature map with the channel attention map; and $\otimes$ denotes the Hadamard (element-wise) product.
7. The fine-grained recognition method for rail damage B-mode images based on convolutional networks according to claim 6, characterized in that the implementation method of the spatial attention module includes the following steps: For the output feature $F'$ of the channel attention module, max pooling (MaxPool) and average pooling (AvgPool) are first performed along the channel direction to obtain two two-dimensional pooled features $F'_{max}$ and $F'_{avg}$; The two pooled features $F'_{max}$ and $F'_{avg}$ are concatenated along the channel dimension to obtain the concatenated feature; The concatenated feature is convolved using a 7×7 kernel and then activated with a sigmoid function to generate the spatial attention map $M_s$.
8. The fine-grained recognition method for rail damage B-mode images based on convolutional networks according to claim 6, characterized in that the calculation process of the spatial attention module is as follows: $$M_s(F') = \sigma\big(f^{7\times 7}([\mathrm{AvgPool}(F'); \mathrm{MaxPool}(F')])\big) = \sigma\big(f^{7\times 7}([F'_{avg}; F'_{max}])\big)$$; $$F'' = M_s(F') \otimes F'$$; where $f^{7\times 7}$ is a convolution with a kernel size of 7×7, and $F''$ is the output feature obtained by integrating the spatial attention map with the output feature of the channel attention module.
9. The fine-grained recognition method for rail damage B-mode images based on convolutional networks according to claim 1, characterized in that the network training and optimization method specifically includes: using five-fold cross-validation, in which the dataset is divided into five equal parts, four parts are selected in turn as the training set and the remaining part is used as the test set, and the average accuracy of the five tests is taken as the final result and used as the model performance index; the loss function used during training is the cross-entropy loss, and the optimizer uses adaptive moment estimation.
10. A fine-grained recognition system for rail damage B-mode images based on convolutional networks, used to implement the fine-grained recognition method for rail damage B-mode images based on convolutional networks as described in any one of claims 1-9, characterized in that it includes: a dataset building unit, used to collect B-mode images of rails based on ultrasonic flaw detection and to build a dataset of rail damage samples; a network model building unit, used to build a rail damage identification model based on convolutional neural networks combined with convolutional attention mechanisms; a network training and optimization unit, used to train and optimize the rail damage identification model based on the dataset of rail damage samples to obtain the trained rail damage identification model; and a model evaluation unit, used to perform fine-grained identification of rail damage B-mode images based on the trained rail damage identification model and to evaluate the identification results.