A crack image classification and discrimination method using a cross-fusion neural network model
By combining a cross-fusion neural network model with the discrete wavelet transform and a channel attention module, the method addresses the susceptibility of drone-acquired images to blurring and shadows, enabling efficient classification and discrimination of low-quality images and improving the accuracy and efficiency of concrete crack monitoring.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CHONGQING UNIV
- Filing Date
- 2024-06-18
- Publication Date
- 2026-06-19
AI Technical Summary
Manual identification and monitoring of concrete cracks is inefficient and costly. Furthermore, crack images acquired by drones are susceptible to motion blur and shadow occlusion, and existing classification methods adapt poorly to such low-quality images, which lowers classification accuracy.
The invention adopts a cross-fusion neural network (CFNet) model that combines the Discrete Wavelet Transform (DWT) with the CFNet neural network. Features from two feature extraction paths are interactively added and then passed through a channel attention (SE) module, which improves the network's ability to mine detailed image features and yields accurate image classification results.
It improves the accuracy of image classification and discrimination, is applicable to the classification and discrimination of low-quality images, and enhances the efficiency of construction engineering monitoring.

Figure CN118799627B_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of image processing technology and specifically relates to a model based on the Discrete Wavelet Transform (DWT) and a Cross-Fused Neural Network (CFNet), and to a crack image classification and discrimination method using that model.
Background Technology
[0002] Because manual identification and monitoring of concrete cracks is inefficient and costly, automated monitoring using computer vision and UAV technology has become an important method in structural health monitoring. However, UAVs may encounter motion blur and shadow occlusion when acquiring crack images, which impairs the extraction of crack contours and the estimation of crack parameters. It is therefore necessary to accurately distinguish blurred images, shadowed images, and normal clear images so that low-quality images can be screened out.
[0003] Using deep learning algorithms to automatically extract image features has become a common approach to image classification. Convolutional Neural Networks (CNNs) are frequently used for this task and have demonstrated superior performance. In a CNN, convolutional and pooling layers extract global and local feature information from the image, and fully connected layers output the classification result. Researchers have improved model performance by modifying the network structure, for example by increasing the number of layers, adding attention mechanisms, or employing feature pyramids, and different network structures yield different classification results. To improve classification accuracy, those skilled in the art often increase the depth of the convolutional neural network. As an example of building a classification model around an image processing module, the literature "A Diesel Engine Fault Diagnosis Method Based on Wavelet Time-Frequency Map and Swin Transformer" (Liu Zichang et al., Systems Engineering and Electronics Technology, 2023, pp. 2986-2998) combines the strength of wavelet time-frequency analysis in processing nonlinear, non-stationary signals with the image classification capability of the Swin Transformer: the original signal is converted into a wavelet time-frequency map by the continuous wavelet transform, and this map is used as a feature map to train a Swin Transformer that identifies diesel engine fault states. However, this method does not combine the discrete wavelet transform with a new neural network designed for it, which limits the accuracy of the subsequent image classification.
[0004] To make full use of image information, a suitable deep neural network needs to be constructed in conjunction with the discrete wavelet transform to mine detailed image features. Furthermore, existing image classification methods are rarely applied to low-quality crack images and adapt poorly to classifying and recognizing blurred and shadowed crack images.
Summary of the Invention
[0005] To address the problems existing in the prior art, the technical problem this invention aims to solve is to provide a cross-fusion neural network model that can accurately distinguish between blurred images, shadowed images, and normal clear images, thereby improving the accuracy of image classification and the monitoring efficiency of construction projects. This invention also provides a crack image classification method using this model.
[0006] To solve the above-mentioned technical problems, the technical solution of the present invention is as follows:
[0007] The present invention provides a cross-fusion neural network model (CFNet model for short), which comprises a discrete wavelet transform (DWT) and a cross-fusion neural network (CFNet neural network for short). The DWT extracts the low-frequency components of the image, and the original image together with its low-frequency components is input into the CFNet neural network as feature information.
[0008] The CFNet neural network includes two feature extraction paths corresponding to the original image and the low-frequency component image of the DWT, an SE module, convolutional layers, and fully connected layers. Each feature extraction path has three levels of convolutional units from left to right. The feature information output by the two first-level convolutional units of the upper and lower feature extraction paths is added together and input to the corresponding two second-level convolutional units. The feature information output by the two second-level convolutional units is added together and input to the corresponding two third-level convolutional units. The feature information output by the two third-level convolutional units is added together and input to the channel attention module SE. The SE module is followed by two convolutional layers, whose output features are passed to the fully connected layers. The three fully connected layers output the three-category image classification result.
[0009] The information processing flow of the cross-fusion neural network model of this invention is as follows. Low-frequency information of the image is extracted using the DWT (Discrete Wavelet Transform). The low-frequency components obtained by DWT decomposition serve as feature information and, together with the original image, form the input to the neural network. A CFNet neural network with two feature extraction paths, which has a simple structure and high accuracy, is then used; the two paths correspond to the low-frequency components and the original image, respectively. Features are interactively fused between the two paths to make full use of the feature information. A channel attention module (SE) is added to strengthen the CFNet model's ability to recognize key feature information. Finally, the required image classification result is output through a fully connected layer.
[0010] The present invention also provides a crack image classification and discrimination method using the model, comprising the following steps:
[0011] Step 1: Collect and construct a crack image dataset
[0012] In a crack image dataset containing normal crack images, blurred crack images, and crack shadow images, a certain number of images are randomly selected as the training set and the test set.
[0013] Step 2: Train the CFNet neural network and obtain the neural network weights.
[0014] Images are fed into the CFNet model in batches to obtain the corresponding image classification results. The total loss of the CFNet neural network for each batch is calculated from the loss function and the image labels, and the network weights are updated by gradient backpropagation. Once all images have been processed, the next iteration cycle begins; this is repeated until the number of iteration cycles meets the requirements. All levels, modules, and layers of the network are trained simultaneously, and the network weights are saved as a weight file after each training cycle.
[0015] Step 3: Input the crack image into the CFNet model and perform image classification, then output the image classification results.
[0016] The technical effects of this invention are:
[0017] Because this invention constructs a cross-fusion neural network model using an image processing module, it can mine detailed image features for classification and discrimination, which improves classification accuracy. It is suitable for classifying and discriminating low-quality crack images.
Attached Figure Description
[0018] The accompanying drawings of this invention are described below:
[0019] Figure 1 is a structural diagram of the cross-fusion neural network model of the present invention;
[0020] Figure 2 is a flowchart of the crack image classification and discrimination process of the present invention.
Detailed Implementation
[0021] The present invention will be further described below with reference to the accompanying drawings and embodiments:
[0022] To clearly describe the invention, this application uses the directional terms "upper" and "lower" for distinction. The terms "upper" and "lower" are determined based on the arrangement of the above drawings. When the actual use direction of the invention changes, the name of the orientation will change accordingly, and this should not be regarded as a limitation on the scope of patent protection.
[0023] In this application, the term "channel attention module (SE)" refers to the channel attention module described in "Squeeze-and-excitation networks" (Hu J, Shen L, Sun G, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018). The SE module enhances the representational power of the network.
[0024] As shown in Figure 1, the present invention provides a cross-fusion neural network model comprising the Discrete Wavelet Transform (DWT) and the CFNet neural network. The DWT extracts the low-frequency components of the image, and the original image together with these low-frequency components is input into the CFNet neural network as feature information.
[0025] The DWT applies multi-level low-pass and high-pass filtering to the image signal to obtain low-frequency and high-frequency image components. Following the literature "Secure compressive sensing of images based on combined chaotic DWT sparse basis and chaotic DCT measurement matrix" (Wang Z, Hussein ZS, Wang X, et al., Optics and Lasers in Engineering, 2020), for a discrete signal x[n] of length N the low-frequency DWT coefficients are calculated as follows:
[0026] (1)
[0027] In equation (1), A_j(k) is the low-frequency coefficient at level j, k is the coefficient index, j is the decomposition level, Ψ is the wavelet basis function, φ is the scaling function, · denotes the convolution operation, n is the index of the discrete signal sample, and N is the length of the discrete signal.
[0028] The CFNet neural network includes two feature extraction paths corresponding to the original image and the low-frequency component image of the DWT, an SE module, convolutional layers, and fully connected layers. Each feature extraction path has three levels of convolutional units from left to right. Each convolutional unit contains two convolutional modules and one pooling module; the pooling module reduces the resolution of the feature map and captures global information of the image. To improve the generalization ability of the model, each convolutional module is followed by a batch normalization layer and a ReLU activation layer.
[0029] The outputs of the two first-level convolutional units of the upper and lower feature extraction paths are added together, and the sum is input into both corresponding second-level convolutional units. The outputs of the two second-level units are likewise added and input into both corresponding third-level units. This interactive addition and fusion of information between the upper and lower paths makes full use of the feature information extracted by the DWT. Finally, the outputs of the two third-level convolutional units are added and input into the SE module to extract key image information.
[0030] The SE module enables the CFNet neural network to emphasize important features while suppressing unimportant ones. The SE stage comprises 10 SE modules, each followed by two convolutional layers, and the features output by the convolutional layers are passed to the fully connected (FC) layers. Based on empirical values, the output sizes of the three fully connected layers are set to 4096, 4096, and 3, respectively. The output size of 3 for the last layer corresponds to the three categories to be classified: blurred image, normal image, and shadow image.
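Equation (1) gives the general form, but the patent does not name the wavelet basis or the number of decomposition levels. As an illustration only, the sketch below assumes a single-level Haar wavelet, for which the 2-D low-frequency (LL) sub-band of an image reduces to half the sum of each non-overlapping 2×2 pixel block. The function names are hypothetical; PyWavelets exposes the same sub-band as the first return value of `pywt.dwt2(image, 'haar')`.

```python
import math
from typing import List

Image = List[List[float]]


def haar_lowpass_1d(signal: List[float]) -> List[float]:
    """One-level Haar approximation coefficients: (x[2k] + x[2k+1]) / sqrt(2)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = 1.0 / math.sqrt(2.0)
    return [s * (signal[2 * k] + signal[2 * k + 1]) for k in range(len(signal) // 2)]


def haar_ll(image: Image) -> Image:
    """Low-frequency (LL) sub-band of a one-level 2-D Haar DWT: half height, half width."""
    rows = [haar_lowpass_1d(row) for row in image]             # low-pass along the width
    cols = [haar_lowpass_1d(list(col)) for col in zip(*rows)]  # low-pass along the height
    return [list(r) for r in zip(*cols)]                       # transpose back to row-major


ll = haar_ll([[1, 2, 3, 4], [5, 6, 7, 8]])
print([round(v, 6) for v in ll[0]])  # [7.0, 11.0]
```

Each LL value is (1 + 2 + 5 + 6) / 2 = 7 and (3 + 4 + 7 + 8) / 2 = 11, i.e. a smoothed, half-resolution copy of the image that suppresses fine detail.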
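The cross-path addition can be traced with a framework-free sketch. Each convolutional unit is replaced by an arbitrary function on a flat list of numbers (a hypothetical stand-in for a feature map), so the code shows only the wiring described above: at every level the two unit outputs are summed, the sum feeds both next-level units, and the final sum is what would go to the SE module. Real units would also need matching output shapes on both paths for the addition to be defined.

```python
from typing import Callable, List, Sequence

Feature = List[float]                 # hypothetical stand-in for a feature map
Unit = Callable[[Feature], Feature]   # hypothetical stand-in for one convolutional unit


def add(a: Feature, b: Feature) -> Feature:
    """Element-wise addition used for the cross-path fusion."""
    return [x + y for x, y in zip(a, b)]


def cross_fusion(x_orig: Feature, x_low: Feature,
                 upper: Sequence[Unit], lower: Sequence[Unit]) -> Feature:
    """Run both paths level by level; each level's summed output feeds both next-level units."""
    u_in, l_in = x_orig, x_low
    fused: Feature = []
    for unit_u, unit_l in zip(upper, lower):
        fused = add(unit_u(u_in), unit_l(l_in))
        u_in = l_in = fused          # after the last level, `fused` is the input to the SE stage
    return fused


def double(f: Feature) -> Feature:    # toy "upper" unit
    return [2 * v for v in f]


def plus_one(f: Feature) -> Feature:  # toy "lower" unit
    return [v + 1 for v in f]


fused = cross_fusion([1.0, 2.0], [0.0, 0.0], [double] * 3, [plus_one] * 3)
print(fused)  # [31.0, 49.0]
```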
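The patent cites the squeeze-and-excitation design but gives no reduction ratio or weights. The sketch below follows the standard SE computation (global average pooling per channel, a bottleneck fully connected layer with ReLU, a second fully connected layer with sigmoid, then channel-wise rescaling) on plain nested lists. The weight matrices `w1` and `w2` are hypothetical inputs, and biases are omitted for brevity.

```python
import math
from typing import List

Matrix = List[List[float]]
FeatureMap = List[Matrix]  # channels x height x width


def matvec(w: Matrix, v: List[float]) -> List[float]:
    """Matrix-vector product, standing in for a fully connected layer without bias."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def se_block(x: FeatureMap, w1: Matrix, w2: Matrix) -> FeatureMap:
    """Squeeze-and-excitation; w1 has shape (C/r x C), w2 has shape (C x C/r)."""
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation: bottleneck FC + ReLU, then FC + sigmoid -> one gate in (0, 1) per channel.
    h = [max(0.0, v) for v in matvec(w1, z)]
    s = [1.0 / (1.0 + math.exp(-v)) for v in matvec(w2, h)]
    # Scale: multiply every value of channel c by its gate s[c].
    return [[[s_c * v for v in row] for row in ch] for s_c, ch in zip(s, x)]


x = [[[1.0, 2.0], [3.0, 4.0]], [[2.0, 2.0], [2.0, 2.0]]]   # 2 channels, 2x2 each
neutral = se_block(x, w1=[[0.0, 0.0]], w2=[[0.0], [0.0]])  # every gate is sigmoid(0) = 0.5
print(neutral[0])  # [[0.5, 1.0], [1.5, 2.0]]
```

With learned weights the gates differ per channel, which is how the module emphasizes informative channels and suppresses the rest.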
[0031] As shown in Figure 2, the crack image classification and discrimination process of the present invention is as follows:
[0032] Step 1: Collect and construct a dataset of cracked and blurred images
[0033] Normal and shadowed crack images were collected by drone. A certain number of normal crack images were randomly selected and blurred by convolution with a motion blur kernel. This produced a crack image dataset containing three categories, namely normal, blurred, and shadowed crack images, in a ratio of 1:1:1. The images were then randomly divided into a training set and a test set in a ratio of 8:2.
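The patent does not state the blur kernel's length or direction. A minimal sketch, assuming a horizontal kernel that averages `length` neighboring pixels and replicates edge pixels at the borders, is shown below for a single-channel image stored as nested lists; `horizontal_motion_blur` is a hypothetical name.

```python
from typing import List

Image = List[List[float]]


def horizontal_motion_blur(image: Image, length: int) -> Image:
    """Average `length` horizontally adjacent pixels (a 1 x length box kernel), clamping at edges."""
    if length < 1:
        raise ValueError("length must be >= 1")
    half = length // 2
    width = len(image[0])
    blurred: Image = []
    for row in image:
        new_row = []
        for x in range(width):
            acc = 0.0
            for d in range(-half, length - half):   # kernel offsets around pixel x
                xx = min(max(x + d, 0), width - 1)  # replicate border pixels
                acc += row[xx]
            new_row.append(acc / length)
        blurred.append(new_row)
    return blurred


print(horizontal_motion_blur([[0, 0, 9, 0, 0]], 3))  # [[0.0, 3.0, 3.0, 3.0, 0.0]]
```

A single bright pixel is smeared across `length` pixels along the motion direction while total intensity is preserved, which mimics camera movement during exposure. Other directions would use a rotated line kernel.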
[0034] Step 2: Train the CFNet neural network and obtain the neural network weights.
[0035] Images are fed into the CFNet model in batches to obtain the corresponding image classification results. The total loss of the CFNet neural network for each batch is calculated from the loss function and the image labels, and the network weights are updated by gradient backpropagation. Once all images have been processed, the next iteration cycle (epoch) begins; this is repeated until the number of iteration cycles meets the requirements. All levels, modules, and layers of the network are trained simultaneously, and the network weights are saved as a weight file after each training cycle. Training ends when the predetermined number of iterations or the accuracy requirement is met.
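The batch-and-epoch loop of Step 2 can be sketched independently of any deep learning framework. In this sketch, `step` is a hypothetical callable that would run the forward pass, compute the batch loss, backpropagate and update the weights, and `save_checkpoint` would write the weight file; reshuffling the samples every cycle is an assumption, and the early stop on an accuracy target is not shown.

```python
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def train(samples: Sequence[T], batch_size: int, epochs: int,
          step: Callable[[List[T]], float],
          save_checkpoint: Callable[[int], None], seed: int = 0) -> List[float]:
    """Mini-batch training loop; returns the summed batch loss of every epoch."""
    rng = random.Random(seed)
    epoch_losses: List[float] = []
    for epoch in range(epochs):
        order = list(samples)
        rng.shuffle(order)                    # new sample order each cycle (assumption)
        total = 0.0
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            total += step(batch)              # forward pass, loss, backprop, weight update
        save_checkpoint(epoch)                # one weight file per cycle
        epoch_losses.append(total)
    return epoch_losses


# Usage with stand-ins: fake_step only records batch sizes and returns a constant loss.
batch_sizes: List[int] = []
saved_epochs: List[int] = []


def fake_step(batch: List[int]) -> float:
    batch_sizes.append(len(batch))
    return 0.5


losses = train(range(40), batch_size=16, epochs=2, step=fake_step,
               save_checkpoint=saved_epochs.append)
print(batch_sizes, saved_epochs, losses)  # [16, 16, 8, 16, 16, 8] [0, 1] [1.5, 1.5]
```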
[0036] The convolutional layers extract shallow and deep features from the image, using strides of 1 and 2. The change in feature map size caused by a convolution is given by:
[0037] $$H_3 = \left\lfloor \frac{H_1 - H_2 + 2P}{S} \right\rfloor + 1 \qquad (2)$$
[0038] In equation (2), H1 is the size of the input feature map, H2 is the size of the convolution kernel, H3 is the size of the output feature map, P is the number of padding pixels, and S is the stride of the convolution kernel.
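A quick check of the output-size relation, using illustrative sizes that the patent does not specify (a 224-pixel input and a 3×3 kernel with padding 1): stride 1 keeps the size and stride 2 halves it. Integer division implements the round-down that frameworks such as PyTorch's `Conv2d` apply when the division is not exact; `conv_output_size` is a hypothetical helper name.

```python
def conv_output_size(h1: int, h2: int, p: int, s: int) -> int:
    """H3 = floor((H1 - H2 + 2P) / S) + 1 for input size H1, kernel H2, padding P, stride S."""
    if s <= 0 or h2 > h1 + 2 * p:
        raise ValueError("invalid kernel/stride/padding combination")
    return (h1 - h2 + 2 * p) // s + 1


print(conv_output_size(224, 3, 1, 1))  # 224
print(conv_output_size(224, 3, 1, 2))  # 112
```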
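[0039] The image feature data are normalized by a batch normalization layer using the following formulas:
[0040] $$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i \qquad (3)$$
[0041] $$\sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m}\left(x_i - \mu_B\right)^2 \qquad (4)$$
[0042] $$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \varepsilon}} \qquad (5)$$
[0043] $$y_i = \gamma \hat{x}_i + \beta \qquad (6)$$
[0044] In equations (3)-(6), x_i is the input image feature map, y_i is the output image feature map, x̂_i is the normalized feature map, m is the number of input feature values over which the statistics are computed, γ and β are learnable scale and shift parameters updated together with the network weights, μ_B and σ_B are the mean and standard deviation of the inputs, respectively, and ε is a very small constant that prevents the denominator from being 0.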
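The four batch normalization steps map directly onto a few lines of code. The sketch below normalizes a flat list of m values in training mode; ε = 1e-5 is an assumed value (a common framework default), since the patent only calls it a very small constant, and the running statistics used at inference time are omitted.

```python
import math
from typing import List


def batch_norm(x: List[float], gamma: float = 1.0, beta: float = 0.0,
               eps: float = 1e-5) -> List[float]:
    """Training-mode batch normalization of m values, following equations (3)-(6)."""
    m = len(x)
    mu = sum(x) / m                                          # (3) mean
    var = sum((xi - mu) ** 2 for xi in x) / m                # (4) variance
    x_hat = [(xi - mu) / math.sqrt(var + eps) for xi in x]   # (5) normalize
    return [gamma * xh + beta for xh in x_hat]               # (6) scale and shift


y = batch_norm([1.0, 2.0, 3.0, 4.0])
print([round(v, 3) for v in y])  # [-1.342, -0.447, 0.447, 1.342]
```

With γ = 1 and β = 0 the output has zero mean and (almost exactly) unit variance; training then adjusts γ and β so the network can rescale the normalized features where that helps.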
[0045] The ReLU activation layer is used to perform non-linear processing on the feature layer, and the formula used is as follows:
[0046] $$f(x_i) = \max(0, x_i) \qquad (7)$$
[0047] In equation (7), x_i is the input image feature map, 0 is the lower bound of the output, and f(x_i) is the activated image feature map.
[0048] The CFNet neural network uses cross-entropy loss as the training loss function. The formula for calculating cross-entropy loss is:
[0049] $$\mathrm{Loss} = -\sum_{c=1}^{C} y_c \log \hat{y}_c \qquad (8)$$
[0050] In equation (8), ŷ_c is the prediction of the neural network for class c, y_c is the corresponding label value, Loss is the training loss value, and C is the number of categories to be classified.
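A small numerical sketch of the cross-entropy loss for the three classes, assuming the network's raw outputs are converted to probabilities with a softmax (the patent does not name the output activation) and that labels are one-hot vectors. The class order (blurred, normal, shadow) is chosen here for illustration only.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Turn raw network outputs into class probabilities (max subtracted for stability)."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(y_hat: List[float], y: List[float]) -> float:
    """-sum over the C classes of y_c * log(y_hat_c); zero-label terms contribute nothing."""
    return -sum(y_c * math.log(p_c) for y_c, p_c in zip(y, y_hat) if y_c > 0)


label = [0.0, 1.0, 0.0]  # true class: "normal"
uniform_loss = cross_entropy(softmax([0.0, 0.0, 0.0]), label)
confident_loss = cross_entropy(softmax([0.0, 5.0, 0.0]), label)
print(round(uniform_loss, 4))  # 1.0986, i.e. ln 3 for an uninformed guess
print(confident_loss)          # small, because the correct class gets most of the probability
```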
[0051] Step 3: Input the crack image into the CFNet model and perform image classification, then output the image classification result.
[0052] Input a dataset of crack images, train a CFNet neural network, and record the weight parameters for each training stage. After training, load the optimal neural network weight file, input the crack image test set into the CFNet model for result evaluation, use the CFNet model to classify the images, and finally output the image classification results.
[0053] Example
[0054] The crack image dataset collected in this embodiment contains a total of 3000 images, with normal images, blurred images, and images with shadow occlusion in a ratio of 1:1:1. 2400 images were randomly selected from the dataset as the training set, and the remaining 600 images were used as the validation and test sets. The training and validation sets were used to adjust model parameters and find the optimal model structure, while the test set was used to evaluate the model's performance.
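The embodiment's 2400 training images are 80% of 3000. One way to draw such a split while keeping the 1:1:1 class ratio is to shuffle and cut each class separately (a stratified split). The patent says only that images were selected randomly, so stratification, the seed, and the file names below are assumptions.

```python
import random
from typing import Dict, List, Sequence, Tuple

Labeled = Tuple[str, str]  # (file name, class label)


def stratified_split(items_by_class: Dict[str, Sequence[str]], train_frac: float = 0.8,
                     seed: int = 0) -> Tuple[List[Labeled], List[Labeled]]:
    """Shuffle each class separately and cut it at train_frac so both parts keep the class ratio."""
    rng = random.Random(seed)
    train: List[Labeled] = []
    held_out: List[Labeled] = []
    for label, items in items_by_class.items():
        shuffled = list(items)
        rng.shuffle(shuffled)
        cut = round(train_frac * len(shuffled))
        train += [(item, label) for item in shuffled[:cut]]
        held_out += [(item, label) for item in shuffled[cut:]]
    return train, held_out


dataset = {label: [f"{label}_{i:04d}.jpg" for i in range(1000)]  # hypothetical file names
           for label in ("normal", "blurred", "shadow")}
train_set, rest = stratified_split(dataset)
print(len(train_set), len(rest))  # 2400 600
```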
[0055] The CFNet neural network is trained using the training set. In this embodiment, the Adam algorithm is used to optimize the parameters of the CFNet neural network. The initial learning rate is set to η = 0.0001, the batch size is 16, and the training lasts for 100 epochs.
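For reference, the standard Adam update applied to every weight is sketched below for a single scalar parameter, using the embodiment's learning rate η = 0.0001. The moment coefficients β1 = 0.9, β2 = 0.999 and ε = 1e-8 are Adam's commonly used defaults and are assumptions here, since the patent does not state them. With 2400 training images and a batch size of 16, one epoch is 150 weight updates, so 100 epochs amount to 15,000 updates.

```python
import math
from typing import Tuple


def adam_step(theta: float, grad: float, m: float, v: float, t: int,
              lr: float = 1e-4, beta1: float = 0.9, beta2: float = 0.999,
              eps: float = 1e-8) -> Tuple[float, float, float]:
    """One Adam update of a scalar weight; t is the 1-based update count."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias-corrected estimates
    v_hat = v / (1 - beta2 ** t)
    theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


theta, m, v = adam_step(theta=1.0, grad=0.5, m=0.0, v=0.0, t=1)
print(round(theta, 6))  # 0.9999: the first step moves the weight by about lr against the gradient
steps_per_epoch = math.ceil(2400 / 16)
print(steps_per_epoch, steps_per_epoch * 100)  # 150 15000
```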
[0056] AlexNet, GoogleNet, VGG16, and the CFNet model of this invention were each used to classify the test set, and the accuracy (A), precision (P), recall (R), and F1 score of each network were calculated.
[0057] Each prediction in crack image classification falls into one of four outcomes: true positive (TP), false positive (FP), false negative (FN), or true negative (TN), as shown in Table 1.
[0058] Table 1 Sample Prediction Results
[0059]
| | Label: positive sample | Label: negative sample |
|---|---|---|
| Predicted as positive sample | TP | FP |
| Predicted as negative sample | FN | TN |
[0060] Accuracy (A) is the proportion of correctly classified samples out of the total sample size, and is calculated using the following formula:
[0061] $$A = \frac{TP + TN}{TP + TN + FP + FN} \qquad (11)$$
[0062] Precision (P) is the proportion of true positives (TP) among all samples predicted as positive, and is calculated using the following formula:
[0063] $$P = \frac{TP}{TP + FP} \qquad (12)$$
[0064] Recall (R) is the proportion of true positives (TP) among all actual positive samples, and is calculated using the following formula:
[0065] $$R = \frac{TP}{TP + FN} \qquad (13)$$
[0066] F1-Score (F1) is the harmonic mean of P and R, and is calculated using the following formula:
[0067] $$F1 = \frac{2PR}{P + R} \qquad (14)$$
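For the three-class task, P, R and F1 are computed per class (one class as positive, the other two as negative) and then averaged. The patent does not say which averaging it used, so the sketch below assumes macro averaging. Note that with support-weighted averaging, or macro averaging on a class-balanced test set, recall is mathematically identical to accuracy, which is consistent with the identical A and R rows in Table 2.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[str], y_pred: Sequence[str],
                           classes: Sequence[str]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall and F1 (one-vs-rest per class)."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)      # multi-class form of (11)
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0              # (12)
        rec = tp / (tp + fn) if tp + fn else 0.0               # (13)
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0  # (14)
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    k = len(classes)
    return {"A": accuracy, "P": sum(precisions) / k, "R": sum(recalls) / k, "F1": sum(f1s) / k}


truth = ["normal", "normal", "blurred", "blurred", "shadow", "shadow"]
pred = ["normal", "blurred", "blurred", "blurred", "shadow", "normal"]
scores = classification_metrics(truth, pred, ["blurred", "normal", "shadow"])
print({name: round(value, 4) for name, value in scores.items()})
# {'A': 0.6667, 'P': 0.7222, 'R': 0.6667, 'F1': 0.6556}
```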
[0068] The calculation results are shown in Table 2:
[0069] Table 2 Comparison of Evaluation Indicators
[0070]
| Metric | AlexNet | GoogleNet | VGG16 | CFNet_10SE |
|---|---|---|---|---|
| A | 0.9333 | 0.9538 | 0.9513 | 0.9588 |
| P | 0.9354 | 0.9548 | 0.9521 | 0.9603 |
| R | 0.9333 | 0.9538 | 0.9513 | 0.9588 |
| F1 | 0.9332 | 0.9539 | 0.9513 | 0.9587 |
[0071] As shown in Table 2, compared with AlexNet, GoogleNet, and VGG16, the CFNet model of this invention achieves the highest score on every evaluation metric, with A, P, R, and F1 values of 0.9588, 0.9603, 0.9588, and 0.9587, about 0.5 percentage points above the next-best model, GoogleNet. This indicates that the CFNet model classifies low-quality images better than the other models. In contrast, AlexNet scores between about 1.7 and 2.6 percentage points below the other models on every metric, indicating that it performs comparatively poorly in classifying shadowed, blurred, and normal images.
Claims
1. A crack image classification and discrimination method using a cross-fusion neural network model, characterized in that: the cross-fusion neural network model includes the Discrete Wavelet Transform (DWT) and the CFNet neural network. The DWT extracts low-frequency components of the image, and the original image and its low-frequency components are input into the CFNet neural network as feature information. The CFNet neural network includes two feature extraction paths corresponding to the original image and the low-frequency component image of the DWT, an SE module, convolutional layers, and fully connected layers. Each feature extraction path has three levels of convolutional units from left to right. The feature information output by the two first-level convolutional units of the upper and lower feature extraction paths is added together and input to the corresponding two second-level convolutional units. The feature information output by the two second-level convolutional units is added together and input to the corresponding two third-level convolutional units. The feature information output by the two third-level convolutional units is added together and input to the channel attention module SE. The SE module is followed by two convolutional layers, whose output features are passed to the fully connected layers, and the three fully connected layers output the three-category image classification result. Each convolutional unit comprises two convolutional modules and one pooling module. Each convolutional module is followed by a batch normalization layer and a ReLU activation layer. The convolutional modules use strides of 1 and 2.
The image size before and after the convolution operation is related by: $H_3 = \left\lfloor \frac{H_1 - H_2 + 2P}{S} \right\rfloor + 1$ (1). In equation (1), H1 is the size of the input feature map, H2 is the size of the convolution kernel, H3 is the size of the output feature map, P is the number of padding pixels, and S is the stride of the convolution kernel. The crack image classification and discrimination method includes the following steps: Step 1: collect and construct a crack image dataset. From a crack image dataset containing normal crack images, blurred crack images, and crack shadow images, a certain number of images are randomly selected as the training set and the test set. Step 2: train the CFNet neural network and obtain the neural network weights. Images are fed into the CFNet model in batches to obtain the corresponding image classification results. The total loss value of the CFNet neural network for each batch is calculated using the loss function and the image labels. The weights of the neural network are updated using gradient backpropagation. This process continues until all images have been processed, at which point the next iteration cycle begins, and it is repeated until the number of iteration cycles meets the requirements. Each level, module, and layer in the neural network is trained simultaneously. The network weights are saved as a weight file after each training iteration. Step 3: input the crack image into the CFNet model, perform image classification, and output the image classification results.
2. The crack image classification and discrimination method according to claim 1, characterized in that: the normalization layer normalizes the feature data of the image using the following formulas: $\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i$ (2); $\sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m}\left(x_i - \mu_B\right)^2$ (3); $\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \varepsilon}}$ (4); $y_i = \gamma \hat{x}_i + \beta$ (5). In equations (2)-(5), x_i is the input image feature map, y_i is the output image feature map, x̂_i is the normalized feature map, m is the number of input feature values, γ and β are learnable parameters updated together with the network weights, μ_B and σ_B are the mean and standard deviation, respectively, and ε is a very small constant. The ReLU activation layer performs non-linear processing on the feature layer using the following formula: $f(x_i) = \max(0, x_i)$ (6). In equation (6), x_i is the input image feature map, 0 is the lower bound of the output, and f(x_i) is the activated image feature map.
3. The crack image classification and discrimination method according to claim 2, characterized in that: in step 2, cross-entropy loss is used as the training loss function of the CFNet neural network, calculated as: $\mathrm{Loss} = -\sum_{c=1}^{C} y_c \log \hat{y}_c$ (7). In equation (7), ŷ_c is the prediction result of the neural network, y_c is the corresponding label value, Loss is the training loss value, and C is the number of categories to be classified.
4. The crack image classification and discrimination method according to any one of claims 1 to 3, characterized in that: the output sizes of the three fully connected layers are set to 4096, 4096 and 3, respectively.