A real-time road surface defect recognition method based on multimodal fusion of visible light and infrared images
By fusing visible light and infrared images and combining feature decomposition with a neural network model, the method addresses the accuracy and real-time performance problems of road defect detection in complex scenarios and achieves efficient road defect identification.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents (China)
- Current Assignee / Owner: HARBIN INST OF TECH
- Filing Date: 2025-02-25
- Publication Date: 2026-06-26
AI Technical Summary
Existing technologies struggle to achieve simultaneous acquisition and feature fusion of visible light and infrared images in complex scenarios, resulting in low accuracy and poor stability in road defect detection. Furthermore, existing defect detection models suffer from inconsistent sample image features, a lack of image data, and low accuracy.
A multimodal fusion method based on visible light and infrared images is adopted: the base layer and detail layer are extracted through feature decomposition, image registration and fusion are performed, and a neural network model then identifies road surface defects in real time.
It improves the accuracy and real-time performance of road surface defect detection in complex scenarios, solves the problem of poor multimodal image registration, and achieves high-precision road defect detection.
Figure CN120147709B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to a method for real-time detection of road surface defects through multimodal image fusion.
Background Technology
[0002] Road surface defects directly impact driving safety and road service life. Therefore, accurate road surface defect detection provides reliable and effective technical and data support for road maintenance and management decisions, and is a crucial aspect of intelligent road management. In recent years, automated road surface data acquisition equipment has developed rapidly, and image processing-based road surface defect detection technology has gradually become a mainstream research direction. Road surface defect detection is also transitioning from traditional manual inspection to semi-automation and non-destructive fully automated methods. However, due to the complexity of road surface texture and the external environment, traditional image analysis methods struggle to consistently achieve good detection results under various complex road surface conditions.
[0003] While images acquired using digital imaging technology offer advantages such as intuitiveness and rich feature content, this image acquisition and analysis technique is sensitive to shadows, oil stains, uneven lighting, nighttime conditions, and other complex interference factors. It also suffers from numerous false positives and missed detections, making it difficult to meet the high-precision, multi-layered, and all-weather requirements of modern road inspection. Therefore, some researchers are considering introducing infrared imaging technology into the field of road engineering. It can operate without external light sources and is sensitive to temperature differences, giving it a unique advantage in detecting road damage (especially under extremely low light conditions and varying temperature ranges).
[0004] While infrared imaging technology offers unique advantages in extreme scenarios, it also has limitations. Infrared sensors capture the thermal radiation of objects, which makes them relatively robust to interference from external environmental factors; however, infrared image-based detection algorithms suffer from insufficient texture detail and difficulty in identifying distant targets. Visible light sensors, on the other hand, collect light reflected from object surfaces and produce images rich in texture detail, but they struggle with interference from shadows, smoke, and other factors in complex ground environments. To overcome these limitations, systems based on multimodal fusion of infrared and visible light images have been proposed. Such a system integrates complementary information from the acquired source images to generate a high-contrast fused image that highlights prominent targets while retaining rich texture detail.
[0005] However, traditional methods cannot acquire visible light and infrared images simultaneously and fuse their features, and they require complex algorithms to ensure accurate registration and rapid fusion between multimodal images. Moreover, most existing defect detection models suffer from inconsistent sample image features, scarce image data, low accuracy, and poor stability, and fail to adequately meet the needs of modern road inspection and maintenance.
Summary of the Invention
[0006] The purpose of this invention is to solve the problems of difficult road defect detection and low efficiency of deep image fusion in complex scenarios.
[0007] A real-time road surface defect identification method based on multimodal fusion of visible light and infrared images includes the following steps:
[0008] Feature decomposition and extraction are performed on the visible light image $I_{vi}$ to obtain the corresponding base layer $B_{vi}$ and detail layer $D_{vi}$, and on the infrared image $I_{iR}$ to obtain the corresponding base layer $B_{iR}$ and detail layer $D_{iR}$;
[0009] The formula for feature decomposition and extraction is as follows:
[0010] $B_x = \mathrm{WLSR}(I_x), \quad D_x = I_x - B_x$
[0011] where WLSR(·) is the edge filtering process; $B_x$ is the base layer and $D_x$ is the detail layer obtained by feature decomposition and extraction; $x = vi$ or $iR$, where vi and iR denote the visible light image and the infrared image, respectively; and $I_x$ denotes the visible light image $I_{vi}$ or the infrared image $I_{iR}$;
[0012] Fusion information $F_B$ is obtained from the base layer $B_{vi}$ of the visible light image and the base layer $B_{iR}$ of the infrared image; meanwhile, fusion information $F_D$ is obtained from the detail layer $D_{vi}$ of the visible light image and the detail layer $D_{iR}$ of the infrared image; the fused image is then $F = F_B + F_D$;
[0013] Based on the fused image F, a neural network model is used to achieve real-time identification of road surface defects.
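The overall structure of [0008]-[0013] can be sketched as below. This is a minimal sketch, not the patented filter: a plain mean filter stands in for WLSR, a fixed weight `w` stands in for the adaptive weight ω, and the raw detail layers stand in for the SVF-enhanced ones; `box_smooth`, `decompose` and `fuse` are hypothetical names. It shows the two structural properties the method relies on: $B_x + D_x = I_x$ holds by construction, and the fused image is rebuilt as $F = F_B + F_D$.

```python
import numpy as np


def box_smooth(img, radius=2):
    """Stand-in smoother (mean filter); the patent uses its WLSR filter instead."""
    k = 2 * radius + 1
    pad = np.pad(img, radius, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += pad[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)


def decompose(img, smooth=box_smooth):
    base = smooth(img)       # B_x = WLSR(I_x)  (here: mean filter)
    detail = img - base      # D_x = I_x - B_x
    return base, detail


def fuse(i_vi, i_ir, w=0.5, smooth=box_smooth):
    b_vi, d_vi = decompose(i_vi, smooth)
    b_ir, d_ir = decompose(i_ir, smooth)
    f_b = w * b_vi + (1.0 - w) * b_ir   # placeholder for the adaptive-weight base fusion
    f_d = d_vi + d_ir                   # placeholder for the SVF-enhanced detail fusion
    return f_b + f_d                    # F = F_B + F_D
```

Swapping `smooth` for a real edge-preserving filter changes only the decomposition step; the reconstruction rule stays the same.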
[0014] Furthermore, before identification, image registration of the visible light image and the infrared image is required, including the following steps:
[0015] First, using the visible light image as the reference image, the infrared image is transformed into the visible light image reference field; then, the image coordinates of the infrared image transformed into the visible light image reference field are dynamically registered with the visible light image pixels.
[0016] Furthermore, the process of transforming an infrared image into a visible light image reference field, using a visible light image as the reference image, includes the following steps:
[0017] Using the visible light image as the reference image, the image coordinates in the reference field of the visible light image are obtained by transforming the infrared image into the visible light image using the homography mapping transformation formula described below.
[0018]
[0019] where $(x_{iR}, y_{iR})$ are the coordinates of the source pixel in the infrared image, and the transformed coordinates are the image coordinates of the infrared image in the visible light image reference field; $K_{vi}$ is the intrinsic matrix obtained using the visible light image as the reference image; $K_{iR}$ is the intrinsic matrix obtained using the infrared image as the reference image; and ${}^{vi}R_{iR}$ is the rotation matrix of the mapping from the infrared image coordinate system to the visible light image coordinate system.
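The equation in [0018] did not survive extraction. With the symbols listed in [0019] ($K_{vi}$, $K_{iR}$ and the rotation ${}^{vi}R_{iR}$), the standard rotation-only homography is $H = K_{vi}\,{}^{vi}R_{iR}\,K_{iR}^{-1}$. The sketch below assumes that form (translation neglected, as for a distant scene or nearly co-located lenses); `rotation_homography` and `map_points` are hypothetical helper names.

```python
import numpy as np


def rotation_homography(k_vi, r_vi_ir, k_ir):
    """H = K_vi @ R @ inv(K_iR): maps infrared pixels into the visible reference frame
    under the assumed rotation-only model."""
    return k_vi @ r_vi_ir @ np.linalg.inv(k_ir)


def map_points(h, pts):
    """Apply a 3x3 homography to an (N, 2) array of pixel coordinates."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])  # (x, y, 1)
    mapped = homog @ h.T
    return mapped[:, :2] / mapped[:, 2:3]              # divide by the projective scale
```

For example, with $R = I$ the infrared principal point maps exactly onto the visible principal point, and one infrared focal length to the right maps to one visible focal length to the right.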
[0020] Furthermore, the mapping used to register the image coordinates of the infrared image transformed into the visible light image reference field with the dynamic pixels of the visible light image is as follows:
[0021]
[0022] where $t_x$ is the horizontal displacement, $t_y$ is the vertical displacement, θ is the rotation angle, and s is the scaling factor; $(x_{vi}, y_{vi})$ are the coordinates of the source pixel in the visible light image.
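The matrix in [0021] was also lost. Paragraph [0060] describes $H_{RIFT}$ as a 3×3 matrix combining a rigid transformation with a scaling factor, i.e. a similarity transform. A minimal sketch, assuming the standard form $\begin{bmatrix} s\cos\theta & -s\sin\theta & t_x \\ s\sin\theta & s\cos\theta & t_y \\ 0 & 0 & 1 \end{bmatrix}$ and assuming the RIFT feature matches are already available as point pairs. RIFT itself is not reproduced, and `estimate_similarity` is a hypothetical helper using plain least squares with no outlier rejection.

```python
import numpy as np


def similarity_matrix(s, theta, tx, ty):
    """Assumed form of H_RIFT: isotropic scale s, rotation theta (radians), shift (tx, ty)."""
    c, sn = s * np.cos(theta), s * np.sin(theta)
    return np.array([[c, -sn, tx],
                     [sn, c, ty],
                     [0.0, 0.0, 1.0]])


def apply(h, x, y):
    vx, vy, w = h @ np.array([x, y, 1.0])
    return vx / w, vy / w


def estimate_similarity(src, dst):
    """Least-squares fit of (s, theta, tx, ty) from matched points src -> dst.

    Linear in a = s*cos(theta), b = s*sin(theta): u = a*x - b*y + tx, v = b*x + a*y + ty.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = len(src)
    a_mat = np.zeros((2 * n, 4))
    a_mat[0::2] = np.column_stack([src[:, 0], -src[:, 1], np.ones(n), np.zeros(n)])
    a_mat[1::2] = np.column_stack([src[:, 1], src[:, 0], np.zeros(n), np.ones(n)])
    rhs = dst.reshape(-1)                     # interleaved u0, v0, u1, v1, ...
    (a, b, tx, ty), *_ = np.linalg.lstsq(a_mat, rhs, rcond=None)
    return np.hypot(a, b), np.arctan2(b, a), tx, ty
```

In practice the matches would first pass through a robust estimator such as RANSAC before a final least-squares refit like this one.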
[0023] Furthermore, the edge filtering formula is as follows:
[0024] $Y = \mathrm{WLSR}(X) = (\mathbf{I} + \lambda L_X)^{-1} X$, where $L_X = M_x^{\top} A_x M_x + M_y^{\top} A_y M_y$ and $\mathbf{I}$ is the identity matrix;
[0025] X is the input image before filtering and Y is the edge filtering result; λ is the balance coefficient; $A_x$ and $A_y$ are diagonal weight matrices whose diagonal elements $\mu_{x,p}$ and $\mu_{y,p}$ are the weighting coefficients of the x- and y-direction gradients for the pixel at spatial location p in image X; $M_x$ and $M_y$ are the discrete difference operator matrices corresponding to the x-direction difference and the y-direction difference, respectively.
[0026] Furthermore, the diagonal weight matrices $A_x$ and $A_y$ are specifically as follows:
[0027] $A_x = \mathrm{diag}(\mu_{x,1}, \mu_{x,2}, \mu_{x,3}, \mu_{x,4}, \dots, \mu_{x,n})$
[0028] $A_y = \mathrm{diag}(\mu_{y,1}, \mu_{y,2}, \mu_{y,3}, \mu_{y,4}, \dots, \mu_{y,n})$
[0029] where n is the total number of pixels in the image.
[0030] Furthermore, $M_x$ is the discrete difference operator matrix corresponding to the x-direction difference, and $M_y$ is the discrete difference operator matrix corresponding to the y-direction difference.
[0031] Furthermore, the process of obtaining the fusion information $F_B$ from the base layer $B_{vi}$ of the visible light image and the base layer $B_{iR}$ of the infrared image includes:
[0032] For $B_{vi}$ and $B_{iR}$, let $I_k$ denote the intensity value of the k-th pixel; the saliency intensity difference assignment formula then gives the saliency intensity values $\Lambda_{vi}$ and $\Lambda_{iR}$ of the visible light and infrared images, where N is the number of pixels in $B_x$ and $B_x$ is $B_{vi}$ or $B_{iR}$;
[0033] The adaptive fusion weight ω is then calculated, which gives the base-layer fusion information $F_B = \omega B_{vi} + (1 - \omega) B_{iR}$.
[0034] Furthermore, the process of obtaining the fusion information $F_D$ from the detail layer $D_{vi}$ of the visible light image and the detail layer $D_{iR}$ of the infrared image includes:
[0035] First, SVF multi-scale decomposition is performed on the detail layer $D_{vi}$ of the visible light image to obtain the enhanced detail layer $D_{E\_vi}$:
[0036]
[0037] where the SVF filtering results at different scales and their corresponding detail layers are combined using the detail-layer weight coefficients α and β, and $D_{E\_vi}$ is the enhanced visible light image detail layer;
[0038] The enhanced infrared image detail layer $D_{E\_iR}$ is obtained from the detail layer $D_{iR}$ of the infrared image by the same method;
[0039] This gives the detail-layer fusion information $F_D = D_{E\_vi} + D_{E\_iR}$.
[0040] Furthermore, the neural network model used in the real-time identification of road surface defects is the YOLOv8 network model.
[0041] Beneficial effects:
[0042] To address the poor multimodal image registration performance of existing methods, this invention proposes an image registration scheme for visible light and infrared images, which effectively improves multimodal image registration and thus enhances subsequent recognition performance. To address the difficulty of road defect detection in complex scenarios and the low efficiency of deep image fusion, this invention overcomes the limitations of traditional single-sensor defect detection by establishing a multimodal fusion system using visible light and infrared images. A new algorithm is introduced to ensure accurate registration and rapid fusion between multimodal images, and a deep learning recognition network is trained on the fused images, thereby achieving real-time recognition of road surface defects. This invention improves the accuracy and real-time performance of road surface defect detection models in complex scenarios.
Attached Figure Description
[0043] Figure 1 is a flowchart of the method for real-time identification of road surface defects based on multimodal fusion of visible light and infrared images;
[0044] Figure 2 is a schematic diagram of the image registration structure of the present invention;
[0045] Figure 3 is a schematic diagram of the image fusion network structure;
[0046] Figure 4 is a schematic diagram of the network structure for intelligent and rapid image recognition;
[0047] Figure 5 is a visualization of the target detection results on the fused visible light and infrared images of this invention.
Detailed Implementation
[0048] To make the objectives, technical solutions, and advantages of this invention clearer, the invention is described below with reference to specific embodiments shown in the accompanying drawings. However, it should be understood that these descriptions are merely exemplary and not intended to limit the scope of the invention. Furthermore, descriptions of well-known structures and technologies are omitted in the following description to avoid unnecessarily obscuring the concept of the invention.
[0049] It should also be noted that, in order to avoid obscuring the invention with unnecessary details, only the structures and / or processing steps closely related to the solution according to the invention are shown in the accompanying drawings, while other details that are not closely related to the invention are omitted.
[0050] Specific Embodiment 1: This embodiment is described below with reference to Figures 1-4.
[0051] The method for real-time identification of road surface defects based on multimodal fusion of visible light and infrared images described in this embodiment includes the following steps:
[0052] Step 1: Preprocess the acquired visible light and infrared image data, including noise removal and contrast enhancement.
[0053] Infrared and visible light images of road surface defects were acquired using both vehicle-mounted and handheld methods. Visible light and infrared image data can be acquired using cameras and infrared thermal imagers respectively, or using an infrared thermal imager that includes both infrared and visible light lenses. This implementation uses the FOTRIC 340X+ cloud thermal imaging series, which incorporates both infrared and visible light lenses. Considering the significant impact of daily temperature variations on image temperature difference effects, data was acquired at different times of the day. The acquired data was then subjected to noise removal and contrast enhancement to further improve image quality and visual perception, thus improving feature differentiation and extraction for subsequent image registration.
[0054] Step 2: Perform image registration on the visible light image and infrared image processed in Step 1, including feature detection, feature matching, transform model estimation, optimization iteration, similarity measurement and inter-image registration.
[0055] The process of registering the visible light and infrared images preprocessed in Step 1 is shown in Figure 2. Before registration, the camera's intrinsic matrix K, radial distortion coefficients (k1, k2, k3), and tangential distortion coefficients (p1, p2) need to be obtained through calibration. The intrinsic matrix K is composed of the focal lengths $f_x$, $f_y$ and the principal point $(c_x, c_y)$, which is usually close to the center of the image, namely:
[0056] $K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$
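Paragraph [0055] lists K together with five distortion coefficients. The sketch below shows how they map a 3-D point in camera coordinates to a pixel, assuming the common Brown-Conrady radial-tangential model (the same coefficient set used by OpenCV's camera calibration). The patent does not state its distortion model, and `distort` and `project` are illustrative names.

```python
import numpy as np


def distort(xn, yn, k1, k2, k3, p1, p2):
    """Apply radial (k1, k2, k3) and tangential (p1, p2) distortion to normalized coords."""
    r2 = xn * xn + yn * yn
    radial = 1 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    xd = xn * radial + 2 * p1 * xn * yn + p2 * (r2 + 2 * xn * xn)
    yd = yn * radial + p1 * (r2 + 2 * yn * yn) + 2 * p2 * xn * yn
    return xd, yd


def project(point_cam, k, dist):
    """Project (X, Y, Z) in camera coordinates to pixel (u, v); dist = (k1, k2, k3, p1, p2)."""
    x, y, z = point_cam
    xd, yd = distort(x / z, y / z, *dist)     # perspective division, then distortion
    u, v, _ = k @ np.array([xd, yd, 1.0])     # intrinsics: u = fx*xd + cx, v = fy*yd + cy
    return u, v
```

Registration runs this model in reverse: each image is first undistorted so that the linear mappings in [0058] and [0061] apply to ideal pinhole coordinates.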
[0057] The camera's extrinsic parameters must also be determined. Previous studies have mostly used infrared images as the reference, which benefits nighttime detection. Because the edge information of road surface defects is weak, this invention instead uses the visible light image as the reference image and obtains the extrinsic parameters ${}^{vi}R_{iR}$ and ${}^{vi}T_{iR}$ of the mapping from the infrared image coordinate system to the visible light image coordinate system, where ${}^{vi}R_{iR}$ is the rotation matrix and ${}^{vi}T_{iR}$ is the translation vector; this mapping can be represented by a matrix ${}^{vi}G_{iR}$. Because the topological relationships between the different coordinate systems are extremely complex and difficult to represent with a simple planar model, a simplified homography mapping is used, namely:
[0058]
[0059] where $(x_{iR}, y_{iR})$ are the coordinates of the source pixel in the infrared image, and the transformed coordinates are the image coordinates of the infrared image in the visible light image reference field; $K_{vi}$ is the intrinsic matrix obtained using the visible light image as the reference image; and $K_{iR}$ is the intrinsic matrix obtained using the infrared image as the reference image.
[0060] The above process yields the image coordinates of the infrared image in the visible light image reference field. This is a coarse correction and registration that only handles fixed pixels and is difficult to apply to the correction of dynamic pixels. Therefore, on top of this processing, this invention introduces the RIFT algorithm for fine registration of dynamic pixels; that is, the image coordinates of the infrared image transformed into the visible light image reference field are registered with the dynamic pixels of the visible light image. The transformation between the two images in this algorithm can be represented by a 3×3 matrix $H_{RIFT}$, which combines a rigid transformation with a scaling factor and can, to some extent, describe the mapping of dynamic pixels.
[0061]
[0062] where $t_x$ is the horizontal displacement, $t_y$ is the vertical displacement, θ is the rotation angle, and s is the scaling factor; $(x_{vi}, y_{vi})$ are the coordinates of the source pixel in the visible light image.
[0063] This invention addresses the challenges of difficult image registration and correction and low conversion efficiency by using preprocessing for coarse registration correction and feature-based matching to achieve rapid registration and robust alignment of infrared and visible light images.
[0064] Step 3: Perform rapid fusion processing on the registered infrared and visible light images from Step 2, including feature decomposition and extraction, feature fusion, and image reconstruction of the multimodal images to obtain the fused image;
[0065] The process of fusing the images registered in Step 2 is shown in Figure 3. For feature decomposition and extraction of the multimodal images, this invention improves on the least squares filter, focusing on the edge gradients of cracks, and proposes an edge-preserving filter that decomposes the original image into a base layer and a detail layer. Specifically, for an input image X, the filtered result Y should stay as close as possible to the source image X overall, remain smooth in regions with small gradients, and follow the source image closely in edge regions with strong gradients. The filtering problem is thus transformed into minimizing a loss function f(Y), which can be expressed as:
[0066]
[0067] where p is the spatial location of a pixel; $X_p$ and $Y_p$ are the pixels at spatial location p in the image X and in the filtered result Y; λ is the balance coefficient; $\mu_x$ and $\mu_y$ are the weighting coefficients of the gradients in the x and y directions, and $\mu_{x,p}$ and $\mu_{y,p}$ are their values for the pixel at spatial location p in image X; l is the brightness channel of the input image X; α controls the sensitivity to gradient changes; and κ is a constant coefficient.
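The loss function of [0066] did not survive extraction. Assuming the classic weighted-least-squares smoothing objective, which uses exactly the symbols defined in [0067] (λ, μ, l, α, κ), one consistent reconstruction is shown below. The form of the weights μ is an assumption; the matrix form and its gradient, however, reproduce the equation in [0069] term by term.

```latex
f(Y) = \sum_{p} \left[ (Y_p - X_p)^2
     + \lambda \left( \mu_{x,p} \left( \frac{\partial Y}{\partial x} \right)_p^2
     + \mu_{y,p} \left( \frac{\partial Y}{\partial y} \right)_p^2 \right) \right],
\qquad
\mu_{x,p} = \left( \left| \frac{\partial l}{\partial x}(p) \right|^{\alpha} + \kappa \right)^{-1},
\quad
\mu_{y,p} = \left( \left| \frac{\partial l}{\partial y}(p) \right|^{\alpha} + \kappa \right)^{-1}

f(Y) = (Y - X)^{\top}(Y - X) + \lambda\, Y^{\top} \left( M_x^{\top} A_x M_x + M_y^{\top} A_y M_y \right) Y
\;\Rightarrow\;
f'(Y) = 2Y - 2X + 2\lambda \left( M_x^{\top} A_x M_x + M_y^{\top} A_y M_y \right) Y
```

Because $A_x$ and $A_y$ are diagonal, each $M^{\top} A M$ is symmetric, which is why the quadratic term differentiates to $2\lambda(\cdot)Y$.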
[0068] Through a series of matrix transformations of the above equation, introducing the diagonal weight matrices $A_x$, $A_y$ and the discrete difference operator matrices $M_x$, $M_y$, the matrix form of f′(Y) is obtained:
[0069] $f'(Y) = 2Y - 2X + 2\lambda\,(M_x^{\top} A_x M_x + M_y^{\top} A_y M_y)\,Y = 0$
[0070] f′(Y) is the derivative of f(Y);
[0071] The diagonal elements of the diagonal weight matrices $A_x$ and $A_y$ are the weight coefficients $\mu_{x,p}$ and $\mu_{y,p}$ at each pixel position p, specifically:
[0072] $A_x = \mathrm{diag}(\mu_{x,1}, \mu_{x,2}, \mu_{x,3}, \mu_{x,4}, \dots, \mu_{x,n})$
[0073] $A_y = \mathrm{diag}(\mu_{y,1}, \mu_{y,2}, \mu_{y,3}, \mu_{y,4}, \dots, \mu_{y,n})$
[0074] where n is the total number of pixels in the image;
[0075] Discrete difference operator matrices $M_x$ and $M_y$: these are typically constructed as sparse banded matrices that take the differences between adjacent pixels, with $M_x$ giving the difference in the x direction and $M_y$ the difference in the y direction. They can be expressed as:
[0076]
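[0077] It follows that:
[0078] $Y = (\mathbf{I} + \lambda L_X)^{-1} X = \mathrm{WLSR}(X)$
[0079] $L_X = M_x^{\top} A_x M_x + M_y^{\top} A_y M_y$
[0080] where $\mathbf{I}$ is the identity matrix and WLSR(·) is the edge filtering process;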
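A minimal NumPy sketch of the filter in [0078], $\mathrm{WLSR}(X) = (\mathbf{I} + \lambda L_X)^{-1} X$, assuming forward-difference operators for $M_x$, $M_y$ and the classic weights $\mu = (|\partial l|^{\alpha} + \kappa)^{-1}$ with l taken as the image itself. The defaults (λ = 1, α = 1.2, κ = 1e-4) are illustrative, not values from the patent, and the dense solve is only practical for tiny images; a real implementation would use sparse matrices.

```python
import numpy as np


def diff_operators(h, w):
    """Forward-difference matrices M_x, M_y for a row-major flattened h*w image.

    Row p of M_x holds -1 at p and +1 at its right neighbour; rows for pixels in the
    last column (or, for M_y, the last row) stay zero.
    """
    n = h * w
    m_x = np.zeros((n, n))
    m_y = np.zeros((n, n))
    for r in range(h):
        for c in range(w):
            p = r * w + c
            if c + 1 < w:
                m_x[p, p], m_x[p, p + 1] = -1.0, 1.0
            if r + 1 < h:
                m_y[p, p], m_y[p, p + w] = -1.0, 1.0
    return m_x, m_y


def wls_filter(x, lam=1.0, alpha=1.2, kappa=1e-4, lum=None):
    """Edge-preserving smoothing Y = (I + lam * L_X)^{-1} X (dense, small images only)."""
    x = np.asarray(x, dtype=float)
    h, w = x.shape
    lum = x if lum is None else np.asarray(lum, dtype=float)  # brightness channel l
    m_x, m_y = diff_operators(h, w)
    lf = lum.reshape(-1)
    # Assumed weights: small where the brightness gradient is strong, so edges are kept.
    a_x = np.diag(1.0 / (np.abs(m_x @ lf) ** alpha + kappa))
    a_y = np.diag(1.0 / (np.abs(m_y @ lf) ** alpha + kappa))
    l_x = m_x.T @ a_x @ m_x + m_y.T @ a_y @ m_y
    y = np.linalg.solve(np.eye(h * w) + lam * l_x, x.reshape(-1))
    return y.reshape(h, w)
```

The decomposition of [0082] then reads `base = wls_filter(img)` and `detail = img - base`. Since $L_X$ annihilates constant images, the filter preserves the image mean while strongly smoothing flat regions.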
[0081] Denote the visible light image by $I_{vi}$ and the infrared image after dynamic pixel registration by $I_{iR}$; the base layer $B_x$ and detail layer $D_x$ of $I_{vi}$ and $I_{iR}$ are calculated separately:
[0082] $B_x = \mathrm{WLSR}(I_x), \quad D_x = I_x - B_x$
[0083] where $x = vi$ or $iR$, with vi and iR denoting the visible light image and the infrared image, respectively; that is, $I_x$ denotes the visible light image $I_{vi}$ or the infrared image $I_{iR}$;
[0084] After the above image decomposition, the base layer and detail layer need to be fused. This process needs to be done in two parts:
[0085] (i) For base-layer fusion, a vector space model and the idea of adaptive weight allocation are introduced. For the base layer images $B_{vi}$ and $B_{iR}$ corresponding to $B_x$, let $I_k$ denote the intensity value of the k-th pixel; its saliency intensity difference distribution $\Lambda_k$ can then be expressed as:
[0086]
[0087] where $\Lambda_k$ is the saliency intensity value of each pixel, in the range [0, 1], and N is the number of pixels in $B_x$.
[0088] The above formula gives the saliency intensity values $\Lambda_{vi}$ and $\Lambda_{iR}$ of the visible light and infrared images, respectively, from which the adaptive weight ω for base-layer fusion can be calculated as:
[0089]
[0090] The base-layer fusion information $F_B$ can then be calculated:
[0091] $F_B = \omega B_{vi} + (1 - \omega) B_{iR}$
[0092] where $B_{vi}$ is the base layer of the visible light image and $B_{iR}$ is the base layer of the infrared image;
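The saliency formula ([0086]) and the weight formula ([0089]) were lost in extraction. As a hypothetical sketch only: per-pixel saliency is computed as histogram contrast, $\Lambda_k = \sum_j |I_k - I_j|$ normalized to [0, 1], and the weight is taken per pixel as $\omega = \Lambda_{vi} / (\Lambda_{vi} + \Lambda_{iR})$, falling back to 0.5 where both saliencies are zero. Both formulas and the helper names are assumptions; only $F_B = \omega B_{vi} + (1 - \omega) B_{iR}$ comes from the text.

```python
import numpy as np


def saliency_map(base):
    """Histogram-contrast saliency (assumed form): Lambda_k = sum_j |I_k - I_j|,
    summed over all N pixels and normalized to [0, 1]."""
    q = np.clip(np.rint(base), 0, 255).astype(np.int64)       # 8-bit intensity levels
    hist = np.bincount(q.ravel(), minlength=256).astype(float)
    levels = np.arange(256, dtype=float)
    # contrast[g] = sum over levels l of hist[l] * |g - l|
    contrast = np.abs(levels[:, None] - levels[None, :]) @ hist
    lam = contrast[q]
    peak = lam.max()
    return lam / peak if peak > 0 else np.zeros_like(lam)


def fuse_base(b_vi, b_ir, eps=1e-12):
    """F_B = omega * B_vi + (1 - omega) * B_iR with an assumed per-pixel omega."""
    s_vi, s_ir = saliency_map(b_vi), saliency_map(b_ir)
    denom = s_vi + s_ir
    omega = np.where(denom > eps, s_vi / np.maximum(denom, eps), 0.5)
    return omega * b_vi + (1.0 - omega) * b_ir, omega
```

Computing the contrast once per gray level rather than per pixel pair keeps the cost at O(N + 256²) instead of O(N²).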
[0093] (ii) For detail layer fusion, a sub-window variance filter is first used to obtain local edge information of the image, and a spatial statistical model is used to further improve the edge perception capability of the filter. This filter needs to consider the edge statistical model and the sub-window variance property, and the output result can be represented as a linear combination of the source image and the image after smoothing and filtering.
[0094] SVF (sub-window variance filtering) multi-scale decomposition is performed on the detail layer $D_x$ (i.e., $D_{vi}$ or $D_{iR}$);
[0095] The SVF sub-window variance filtering process is as follows:
[0096]
[0097] where $I_p$ is the local pixel block centered on pixel p in the detail layer $D_x$; $I'_p$ is the local pixel block after sub-window variance filtering; $\varphi_p$ is the contribution parameter that controls the contribution of the local pixel block $I_p$ to $I'_p$; F(·) is the smoothing filter; $\omega_p$ is the support region of the filter centered at pixel p; and $k_i$ is the intensity value of the i-th pixel within the support region.
[0098] The value of $\varphi_p$ depends on the global variance of the region to be filtered and on the variances of its sub-windows. Let the global variance of the region to be filtered be given, and divide that region into four sub-windows with corresponding variance values; then set:
[0099]
[0100] where ε is the regularization parameter.
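The SVF equations ([0096], [0099]) were also lost. The sketch below follows only what the text states: the output blends each pixel with a smoothed value, and $\varphi_p$ is driven by the global variance, four sub-window variances and a regularization ε. Everything else is a hypothetical choice: F(·) is a mean filter over a (2r+1)×(2r+1) support region, the sub-windows are the four quadrants sharing pixel p as a corner, $\varphi_p = v / (v + \varepsilon)$ with v the largest sub-window variance divided by the global variance, and the blend is per pixel rather than per block.

```python
import numpy as np


def box_mean(img, r):
    """Mean over the (2r+1)x(2r+1) support region around each pixel (edge-padded)."""
    k = 2 * r + 1
    pad = np.pad(img, r, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += pad[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)


def quadrant_variances(img, r):
    """Variances of the four (r+1)x(r+1) sub-windows that have pixel p as a corner."""
    h, w = img.shape
    pad = np.pad(img, r, mode="edge")
    m = (r + 1) ** 2
    result = []
    for oy, ox in [(0, 0), (0, r), (r, 0), (r, r)]:  # top-left, top-right, bottom-left, bottom-right
        s = np.zeros((h, w))
        s2 = np.zeros((h, w))
        for dy in range(r + 1):
            for dx in range(r + 1):
                v = pad[oy + dy:oy + dy + h, ox + dx:ox + dx + w]
                s += v
                s2 += v * v
        mean = s / m
        result.append(np.maximum(s2 / m - mean * mean, 0.0))
    return np.stack(result)


def svf(detail, r=2, eps=0.1):
    """Blend I'_p = phi_p * I_p + (1 - phi_p) * F(I_p); the phi_p form is hypothetical."""
    detail = np.asarray(detail, dtype=float)
    smooth = box_mean(detail, r)                    # F(.) assumed to be a mean filter
    var_global = detail.var() + 1e-12               # global variance of the region
    v = quadrant_variances(detail, r).max(axis=0) / var_global
    phi = v / (v + eps)                             # larger where a sub-window has high variance
    return phi * detail + (1.0 - phi) * smooth, phi
```

Near an edge at least one quadrant straddles it, so $\varphi_p$ approaches 1 and the detail is kept; in flat regions every quadrant variance is small, $\varphi_p$ approaches 0 and the pixel is replaced by its smoothed value.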
[0101] As can be seen, the filtering used above to obtain the base layer $B_x$ and detail layer $D_x$ follows the Laplacian pyramid principle. Therefore, this invention applies multi-scale decomposition based on SVF (sub-window variance filtering) to $D_{vi}$ to obtain:
[0102]
[0103] where the two detail layers are the fine-scale detail layer and the small-scale detail layer, respectively; α and β are the weight coefficients of these two detail layers; and $D_{E\_vi}$ is the enhanced visible light image detail layer.
[0104] SVF multi-scale decomposition is likewise performed on $D_{iR}$ to obtain $D_{E\_iR}$, the enhanced infrared image detail layer.
[0105] The decomposition scale of SVF multiscale decomposition can be determined based on the actual situation.
[0106] The final detail-layer fusion information is $F_D = D_{E\_vi} + D_{E\_iR}$.
[0107] Image fusion is completed by inverse transformation (reconstruction) of the fused base layer and detail layer:
[0108] $F = F_B + F_D$
[0109] Step 4: Label the fused image F and divide it into training, validation and test sets;
[0110] During the annotation process, LabelImg software is used to annotate the images and generate corresponding txt files. These files contain information about the type, size, and location of the target of interest to ensure the accuracy and reliability of the dataset.
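LabelImg can save annotations in the YOLO text format, with one line per object: `class x_center y_center width height`, where all four coordinates are normalized by the image width and height. The helpers below are a hypothetical sketch of that conversion; the class ids (for example 0 for cracks, 1 for potholes) are illustrative and depend on the project's class list.

```python
def to_yolo_line(class_id, box, img_w, img_h):
    """Pixel box (x_min, y_min, x_max, y_max) -> 'class xc yc w h', normalized to [0, 1]."""
    x_min, y_min, x_max, y_max = box
    xc = (x_min + x_max) / 2 / img_w
    yc = (y_min + y_max) / 2 / img_h
    bw = (x_max - x_min) / img_w
    bh = (y_max - y_min) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {bw:.6f} {bh:.6f}"


def from_yolo_line(line, img_w, img_h):
    """Inverse of to_yolo_line: returns (class_id, (x_min, y_min, x_max, y_max)) in pixels."""
    cls, xc, yc, bw, bh = line.split()
    xc, bw = float(xc) * img_w, float(bw) * img_w
    yc, bh = float(yc) * img_h, float(bh) * img_h
    return int(cls), (xc - bw / 2, yc - bh / 2, xc + bw / 2, yc + bh / 2)
```

Because the coordinates are normalized, the same label file stays valid if the fused image F is resized before training.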
[0111] Step 5: Input the reconstructed image F corresponding to the dataset in Step 4 into the fast detection network model for road surface defects, perform iterative optimization training to obtain a pre-trained weight model, and import the model into a mobile device to realize real-time identification of road surface defects.
[0112] During the real-time identification process, visible light and infrared images of the road to be identified are acquired, and a reconstructed image F is obtained based on the visible light and infrared images. The road surface defects are then identified in real time using a rapid detection network model.
[0113] The training set obtained in Step 4 is used to train and optimize the YOLOv8 model, whose structure is shown in Figure 4. The input image is first processed by multiple convolutional layers, C2f modules, and an SPPF module for deep feature extraction, achieving efficient fusion of local and global features. The extracted feature maps are then passed to the Neck network, which builds a multi-scale feature pyramid and transfers information across layers so that the network can better perceive targets at different scales. The model predicts bounding boxes on feature maps at multiple scales, predicting the position offset, confidence, and class probability of each anchor box. Finally, the features enter the detection network, where the extracted and optimized feature maps are superimposed to obtain feature maps with region proposals, and target localization is performed through fully connected operations, carrying out bounding box regression and classification regression for the defects in the image. The predictions are post-processed by threshold filtering and non-maximum suppression to remove low-confidence predictions and merge overlapping boxes. Finally, the class, bounding box coordinates, and confidence of each target are output, giving accurate information about the detected defects.
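The post-processing in [0113] (threshold filtering followed by non-maximum suppression) can be sketched in NumPy as below. This is a class-agnostic sketch for brevity; the thresholds are illustrative, the function names are hypothetical, and YOLOv8 implementations ship their own NMS, typically applied per class.

```python
import numpy as np


def box_area(b):
    """Area of boxes in (x1, y1, x2, y2) format; works on one box or an (N, 4) array."""
    return (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])


def iou(box, boxes):
    """Intersection-over-union of one box against an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    return inter / (box_area(box) + box_area(boxes) - inter)


def postprocess(boxes, scores, conf_thr=0.25, iou_thr=0.45):
    """Drop low-confidence boxes, then greedily keep the best box and suppress overlaps."""
    mask = scores >= conf_thr
    boxes, scores = boxes[mask], scores[mask]
    order = np.argsort(-scores)                # highest confidence first
    kept = []
    while order.size:
        best = order[0]
        kept.append(best)
        rest = order[1:]
        order = rest[iou(boxes[best], boxes[rest]) < iou_thr]
    return boxes[kept], scores[kept]
```

Raising `iou_thr` keeps more overlapping boxes, which can help with densely packed cracks at the cost of more duplicate detections.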
[0114] During model training, this invention uses precision, recall, and mean average precision (mAP) as model evaluation metrics. To fully leverage the advantages of the YOLOv8 architecture, this invention continuously iterates the model structure and parameter configuration to achieve an optimal balance among these three metrics. The calculation formulas for precision (P), recall (R), and mean average precision (mAP) are as follows:
[0115] Precision P:
[0116] $P = \dfrac{TP}{TP + FP}$
[0117] Recall R:
[0118] $R = \dfrac{TP}{TP + FN}$
[0119] Average precision AP and mean average precision mAP:
[0120] $AP = \int_0^1 P(r)\,dr, \qquad mAP = \dfrac{1}{N} \sum_{i=1}^{N} AP_i$
[0121] where TP is the number of positive targets predicted as positive in the target detection task; FP is the number of negative targets predicted as positive; FN is the number of positive targets predicted as negative (missed detections); P(r) is the precision-recall curve, i.e., precision as a function of recall r; and N is the number of classes in the multi-class target detection task.
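A small NumPy sketch of the precision, recall, AP and mAP metrics above. The AP integral is approximated here with all-point interpolation over a monotone precision envelope, which is one common convention; evaluation toolkits differ in details such as 101-point sampling or IoU thresholds, so the function names and the interpolation choice are assumptions.

```python
import numpy as np


def precision_recall(tp, fp, fn):
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return p, r


def average_precision(recall, precision):
    """Area under P(r) using the all-point interpolated (monotone) precision envelope.

    recall must be sorted in increasing order, paired element-wise with precision.
    """
    r = np.concatenate([[0.0], recall, [1.0]])
    p = np.concatenate([[1.0], precision, [0.0]])
    p = np.maximum.accumulate(p[::-1])[::-1]   # precision envelope, non-increasing in r
    idx = np.where(r[1:] != r[:-1])[0]         # points where recall changes
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))


def mean_average_precision(ap_per_class):
    return float(np.mean(ap_per_class))
```

For example, recall points (0.5, 1.0) with precision (1.0, 0.5) give AP = 0.5 × 1.0 + 0.5 × 0.5 = 0.75.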
[0122] Example:
[0123] Using an infrared thermal imager, approximately 10 km of urban and campus roads were inspected. The collected infrared data was preprocessed according to the steps described above, followed by model design. Data augmentation and image annotation were also required to create a dataset. Finally, model experiments and comparative analysis were conducted.
[0124] The model test results are shown in Table 1 below:
[0125] Table 1 Comparison of Iteration Evaluation Indicators for Each Model
[0126]
[0127] It can be seen that YOLO has demonstrated its unique advantages in target recognition and real-time detection. Although YOLO has now evolved to YOLOv11, the results show that the latest version does not necessarily give the best detection results; targeted training is needed for the specific detection scenario, target of interest, and region of interest. The results show that YOLOv8 performs best in recognizing and detecting targets in fused infrared and visible light images, surpassing the other models on all metrics, and is well suited to road damage detection in extreme scenarios.
[0128] The recognition results on fused visible light and infrared images are shown in Figure 5.
[0129] Based on extensive practical experience and professional knowledge in this field, this invention designs a real-time road surface defect identification method based on multimodal fusion of visible light and infrared images. This method can, to some extent, solve problems such as poor road damage detection performance, difficult data interpretation, large workload for target recognition and classification, and poor real-time performance under extreme low-light conditions. Compared with existing technologies, the model of this invention performs cross-scale registration and fusion of infrared and visible light multimodal image data, solving the problem of road surface defect detection in complex scenarios. Then, an intelligent recognition algorithm is used to train and evaluate the fused image, overcoming the drawbacks of conventional manual inspections, such as missed detections, false detections, and low efficiency. This lays the foundation for building a defect detection model for extreme scenarios, enabling its wider application in various portable road information acquisition devices and online defect detection platforms. This promotes the construction of a smart road performance online holographic precision interpretation and evaluation system with higher detection accuracy and lower computational costs.
[0130] The foregoing has shown and described the basic principles, main features, and advantages of the present invention. Those skilled in the art should understand that the present invention is not limited to the above embodiments. The embodiments and descriptions in the specification are merely illustrative of the principles of the invention. Various changes and modifications can be made to the invention without departing from its spirit and scope, and all such changes and modifications fall within the scope of the present invention as claimed. The scope of protection of this invention is defined by the appended claims and their equivalents.
Claims
1. A method for real-time identification of road surface defects based on multimodal fusion of visible light and infrared images, characterized in that it comprises the following steps: feature decomposition and extraction is performed on the visible light image $I_{VI}$ to obtain its base layer $B_{VI}$ and detail layer $D_{VI}$, and on the infrared image $I_{IR}$ to obtain its base layer $B_{IR}$ and detail layer $D_{IR}$, according to

$$B_X = \mathrm{EF}(I_X), \qquad D_X = I_X - B_X,$$

where $\mathrm{EF}(\cdot)$ denotes edge filtering, $B_X$ is the base layer and $D_X$ is the detail layer extracted by the decomposition, and $I_X$ denotes either the visible light image $I_{VI}$ or the infrared image $I_{IR}$; fused base-layer information $B_F$ is obtained from $B_{VI}$ and $B_{IR}$, fused detail-layer information $D_F$ is obtained from $D_{VI}$ and $D_{IR}$, and the fused image $F$ is then obtained from $B_F$ and $D_F$; based on the fused image $F$, a neural network model is used to identify road surface defects in real time; the edge filtering is given by

$$Y = \left(E + \lambda \left(D_x^{\mathsf{T}} W_x D_x + D_y^{\mathsf{T}} W_y D_y\right)\right)^{-1} X,$$

where $X$ is the input image before filtering and $Y$ is the edge filtering result, both arranged as column vectors of $n$ pixels; $E$ is the $n \times n$ identity matrix; $\lambda$ is the balance coefficient; $D_x$ and $D_y$ are the discrete difference operator matrices for the x-direction and y-direction differences; and $W_x$, $W_y$ are diagonal weight matrices whose diagonal elements $w_{x,p}$ and $w_{y,p}$ are the weighting coefficients of the x- and y-direction gradients of the pixel at spatial position $p$ of image $X$, specifically

$$W_x = \mathrm{diag}\left(w_{x,1}, \dots, w_{x,n}\right), \qquad W_y = \mathrm{diag}\left(w_{y,1}, \dots, w_{y,n}\right),$$

where $n$ is the total number of pixels in the image;

the fused base-layer information $B_F$ is obtained from $B_{VI}$ and $B_{IR}$ as follows: for each of $B_{VI}$ and $B_{IR}$, the saliency intensity value of every pixel is computed with the saliency intensity difference assignment formula

$$S_X(k) = \sum_{j=1}^{N} \left| X_k - X_j \right|,$$

where $X_k$ is the intensity value of the k-th pixel, $N$ is the number of pixels in $X$, and $X$ denotes $B_{VI}$ or $B_{IR}$; adaptive fusion weights are then calculated from the visible-light and infrared saliency values, which yields $B_F$;

the fused detail-layer information $D_F$ is obtained from $D_{VI}$ and $D_{IR}$ as follows: first, SVF (sub-window variance filtering) multi-scale decomposition is performed on the visible-light detail layer $D_{VI}$, giving an SVF filtering result and a corresponding detail layer at each scale; these per-scale detail layers, weighted by the detail-layer weight coefficients, form the enhanced visible-light detail layer $\hat{D}_{VI}$; the infrared detail layer $D_{IR}$ is processed in the same way to obtain the enhanced infrared detail layer $\hat{D}_{IR}$; $D_F$ is then obtained from $\hat{D}_{VI}$ and $\hat{D}_{IR}$;

in the SVF multi-scale decomposition, each sub-window variance filtering step operates on the local pixel block of the detail-layer image centered on pixel $p$ and produces a filtered local pixel block; a contribution parameter controls the contribution of the original local pixel block to the filtered block, and a smoothing filter is applied over a support domain centered on $p$, using the intensity values of the pixels within that domain; the value of the contribution parameter depends on the global variance of the region to be filtered and on the variances of its four sub-windows, together with a regularization parameter.
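The decomposition and base-layer fusion steps of claim 1 can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the patented implementation. The per-pixel gradient weights follow the common weighted-least-squares choice $1/(|\nabla X_p|^{\alpha} + \epsilon)$. The adaptive base-layer weight is taken as the relative saliency of the two base layers. A max-absolute selection rule stands in for the SVF detail enhancement and detail fusion. The fused image is taken as the sum of the fused base and detail layers. None of these choices is specified in the claim, and the helper names (`diff_operators`, `wls_filter`, `decompose`, `saliency`, `fuse`) and parameters (`lam`, `alpha`, `eps`) are hypothetical.

```python
import numpy as np


def diff_operators(h, w):
    """Forward-difference matrices D_x, D_y for an h-by-w image flattened row-major.

    Rows at the right/bottom border stay zero (no neighbour to difference with).
    Dense matrices keep the sketch dependency-free; real images need sparse matrices.
    """
    n = h * w
    dx = np.zeros((n, n))
    dy = np.zeros((n, n))
    for r in range(h):
        for c in range(w):
            p = r * w + c
            if c + 1 < w:  # x-direction: right neighbour
                dx[p, p], dx[p, p + 1] = -1.0, 1.0
            if r + 1 < h:  # y-direction: neighbour below
                dy[p, p], dy[p, p + w] = -1.0, 1.0
    return dx, dy


def wls_filter(x, lam=1.0, alpha=1.2, eps=1e-4):
    """Edge-preserving smoothing: Y = (E + lam*(Dx^T Wx Dx + Dy^T Wy Dy))^-1 X."""
    h, w = x.shape
    dx, dy = diff_operators(h, w)
    v = x.ravel()
    # ASSUMPTION: weights shrink where the input gradient is large, so edges survive.
    wx = 1.0 / (np.abs(dx @ v) ** alpha + eps)
    wy = 1.0 / (np.abs(dy @ v) ** alpha + eps)
    # wx[:, None] * dx equals diag(wx) @ dx without forming the diagonal matrix.
    a = np.eye(h * w) + lam * (dx.T @ (wx[:, None] * dx) + dy.T @ (wy[:, None] * dy))
    return np.linalg.solve(a, v).reshape(h, w)


def decompose(img, **kwargs):
    """Two-scale split: base = edge-filtered image, detail = residual."""
    base = wls_filter(img, **kwargs)
    return base, img - base


def saliency(layer):
    """S(k) = sum_j |X_k - X_j| over all N pixels, computed in O(N log N) by sorting."""
    v = layer.ravel()
    order = np.argsort(v)
    s = v[order]
    n = s.size
    prefix = np.concatenate(([0.0], np.cumsum(s)))
    k = np.arange(n)
    # Pixels below s_k contribute k*s_k - prefix[k]; pixels above contribute the rest.
    vals = (s * k - prefix[k]) + (prefix[n] - prefix[k + 1] - s * (n - k - 1))
    out = np.empty(n)
    out[order] = vals
    return out.reshape(layer.shape)


def fuse(vis, ir, eps=1e-12):
    """Fuse registered visible and infrared images of the same shape."""
    b_vis, d_vis = decompose(vis)
    b_ir, d_ir = decompose(ir)
    s_vis, s_ir = saliency(b_vis), saliency(b_ir)
    # ASSUMPTION: adaptive weight = relative saliency of the visible base layer.
    wb = s_vis / (s_vis + s_ir + eps)
    base_f = wb * b_vis + (1.0 - wb) * b_ir
    # ASSUMPTION: max-absolute selection stands in for SVF enhancement + detail fusion.
    detail_f = np.where(np.abs(d_vis) >= np.abs(d_ir), d_vis, d_ir)
    return base_f + detail_f
```

The dense `np.linalg.solve` is only practical for small test images. For full-resolution frames the same system would be assembled with sparse matrices and solved with a sparse solver such as `scipy.sparse.linalg.spsolve`.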
2. The method for real-time identification of road surface defects based on multimodal fusion of visible light and infrared images according to claim 1, characterized in that, before identification, the visible light image and the infrared image are registered, which comprises the following steps: first, with the visible light image as the reference image, the infrared image is transformed into the reference frame of the visible light image; then, the image coordinates of the transformed infrared image are dynamically registered with the pixels of the visible light image.
3. The method for real-time identification of road surface defects based on multimodal fusion of visible light and infrared images according to claim 2, characterized in that transforming the infrared image into the reference frame of the visible light image, with the visible light image as the reference image, comprises obtaining the image coordinates of the infrared image in the visible-light reference frame by the homography mapping

$$z \begin{bmatrix} u' \\ v' \\ 1 \end{bmatrix} = K_{VI}\, R\, K_{IR}^{-1} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix},$$

where $(u, v)$ are the coordinates of the source pixel in the infrared image; $(u', v')$ are the image coordinates of the infrared image transformed into the visible-light reference frame; $z$ is the projective scale factor; $K_{VI}$ is the intrinsic matrix obtained with the visible light image as the reference image; $K_{IR}$ is the intrinsic matrix obtained with the infrared image as the reference image; and $R$ is the rotation matrix of the mapping from the infrared image coordinate system to the visible light image coordinate system.
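The mapping in claim 3 is a homography built only from the two intrinsic matrices and the rotation, $H = K_{VI} R K_{IR}^{-1}$. The sketch below applies it to infrared pixel coordinates. The function names and the example intrinsics are hypothetical, chosen only to make the arithmetic easy to check.

```python
import numpy as np


def rotation_homography(k_vis, k_ir, r_ir_to_vis):
    """H = K_VI @ R @ inv(K_IR): maps infrared pixels into the visible-light image plane."""
    return k_vis @ r_ir_to_vis @ np.linalg.inv(k_ir)


def map_ir_to_visible(points_ir, h):
    """Apply H to an (N, 2) array of infrared pixel coordinates (u, v)."""
    pts = np.asarray(points_ir, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])  # rows (u, v, 1)
    mapped = homog @ h.T                              # rows z * (u', v', 1)
    return mapped[:, :2] / mapped[:, 2:3]             # divide out the projective scale z


# Hypothetical intrinsics: visible camera 640x480, infrared camera 320x240, no rotation.
k_vis = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
k_ir = np.array([[400.0, 0.0, 160.0], [0.0, 400.0, 120.0], [0.0, 0.0, 1.0]])
h = rotation_homography(k_vis, k_ir, np.eye(3))
# The infrared principal point (160, 120) lands on the visible principal point (320, 240).
print(map_ir_to_visible([[160, 120], [0, 0]], h))
```

A homography built from $K$ and $R$ alone contains no translation term. It is therefore exact only when the two cameras share an optical center, and approximately valid when the baseline between them is small compared with the distance to the road surface. Any residual offset is the kind of misalignment a subsequent dynamic registration step can absorb.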
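4. The method for real-time identification of road surface defects based on multimodal fusion of visible light and infrared images according to claim 3, characterized in that the dynamic pixel registration between the image coordinates in the infrared-image reference frame and the visible light image uses the mapping transformation

$$\begin{bmatrix} u' \\ v' \end{bmatrix} = s \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} t_x \\ t_y \end{bmatrix},$$

where $t_x$ is the horizontal displacement, $t_y$ is the vertical displacement, $\theta$ is the rotation angle, $s$ is the scaling factor, $(u', v')$ are the image coordinates in the infrared-image reference frame, and $(x, y)$ are the coordinates of the source pixel in the visible light image.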
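Claim 4 names the four parameters of a similarity transform (displacements, rotation angle, scaling factor) but not how they are obtained. The sketch below assumes matched point pairs between the two images are available, for example from feature matching. It applies the transform and recovers the parameters by linear least squares, using the substitution $p = s\cos\theta$, $q = s\sin\theta$, which makes the model linear in $(p, q, t_x, t_y)$. Both function names are hypothetical.

```python
import numpy as np


def similarity_apply(points, tx, ty, theta, s):
    """Map (N, 2) points by scale s, rotation theta, then translation (tx, ty)."""
    pts = np.asarray(points, dtype=float)
    c, sn = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -sn], [sn, c]])
    return s * pts @ rot.T + np.array([tx, ty])


def similarity_fit(src, dst):
    """Least-squares (tx, ty, theta, s) such that dst ~= similarity_apply(src, ...).

    With p = s*cos(theta) and q = s*sin(theta):
        u = p*x - q*y + tx
        v = q*x + p*y + ty
    which is linear in (p, q, tx, ty). Needs at least two non-coincident point pairs.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = len(src)
    a = np.zeros((2 * n, 4))
    a[0::2] = np.column_stack([src[:, 0], -src[:, 1], np.ones(n), np.zeros(n)])  # u rows
    a[1::2] = np.column_stack([src[:, 1], src[:, 0], np.zeros(n), np.ones(n)])   # v rows
    rhs = dst.reshape(-1)  # interleaved u0, v0, u1, v1, ...
    (p, q, tx, ty), *_ = np.linalg.lstsq(a, rhs, rcond=None)
    return tx, ty, np.arctan2(q, p), np.hypot(p, q)
```

Because the fit is a single small linear solve, it can be repeated for every frame pair, which suits a "dynamic" registration that tracks slow drift between the two cameras.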
5. The method for real-time identification of road surface defects based on multimodal fusion of visible light and infrared images according to claim 1, characterized in that the discrete difference operator matrix $D_x$ corresponds to the x-direction difference and the discrete difference operator matrix $D_y$ corresponds to the y-direction difference.
6. The method for real-time identification of road surface defects based on multimodal fusion of visible light and infrared images according to claim 1, characterized in that: The neural network model used in the real-time identification of road surface defects is the YOLOv8 network model.