Road surface defect automatic detection method and system based on deep reinforcement learning

By using deep reinforcement learning to fuse multi-source data and dynamically adjust the window, the efficiency and accuracy issues of road detection in complex environments are solved, and efficient and accurate identification of micro-cracks is achieved.

CN122262829A · Pending · Publication Date: 2026-06-23 · HENAN JIAO YUAN ENG TECH CO LTD


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: HENAN JIAO YUAN ENG TECH CO LTD
Filing Date: 2026-04-30
Publication Date: 2026-06-23

Application Information


IPC: G06F18/241; G06F18/25; G06F18/213; G06V10/82; G06V10/44; G06V10/54; G06V10/25; G06N3/092; G06N3/0464; G06F123/02

AI Tagging

Application Domain

Character and pattern recognition; Biological models

Technology Topics

Feature vector; Engineering




AI Technical Summary

Technical Problem

Existing road inspection technologies struggle to achieve efficient and accurate defect identification in complex environments, especially for subtle or hidden defects where the area of interest cannot be dynamically adjusted, resulting in low detection efficiency and insufficient accuracy.

Method used

A deep reinforcement learning-based approach is adopted to obtain road surface environment descriptions through multi-source data fusion. Combined with image processing and speed information, the detection window is dynamically adjusted, and a reward feedback mechanism is used to optimize defect localization and generate a continuous defect localization sequence.

Benefits of technology

It significantly improves the accuracy and robustness of microcrack detection, reduces resource consumption, and provides efficient and precise technical support for road maintenance.



Abstract

The present application relates to a method and system for automatic road surface defect detection based on deep reinforcement learning. Multi-source data fusion integrates original image data, vehicle speed information, and historical defect records; texture details and edge contours are extracted and combined with dynamic environment influence analysis to form a comprehensive environmental feature representation. A window adjustment mechanism and feature vector fusion optimize defect positioning accuracy, while positive and negative reward feedback and environmental adaptability correction dynamically adjust the detection strategy. The crack position and type are finally output.


Description

Technical Field

[0001] This invention relates to the field of information technology, and in particular to an automated method and system for detecting road surface defects based on deep reinforcement learning.

Background Technology

[0002] Road inspection technology, as a crucial component of infrastructure maintenance, plays an irreplaceable role in ensuring traffic safety and extending road lifespan. With rapid urbanization, road aging and defects are becoming increasingly prominent, making the timely detection and repair of road surface cracks, potholes, and other potential hazards an urgent need. However, existing technologies often fall short of meeting the requirements of efficiency and accuracy when dealing with complex real-world scenarios, making the research into more intelligent inspection methods particularly important.

[0003] Current technical solutions have revealed some deep-seated shortcomings in practical applications. Many methods are easily affected by changes in lighting, shadows, or road debris in complex environments, leading to unstable detection results. Furthermore, most of these solutions rely on independent analysis of each image, ignoring the continuity and dynamic changes in the detection process, making them unsuitable for scenarios involving high-speed movement or rapid environmental changes. These limitations often result in misjudgments or omissions in real-world applications, particularly in the poor identification of critical defects.

[0004] A deeper technical challenge lies in enabling the detection process to self-adjust. Traditional detection methods often mechanically scan the entire image, lacking targeted attention to key areas. This inflexibility leaves the system helpless when faced with subtle or hidden defects. Especially when defect characteristics are not obvious or are masked by other interference factors, the system cannot adjust its focus based on real-time feedback, nor can it accumulate experience and optimize its strategy through continuous detection. This lack of technical capability directly leads to low detection efficiency and insufficient accuracy.

[0005] Taking high-speed vehicle-mounted inspection as an example, when vehicles are traveling at high speeds, the quality of road surface images deteriorates due to vibration or blurring, and tiny cracks may be difficult to detect in a single image. If the system cannot dynamically adjust its focus on a particular area based on the detection results of the previous frames, or cannot decide whether to perform a more detailed scan of suspected problem areas, many potential hazards will be overlooked. Therefore, enabling the inspection system to autonomously adjust its area of focus in dynamic environments has become a key issue in improving road inspection effectiveness.

Summary of the Invention

[0006] The purpose of this invention is to provide an automated method and system for detecting road surface defects based on deep reinforcement learning, in order to solve the technical problems of low road detection efficiency and insufficient accuracy caused by the complex road environment, fine cracks and large dynamic environmental interference, and the lack of dynamic adjustment of the area of interest in traditional detection.

[0007] The technical solution of the present invention is as follows:

[0008] This invention provides an automated method for detecting road surface defects based on deep reinforcement learning, which mainly includes:

[0009] The system acquires raw image data of the current road surface, vehicle speed information, and historical defect detection records. It then fuses the grayscale distribution characteristics, speed information, and historical records of the data to generate an initial road surface environment description.

[0010] Based on the initial road surface environment description, local texture details and edge contour information are extracted from the original image, and the dynamic environment influence is analyzed by combining speed information to obtain a comprehensive environmental feature representation;

[0011] Based on the comprehensive environmental features, the spatial correlation of pixels and the difference in regional contrast are analyzed. When the confidence score of the defect in the detection window is lower than the preset threshold, the window is translated or scaled according to the pre-established action selection rules to obtain the adjusted local image region.

[0012] New edge contour information and local texture details are extracted from the adjusted local image region, fused with the comprehensive environmental feature representation and filtered for environmental noise to obtain an updated road surface state description.

[0013] The deviation between the defect boundary and the actual location is calculated by evaluating the positioning accuracy. Resource consumption is analyzed by combining the resource occupancy ratio and resource optimization constraints to obtain positive and negative reward feedback values. If the reward feedback value is positive and exceeds the dynamic adjustment threshold, the current area is determined to be a suspected defect area.

[0014] Based on the positive and negative reward feedback values and the reward weight allocation rules, the parameters of the action selection rules are adjusted, and the reward feedback cycle is updated using environmental adaptability correction to obtain the optimized window adjustment mechanism.

[0015] The window adjustment mechanism is used to perform window translation or scaling operations on subsequent road surface image data. The rhythm of feature extraction and fusion is controlled by the state update frequency to obtain a continuous defect localization sequence.

[0016] The confidence score and edge contour information of the suspected defect area are extracted from the defect location sequence to determine the specific location and type of the crack, and the detection results of the micro-cracks on the road surface are output.
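The reward evaluation and rule adjustment steps in [0013] and [0014] can be illustrated with a minimal sketch. It is not the patented implementation: intersection-over-union (IoU) stands in for the positioning deviation, the resource term is a simple weighted occupancy ratio, and a running-average update of per-action preference values stands in for the deep reinforcement learning update. The names `reward`, `is_suspected`, `update_preferences`, and the weights `cost_weight` and `alpha` are all hypothetical.

```python
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed layout


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def reward(pred: Box, truth: Box, resource_ratio: float,
           cost_weight: float = 0.5) -> float:
    """Positive when localization is good and cheap, negative otherwise."""
    return iou(pred, truth) - cost_weight * resource_ratio


def is_suspected(value: float, threshold: float) -> bool:
    """Suspected defect area: reward is positive and exceeds the threshold."""
    return value > 0 and value > threshold


def update_preferences(prefs: Dict[str, float], action: str,
                       value: float, alpha: float = 0.5) -> None:
    """Move the preference of the chosen action toward the observed reward."""
    prefs[action] += alpha * (value - prefs[action])


prefs = {"right": 0.0, "grow": 0.0}
r = reward((0, 0, 10, 10), (5, 0, 15, 10), resource_ratio=0.2)
update_preferences(prefs, "right", r)
```

In this sketch a window that overlaps the reference box by one third and uses 20% of the resource budget receives a small positive reward, while a window with no overlap receives a negative one.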

[0017] This invention provides an automated road surface defect detection system based on deep reinforcement learning, which mainly includes: a multi-source data acquisition and fusion module, used to acquire the original image data collected on the current road surface, vehicle speed information and historical defect detection records, and to initially integrate the grayscale distribution features of the image with speed information and historical records through multi-source data fusion technology to obtain an initial road surface environment description;

[0018] The feature extraction and environment analysis module is used to extract local texture details and edge contour information from the original image using image processing technology, based on the initial road surface environment description, and to analyze the impact of dynamic environment by combining speed information to obtain a comprehensive environmental feature representation.

[0019] The window adjustment module is used to analyze pixel spatial correlation and regional contrast differences based on the comprehensive environmental feature representation. When the defect confidence score of the detection window coverage area is lower than the preset threshold, the window translation or scaling operation is performed through the pre-established action selection rules to obtain the adjusted local image area.

[0020] The state update module is used to extract new edge contour information and local texture details from the adjusted local image region, merge them with the comprehensive environmental feature representation through feature vector fusion technology, and filter environmental noise to obtain the updated road surface state description.

[0021] The reward evaluation and judgment module is used to evaluate the deviation between the defect boundary and the actual location based on the updated road surface condition description, and to analyze and calculate resource consumption by combining the resource occupancy ratio and resource optimization constraints to obtain positive and negative reward feedback values. When the reward feedback value is positive and exceeds the dynamically adjusted threshold, the current area is determined to be a suspected defect area.

[0022] The learning optimization module is used to adjust the parameters of the action selection rule based on the positive and negative reward feedback values and the reward weight allocation rules, and to update the reward feedback cycle using environmental adaptability correction to obtain the optimized window adjustment mechanism.

[0023] The defect location sequence generation module is used to perform window translation or scaling operations on the subsequently acquired road surface image data using an optimized window adjustment mechanism. The rhythm of feature extraction and fusion is controlled by the state update frequency to obtain a continuous defect location sequence.

[0024] The defect identification and output module is used to extract the defect confidence score and edge contour information of suspected defect areas from a continuous defect location sequence, determine the specific location and type of the final crack, and output the detection results of micro-cracks on the road surface.

[0025] The beneficial effects of this application are as follows: An automated road surface defect detection method and system based on deep reinforcement learning aims to address the challenges of complex road environments, fine cracks, and significant dynamic environmental interference. Through multi-source data fusion technology, this invention integrates original image data, vehicle speed information, and historical defect records, extracting texture details and edge contours. Combined with dynamic environmental impact analysis, a comprehensive environmental feature representation is formed. Furthermore, through window adjustment mechanisms and feature vector fusion, defect localization accuracy is optimized. Simultaneously, positive and negative reward feedback and environmental adaptive correction are used to dynamically adjust the detection strategy, ultimately outputting the crack location and type. This invention, through adaptive window operation and continuous defect localization sequence generation, significantly improves the detection accuracy and robustness of fine cracks, reduces resource consumption, and provides efficient and accurate technical support for road maintenance.

Attached Figure Description

[0026] Figure 1 is a flowchart illustrating a specific embodiment of the automated road surface defect detection method based on deep reinforcement learning of the present invention.

[0027] Figure 2 is a schematic diagram of the automated road surface defect detection method based on deep reinforcement learning according to the present invention.

[0028] Figure 3 is a schematic diagram of the automated road surface defect detection system based on deep reinforcement learning according to the present invention.

Detailed Implementation

[0029] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only for explaining the invention and are not intended to limit the invention; that is, the described embodiments are merely some embodiments of the invention, and not all embodiments. The components of the embodiments of the invention described and shown in the accompanying drawings can generally be arranged and designed in various different configurations.

[0030] Therefore, the following detailed description of the embodiments of the invention provided in the accompanying drawings is not intended to limit the scope of the claimed invention, but merely to illustrate selected embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort are within the scope of protection of the invention.

[0031] It should be noted that relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.

[0032] The features and performance of the present invention will be further described in detail below with reference to embodiments.

[0033] Specific embodiments of the automated road surface defect detection method and system based on deep reinforcement learning of the present invention:

[0034] As shown in Figures 1-2, the automated road surface defect detection method based on deep reinforcement learning in this embodiment may specifically include:

[0035] S101: Acquire the raw image data of the current road surface, vehicle speed information, and historical defect detection records. By constructing a multi-source data foundation using the road surface acquisition equipment, the grayscale distribution characteristics of the images are initially integrated with speed information and historical records to obtain an initial description of the road surface environment.

[0036] Specifically, grayscale features are extracted from the original image data. These grayscale features include, but are not limited to, grayscale histograms, grayscale mean, grayscale variance, grayscale co-occurrence matrix statistics, or any combination thereof. Image processing techniques are used to standardize the grayscale features, resulting in standardized image feature data. The standardization method is one of the following: linear normalization, mean-variance standardization, nonlinear mapping, or a standardization method adaptively determined based on the environment. Combining driving speed information with a preset speed range division rule, if the speed information exceeds the preset range, the image feature data is dynamically corrected to determine the corrected feature dataset. The speed range division rule includes at least one stable acquisition range and at least one unstable acquisition range. If the current speed is within the stable acquisition range, the standardized image feature data is directly used as the output feature; if the current speed is within the unstable acquisition range, the dynamic correction process is triggered. The correction process employs one or a combination of the following methods: linear or nonlinear correction based on a pre-calibrated velocity-correction coefficient mapping relationship; obtaining the correction amount by looking up a table based on velocity information; mapping using a trained velocity-feature correction model; and adaptively calculating the correction intensity based on the degree to which the velocity deviates from the stable range.

[0037] In one possible implementation, the method for extracting grayscale features from the original image data is as follows: convert the acquired road surface image into a grayscale image, count the pixels at each grayscale value (0–255), and generate a grayscale histogram as the original grayscale feature vector $H = (h_0, h_1, \dots, h_{255})$, where $h_i$ is the number of pixels with grayscale value $i$. Min-max normalization is used to standardize $H$ by mapping it to the [0,1] interval, giving the normalized image feature data $H' = (h'_0, h'_1, \dots, h'_{255})$:

$$h'_i = \frac{h_i - h_{\min}}{h_{\max} - h_{\min}}$$

In the formula, $h_{\min}$ and $h_{\max}$ are the minimum and maximum frequency values of the original histogram, respectively. The driving speed information $v$ (unit: km/h) is collected in real time by onboard sensors. Speed range division rules are pre-defined, for example: stable data collection range: […]; low-speed range: […]; high-speed range: […]. If $v$ is within the stable acquisition range, $H'$ is used directly as the final feature data. If the speed information falls outside the stable acquisition range (i.e., $v$ lies below or above it), dynamic correction is applied to $H'$. One possible correction method is to pre-calibrate, through experiments, a grayscale shift coefficient table $\Delta(v, i)$ for different speeds, giving the correction offset for grayscale value $i$ at speed $v$. The corrected feature $h''_i$ is calculated as:

$$h''_i = h'_i + \Delta(v, i)$$

and is amplitude-limited to keep it within the [0,1] range. The result is the corrected feature dataset.
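The histogram, min-max normalization, and table-based speed correction described in this paragraph can be sketched as follows. This is a minimal illustration under stated assumptions: the stable range bounds and the offset table contents are hypothetical values chosen for the example, the offset is assumed to be additive, and the nearest calibrated speed is used when the current speed is not in the table.

```python
from typing import Dict, List, Sequence, Tuple


def gray_histogram(pixels: Sequence[int]) -> List[int]:
    """Count how many pixels take each grayscale value 0..255."""
    hist = [0] * 256
    for value in pixels:
        hist[value] += 1
    return hist


def min_max_normalize(hist: Sequence[float]) -> List[float]:
    """Map histogram frequencies to the [0, 1] interval."""
    lo, hi = min(hist), max(hist)
    if hi == lo:  # flat histogram: avoid division by zero
        return [0.0] * len(hist)
    return [(h - lo) / (hi - lo) for h in hist]


def correct_for_speed(features: Sequence[float], speed_kmh: float,
                      stable_range: Tuple[float, float],
                      offset_table: Dict[float, Sequence[float]]) -> List[float]:
    """Apply a pre-calibrated per-bin offset outside the stable speed range."""
    low, high = stable_range
    if low <= speed_kmh <= high:
        return list(features)  # stable acquisition: use features directly
    # Assumption: pick the calibrated speed closest to the current speed.
    nearest = min(offset_table, key=lambda s: abs(s - speed_kmh))
    offsets = offset_table[nearest]
    # Add the offset, then clip (amplitude-limit) to [0, 1].
    return [min(1.0, max(0.0, f + d)) for f, d in zip(features, offsets)]


norm = min_max_normalize(gray_histogram([0, 0, 0, 128, 255]))
table = {90.0: [0.1] * 256}          # hypothetical calibration data
stable = (20.0, 60.0)                # hypothetical stable range in km/h
unchanged = correct_for_speed(norm, 40.0, stable, table)
corrected = correct_for_speed(norm, 95.0, stable, table)
```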

[0038] The corrected feature dataset is obtained and matched with historical defect detection records. If similar defect patterns exist in the historical records, the corresponding defect features are extracted to identify potential road surface problem areas. These defect features mainly include edge contour information (reflecting the boundary morphology of cracks or potholes), local area texture details (reflecting roughness or microstructural changes at the damaged area), and grayscale distribution features (reflecting the brightness difference between the defect area and the normal road surface).

[0039] Specifically, the system compares and matches the currently corrected feature dataset with historical defect detection records in the database. The matching process is based on feature vector similarity calculation. If the features of the current image (such as grayscale distribution and texture) highly overlap with patterns in the historical records, a similar defect pattern is identified. After further extracting the corresponding defect features from the historical records, these are used as prior knowledge and mapped onto the currently corrected feature dataset. By comparing the distribution of these features in the current image, the system initially identifies the specific coordinate ranges where cracks or potholes may exist, thereby determining potential road surface problem areas.
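The similarity-based matching against historical records can be sketched as below. Cosine similarity is used here as one possible feature vector similarity measure; the text does not name a specific one. The record layout (`features`, `region`) and the threshold value are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_history(current: Sequence[float], records: List[Dict],
                  threshold: float = 0.9) -> List[Tuple[float, tuple]]:
    """Return (score, region) pairs for records similar to the current features."""
    hits = []
    for record in records:
        score = cosine_similarity(current, record["features"])
        if score >= threshold:
            hits.append((score, record["region"]))
    hits.sort(key=lambda item: item[0], reverse=True)  # best match first
    return hits


history = [
    {"features": [2.0, 0.0, 2.0], "region": (10, 20, 50, 60)},
    {"features": [0.0, 1.0, 0.0], "region": (0, 0, 5, 5)},
]
hits = match_history([1.0, 0.0, 1.0], history)
```

The regions returned by `match_history` play the role of the prior coordinate ranges that are mapped onto the current image as potential problem areas.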

[0040] Based on potential road surface problem areas, and using the matching results of the corrected feature dataset and historical records, a Support Vector Machine (SVM) algorithm is employed to classify the road surface environment, resulting in a classified environmental state description. Specifically, the system uses the SVM algorithm to classify the road surface environment. SVM identifies the current environmental category by finding an optimal hyperplane that partitions multi-source data integrating image features, speed information, and historical records. The environmental state description is generated by matching a state identifier with a state description template, and includes: 1. Environmental classification identifier: such as complex lighting environment, high-speed dynamic environment, or remote rural road. 2. Dynamic attribute description: including the timestamp of image acquisition, the current driving speed level, and the significance of environmental changes. 3. Multi-dimensional archive data: comprehensive road surface environment assessment data formed by combining classification results and time-series analysis.
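To make the hyperplane-based classification concrete, the following sketch trains a small linear SVM with subgradient descent on the hinge loss. It is an illustration only: the two-dimensional features (normalized speed, change significance), the labels (+1 for a high-speed dynamic environment, -1 for a stable one), and the hyperparameters are assumptions, and a practical system would more likely rely on an established SVM library.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def train_linear_svm(samples: List[List[float]], labels: List[int],
                     lam: float = 0.01, lr: float = 0.1,
                     epochs: int = 200) -> Tuple[List[float], float]:
    """Minimize lam/2 * |w|^2 + hinge loss with per-sample subgradient steps."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            if y * (dot(w, x) + b) < 1:
                # Margin violated: regularization step plus hinge-loss step.
                w = [wj - lr * (lam * wj - y * xj) for wj, xj in zip(w, x)]
                b += lr * y
            else:
                # Margin satisfied: only the regularization term contributes.
                w = [wj - lr * lam * wj for wj in w]
    return w, b


def classify(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Side of the separating hyperplane: +1 or -1."""
    return 1 if dot(w, x) + b >= 0 else -1


# Hypothetical training data: [normalized speed, change significance].
samples = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 0.8]]
labels = [-1, -1, 1, 1]
w, b = train_linear_svm(samples, labels)
```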

[0041] By combining the classified environmental state description with image acquisition timestamps and speed information, a time-series analysis of the dynamic changes in the road environment is performed to determine the final comprehensive description of the road environment. Based on the classification results and time-series analysis data, a multi-dimensional road state profile is constructed to obtain complete road environment assessment data. Specifically, the classified environmental state description is obtained, and the associated image acquisition timestamps and vehicle speed information are extracted. Time-series data is used to analyze the evolution trend of the environmental state over time. For example, it analyzes whether lighting, shadows, or road surface materials change rapidly with vehicle movement. Based on the results of the time-series analysis, the bias of single-frame image classification is corrected to determine the final comprehensive description of the road environment. The time-series analysis data is integrated with the classification results to construct a multi-dimensional road state profile, providing a basic environmental background for subsequent reinforcement learning reward calculation. Time-series analysis addresses the pain points of traditional techniques that rely on single images and ignore detection continuity, thus better adapting to real-world business scenarios involving high-speed movement or rapid environmental changes.
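One simple way to correct the bias of single-frame classification with time-series information is a sliding-window majority vote over recent frames. The sketch below assumes this approach; the text does not specify the correction method, and the window length and label names are hypothetical.

```python
from collections import Counter, deque
from typing import List, Sequence


def smooth_labels(labels: Sequence[str], window: int = 3) -> List[str]:
    """Replace each frame label by the majority label of the last `window` frames."""
    recent = deque(maxlen=window)
    smoothed = []
    for label in labels:
        recent.append(label)
        counts = Counter(recent)
        best = max(counts.values())
        # On a tie, prefer the most recent label among the tied candidates.
        for candidate in reversed(recent):
            if counts[candidate] == best:
                smoothed.append(candidate)
                break
    return smoothed


frames = ["shade", "shade", "sun", "shade", "shade"]
stable = smooth_labels(frames)
```

Here the isolated "sun" frame is outvoted by its neighbours, so the smoothed sequence stays in the "shade" state throughout.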

[0042] S102: Based on the initial road surface environment description, image processing technology is used to extract local texture details and edge contour information from the original image, and the dynamic environment influence is analyzed in combination with speed information to obtain a comprehensive environmental feature representation.

[0043] In one possible implementation, raw image data of the road surface environment is acquired using image acquisition devices such as 3D laser scanners or industrial cameras. A pre-defined image segmentation method is then used to divide the raw image into multiple local image regions. For each of these local image regions, image processing techniques are applied to extract edge contour information, determining the area with higher edge sharpness.

[0044] In one possible implementation, raw image data of the road surface environment is acquired, and region segmentation is performed, dividing the entire image into multiple independent local image regions. For each segmented local region, pixel gradient changes are scanned to identify the geometric contours reflecting road defects (such as crack edges). By quantitatively analyzing the extracted contours and evaluating their edge contrast and coherence, the system filters out regions with high edge clarity, typically referring to specific pixel intervals with sharp gradient changes and distinct contour features. These regions are considered as candidate areas for subsequent key detection.
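The region segmentation and gradient-based edge clarity filtering can be sketched as follows. The sketch assumes fixed square blocks as the segmentation method and the mean forward-difference gradient magnitude as the edge clarity measure; both choices, and the threshold, are illustrative assumptions.

```python
from typing import List, Tuple


def mean_gradient(img: List[List[int]], top: int, left: int,
                  height: int, width: int) -> float:
    """Mean forward-difference gradient magnitude inside one block."""
    total = 0.0
    count = 0
    for r in range(top, top + height - 1):
        for c in range(left, left + width - 1):
            gx = img[r][c + 1] - img[r][c]   # horizontal change
            gy = img[r + 1][c] - img[r][c]   # vertical change
            total += (gx * gx + gy * gy) ** 0.5
            count += 1
    return total / count if count else 0.0


def sharp_regions(img: List[List[int]], block: int,
                  threshold: float) -> List[Tuple[int, int, float]]:
    """Split the image into block x block regions and keep the sharp ones."""
    rows, cols = len(img), len(img[0])
    selected = []
    for top in range(0, rows - block + 1, block):
        for left in range(0, cols - block + 1, block):
            score = mean_gradient(img, top, left, block, block)
            if score >= threshold:
                selected.append((top, left, score))
    return selected


# 4 x 8 test image: uniform left half, dark vertical "crack" in the right half.
row = [100, 100, 100, 100, 100, 20, 100, 100]
image = [row[:] for _ in range(4)]
candidates = sharp_regions(image, block=4, threshold=10.0)
```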

[0045] Based on the areas with high edge clarity, the system further analyzes the texture distribution within the local image regions to obtain local image patches with high texture complexity. Speed data during vehicle operation is acquired; if the speed data exceeds a preset threshold, the local image patches with high texture complexity are dynamically corrected to obtain corrected image patches. Specifically, the system analyzes the pixel arrangement patterns within local regions, for example by calculating spatial statistical characteristics of the pixels (e.g., contrast, correlation). It extracts the areas of the texture distribution data that fluctuate strongly and are not smooth, forming local image patches with high texture complexity; these patches may contain details such as micro-cracks or potholes. If the vehicle speed data exceeds the preset threshold at this time, the system dynamically corrects these high-complexity image patches to compensate for displacement or jitter caused by vehicle movement. The preset threshold serves as the critical point for triggering dynamic image correction, distinguishing stable acquisition states from states that may be blurred by high-speed movement.
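As an example of a spatial statistic for texture complexity, the sketch below computes the contrast of a gray-level co-occurrence matrix built from horizontally adjacent pixel pairs, and flags whether the speed threshold for dynamic correction is exceeded. The number of gray levels, the contrast cutoff, and the speed threshold are hypothetical; the correction itself is not shown because the text does not specify it.

```python
from typing import Dict, List, Tuple


def glcm_contrast(patch: List[List[int]], levels: int = 8) -> float:
    """Contrast of the co-occurrence matrix of horizontal neighbour pairs."""
    # Quantize 0..255 gray values into `levels` bins.
    quantized = [[min(levels - 1, v * levels // 256) for v in row] for row in patch]
    counts: Dict[Tuple[int, int], int] = {}
    total = 0
    for row in quantized:
        for a, b in zip(row, row[1:]):
            counts[(a, b)] = counts.get((a, b), 0) + 1
            total += 1
    if total == 0:
        return 0.0
    # Contrast = sum over pairs of P(a, b) * (a - b)^2.
    return sum(n * (a - b) ** 2 for (a, b), n in counts.items()) / total


def complex_patches(patches: List[List[List[int]]],
                    min_contrast: float) -> List[int]:
    """Indices of patches whose texture contrast reaches the cutoff."""
    return [i for i, p in enumerate(patches) if glcm_contrast(p) >= min_contrast]


def needs_motion_correction(speed_kmh: float, threshold_kmh: float) -> bool:
    """Dynamic correction is triggered only above the speed threshold."""
    return speed_kmh > threshold_kmh


flat = [[100, 100, 100, 100], [100, 100, 100, 100]]
checker = [[0, 255, 0, 255], [255, 0, 255, 0]]
picked = complex_patches([flat, checker], min_contrast=10.0)
```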

[0046] Feature extraction is performed on the corrected image patches, and the impact of dynamic environmental changes is analyzed in conjunction with velocity data to determine the significance of these changes. If the significance of the environmental changes exceeds a preset standard, secondary feature enhancement processing is applied to the corrected image patches to obtain the final environmental feature data.

[0047] Specifically, the system performs preliminary feature vectorization on the corrected image patches and then correlates the extracted features with real-time speed data. Using speed data as a reference dimension, the system observes the fluctuations in image features (such as texture consistency and edge sharpness) with speed. This cross-analysis determines whether the current environmental features are caused by actual road conditions or dynamic driving interference. The system also quantifies the significance of environmental changes, such as calculating the deviation between adjacent frames or between feature patches and standard reference values. By setting a preset standard (i.e., a significance threshold), if the deviation exceeds this standard, it is determined that the environmental change is significant, indicating that the current image quality is greatly affected by environmental fluctuations (such as high-speed jitter or sudden changes in lighting). When the significance exceeds the preset standard, enhancement logic is activated to perform secondary feature enhancement on the corrected image patches, such as contrast stretching, edge sharpening, or enhancement of textures at specific frequencies. Through this secondary processing, the negative interference caused by significant environmental changes is offset, resulting in the final environmental feature data.
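The significance test and secondary enhancement can be sketched as follows, assuming the mean absolute deviation between the feature vectors of adjacent frames as the significance measure and linear contrast stretching as the enhancement. Both are examples named in the text; the threshold values are hypothetical.

```python
from typing import List, Sequence, Tuple


def change_significance(prev: Sequence[float], curr: Sequence[float]) -> float:
    """Mean absolute deviation between feature vectors of adjacent frames."""
    return sum(abs(a - b) for a, b in zip(prev, curr)) / len(curr)


def contrast_stretch(patch: List[List[int]]) -> List[List[int]]:
    """Linearly stretch gray values so they span the full 0..255 range."""
    flat = [v for row in patch for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # uniform patch: nothing to stretch
        return [row[:] for row in patch]
    return [[round((v - lo) * 255 / (hi - lo)) for v in row] for row in patch]


def enhance_if_significant(prev: Sequence[float], curr: Sequence[float],
                           patch: List[List[int]],
                           threshold: float) -> Tuple[List[List[int]], bool]:
    """Apply secondary enhancement only when the change exceeds the standard."""
    if change_significance(prev, curr) > threshold:
        return contrast_stretch(patch), True
    return patch, False


patch = [[100, 110], [120, 150]]
enhanced, triggered = enhance_if_significant([0.2, 0.4], [0.5, 0.9], patch, 0.25)
untouched, skipped = enhance_if_significant([0.2, 0.4], [0.5, 0.9], patch, 0.5)
```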

[0048] S103: Based on the comprehensive environmental feature representation, the pixel spatial correlation and regional contrast differences are analyzed. If the defect confidence score of the area covered by the detection window is lower than the preset threshold, a window translation or scaling operation is performed according to the pre-established action selection rules to obtain the adjusted local image region.

[0049] By extracting environmental features from the input image data, a convolutional neural network (CNN)-based method is used to analyze the overall structure of the image, resulting in a preliminary environmental feature distribution map. These environmental features refer not only to the physical environment but also, and more importantly, to the statistical and structural features of the image. This includes the spatial correlation of pixels, the contrast differences between regions, and abstract semantic features extracted by the CNN (such as the smoothness trend of the road surface and large-scale light and shadow distribution). Based on the preliminary environmental feature distribution map, the correlation degree within the pixel space is analyzed, and statistical methods are used to calculate the correlation strength between adjacent pixels, determining the correlation distribution results in the pixel space.

[0050] In one possible implementation, CNNs use multiple convolutional kernels to filter features and reduce dimensionality of the input image, automatically identifying high-frequency (details) and low-frequency (background) information. The overall image structure is analyzed; the network scans the global layout of the image, identifying which areas belong to the background road surface and which areas may contain foreign objects or defects, generating a mapping table reflecting the feature intensity of each part of the image—a preliminary environmental feature distribution map. After obtaining the distribution map, the system enters a more microscopic statistical analysis stage to determine the initial values of the detection window. Specifically, for the distribution map generated by the CNN, the system analyzes the correlation within the pixel space, that is, assessing the similarity of a pixel to its neighboring pixels in terms of feature curvature. Statistical algorithms (such as mutual information, covariance, or spatial autocorrelation coefficient) are used to calculate the correlation strength between adjacent pixels. If the feature values of adjacent pixels change smoothly and the correlation strength is high, the area tends to be a consistent background; if the correlation strength drops sharply, it usually means the presence of edges, cracks, or noise. The final output is a set of spatial correlation distribution results, usually represented in the form of a matrix or heatmap. This result directly determines the subsequent division of the detection window. The system will divide the area into multiple preliminary detection zones based on the boundary of the correlation strength change, and calculate the defect confidence score within each window.
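The correlation strength between adjacent pixels can be illustrated with the Pearson correlation of horizontally neighbouring values inside a patch. This is only one of the statistical options the text lists, applied here to raw gray values rather than to a CNN feature map; treating a completely uniform patch as fully correlated is an assumption of the sketch.

```python
import math
from typing import List


def neighbor_correlation(patch: List[List[float]]) -> float:
    """Pearson correlation between each pixel and its right-hand neighbour."""
    left = [row[c] for row in patch for c in range(len(row) - 1)]
    right = [row[c + 1] for row in patch for c in range(len(row) - 1)]
    n = len(left)
    if n == 0:
        return 1.0
    mean_l = sum(left) / n
    mean_r = sum(right) / n
    cov = sum((a - mean_l) * (b - mean_r) for a, b in zip(left, right))
    var_l = sum((a - mean_l) ** 2 for a in left)
    var_r = sum((b - mean_r) ** 2 for b in right)
    if var_l == 0 and var_r == 0:
        return 1.0  # assumption: a uniform patch counts as consistent background
    if var_l == 0 or var_r == 0:
        return 0.0
    return cov / math.sqrt(var_l * var_r)


smooth = [[10, 20, 30, 40], [10, 20, 30, 40]]   # gradual change
abrupt = [[0, 255, 0, 255]]                      # sharp alternation
uniform = [[100, 100, 100]]
```

A value near 1 indicates a smoothly varying background, while a sharp drop suggests an edge, crack, or noise, matching the interpretation given above.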

[0051] Based on the correlation distribution of pixel space and the differences in regional contrast, multiple detection windows are defined, and contrast difference data within each detection window is obtained. The differences in regional contrast refer to the degree of significance of brightness, color, or texture features between different spatial regions in the image. The comparison objects are typically the area covered by the detection window and its adjacent background area, or different local pixel blocks within the window.

[0052] After acquiring the pixel spatial correlation distribution, the system locates potential targets through contrast analysis. Specifically, it determines these targets by analyzing the grayscale distribution and statistical differences of pixels in the preliminary environmental feature distribution map. Based on the pixel spatial correlation distribution results (variations in correlation strength) and contrast difference data, the system segments the image into multiple characteristic rectangular or polygonal regions, i.e., detection windows. Statistical methods are used to calculate the variance, range, or local histogram differences of pixel values within each detection window, thereby quantifying the distinguishability of that window from its surrounding environment.
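
As a rough illustration of the per-window statistics, the following sketch computes the variance and range of the pixel values inside a rectangular window, plus the gap between the window mean and the mean of everything outside it. Treating the whole remainder of the image as "background" and the key names of the returned dictionary are assumptions made for this example only.

```python
from statistics import mean, pvariance


def window_contrast(image, x, y, w, h):
    """Quantify how much a w-by-h window at (x, y) stands out from the rest."""
    inside, outside = [], []
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            if y <= r < y + h and x <= c < x + w:
                inside.append(value)
            else:
                outside.append(value)
    # Assumption: the "background" is simply every pixel outside the window.
    background_gap = abs(mean(inside) - mean(outside)) if outside else 0.0
    return {
        "variance": pvariance(inside),
        "range": max(inside) - min(inside),
        "background_gap": background_gap,
    }


# A bright 4x4 road patch with a dark 2x2 blemish in the middle.
image = [[100] * 4 for _ in range(4)]
for r in (1, 2):
    for c in (1, 2):
        image[r][c] = 20
print(window_contrast(image, 1, 1, 2, 2))
```

A window that sits exactly on the blemish has low internal variance but a large gap to its surroundings; a window that straddles the boundary shows a large internal range instead.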

[0053] For the contrast difference data within each detection window, a defect confidence score for the covered area is calculated. If the defect confidence score is lower than a preset score threshold, a subsequent action selection rule is triggered to determine whether to perform an adjustment operation. According to the action selection rule, window translation or scaling operations are performed on the detection windows that meet the conditions to adjust the range of the covered area, resulting in the adjusted local image region.

[0054] Specifically, the system evaluates the contrast difference data within each window to determine if its features conform to the statistical distribution of typical defects such as cracks and pits. It then combines the feature vectors extracted by the CNN with preset defect patterns to score the matching degree; a higher score indicates a greater likelihood of a defect in that area. A scoring threshold is set as the critical point to trigger window adjustments. When a window's defect confidence score falls below the preset threshold, it means the current window may not be accurately aligned with the defect, or the defect features are not obvious, requiring a viewpoint adjustment. For example, window panning changes the window's X and Y positions in the image coordinate system; window scaling adjusts the window's size (width and height) to achieve multi-scale observation. The parameters of the action selection rules are not fixed; the system dynamically adjusts these parameters based on subsequent positive and negative reward feedback values, thereby optimizing the window adjustment mechanism.
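
The two window actions can be sketched as follows. The `Window` type, the clamping to the image bounds, and scaling about the window center are illustrative assumptions; the description only states that panning changes the X and Y position and scaling changes the width and height.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    x: int  # left edge in image coordinates
    y: int  # top edge in image coordinates
    w: int  # width in pixels
    h: int  # height in pixels


def _clamp(value, low, high):
    return max(low, min(value, high))


def translate(win, dx, dy, img_w, img_h):
    """Pan the window by (dx, dy), keeping it inside the image."""
    return Window(
        _clamp(win.x + dx, 0, img_w - win.w),
        _clamp(win.y + dy, 0, img_h - win.h),
        win.w,
        win.h,
    )


def scale(win, factor, img_w, img_h):
    """Resize the window about its center, keeping it inside the image."""
    new_w = max(1, min(round(win.w * factor), img_w))
    new_h = max(1, min(round(win.h * factor), img_h))
    center_x = win.x + win.w / 2
    center_y = win.y + win.h / 2
    new_x = _clamp(round(center_x - new_w / 2), 0, img_w - new_w)
    new_y = _clamp(round(center_y - new_h / 2), 0, img_h - new_h)
    return Window(new_x, new_y, new_w, new_h)


start = Window(10, 10, 20, 20)
print(translate(start, -15, 5, 100, 100))
print(scale(start, 0.5, 100, 100))
```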

[0055] Furthermore, by performing secondary environmental feature extraction on the adjusted local image region, the correlation between pixel space and regional contrast within the region is verified to determine the final image processing result. Based on the final image processing result, the coverage area of the detection window is updated, and the calculation and adjustment of the defect confidence score are performed iteratively to obtain optimized local image data.

[0056] In one possible implementation, the system performs secondary environmental feature extraction on the adjusted local image region. At this point, it compares two dimensions: 1. Pixel spatial correlation: the distribution of the correlation strength between pixels. 2. Region contrast: the level of difference between the local region and its background or internal features. When these two reach a preset matching standard (i.e., the correlation breakpoint corresponds exactly to a high-contrast edge), it indicates that the window has accurately captured the core of the defect. The system recalculates the defect confidence score within the adjusted window. If the score still does not meet the requirements, the system continues to adjust according to the action selection rules. Only when the verification is passed and the score stabilizes or meets the standard will the system stop adjusting, update the coverage area of the detection window, and output the final image processing result. The final image processing result specifically includes the following information: 1. Optimized local image data: refers to the precise image patch that best highlights the defect features after translation and scaling adjustments. 2. Updated detection window parameters: determines the final locked coverage area (coordinates and scale). 3. Verified feature description: includes quantified feature values after the pixel spatial correlation and region contrast within the region have reached a match.

[0057] The final processing result will be directly input into step S104 to extract new edge contour information and texture details. Through feature vector fusion technology, this obtained local information will be merged into the overall road surface condition description. This "verification-feedback-readjustment" mechanism ensures that the system output is no longer a blurry original image, but high-value, high-confidence defect candidate area data.

[0058] S104: Extract new edge contour information and local texture details from the adjusted local image region, merge them with the comprehensive environmental feature representation through feature vector fusion technology, and simultaneously filter environmental noise to obtain the updated road surface state description.

[0059] Initial data is obtained from the adjusted local image region, and preliminary image structure information is obtained by separating edge contours and texture details. Based on the preliminary image structure information, a feature vector construction method is used to transform the edge contours and texture details into computable feature vector data, determining the quantization representation for subsequent processing.

[0060] Specifically, raw pixel data is extracted from the adjusted local image region. Image processing algorithms are used to independently extract high-frequency components (corresponding to edge contours, reflecting geometric structure) and mid-to-high-frequency repetitive components (corresponding to texture details, reflecting surface material characteristics). The separated edges and textures are categorized to form preliminary image structure information, including the geometric topology of road defects (such as the direction and width trend of cracks), the micro-texture distribution map of the road surface (roughness information reflecting the degree of damage), and the spatial logical relationships within the region, which are used for subsequent comparison with global environmental features. Further, a feature vector construction method is used to transform the unstructured edge contour pixel coordinates, radians, lengths, and statistical indicators such as texture contrast and entropy into numerical matrices. Through mathematical transformations (such as principal component analysis or linear mapping), these multi-dimensional indicators are compressed and transformed into computable feature vector data of a unified dimension. The purpose of this step is to transform visual information into a digital signature that computers can directly perform mathematical operations on (such as feature fusion and classification mapping), i.e., to determine the quantitative representation used for subsequent processing.

[0061] By matching feature vector data with a pre-established environmental feature library and combining fusion technology, local information is integrated with the overall environmental representation to determine a comprehensive feature description. If environmental noise interference is detected in the comprehensive feature description, the data is cleaned using preset noise filtering rules to obtain purified feature information. The pre-established environmental feature library is formed by collecting a large amount of road surface data under different working conditions (such as sunny days, rainy days, different road surface materials, typical defect images, etc.), extracting feature vectors of edges, textures, and grayscale distributions, and classifying and storing them to form a feature quantification database containing standard road surfaces and typical defects. The comprehensive feature description is a high-dimensional composite vector that includes both microscopic defect details (such as the microscopic direction of cracks) and macroscopic environmental context (such as road surface lighting and vehicle speed interference factors), which can more comprehensively reflect the true state of the current road surface and provide a basis for subsequent accurate classification. The preset noise filtering rules typically include median filtering (removing isolated noise points), Gaussian filtering (smoothing light and shadow interference), or threshold truncation based on frequency features.

[0062] In one possible implementation, the system measures the similarity between the feature vector data and the templates in the environmental feature library. By calculating the Euclidean distance or cosine similarity, it determines which known road surface pattern or defect type the current local feature belongs to. Then, a feature vector fusion technique is used to combine, by concatenation or weighted fusion, the local information extracted from the adjusted window (the new edge and texture feature vectors) with the overall environmental representation (the macroscopic background features obtained in S102), yielding a comprehensive feature description. The system searches the generated comprehensive feature description for anomalous signals that do not conform to physical logic or road surface features. For example, if high-frequency random fluctuations (such as noise generated by light and shadow flicker) or outliers completely unrelated to the surrounding pixels appear in the feature vector, it is determined that environmental noise has been detected. The system then processes the data using the preset cleaning rules, such as removing isolated noise and smoothing light and shadow interference, to obtain purified feature information.
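
The matching and fusion steps can be sketched as below, using cosine similarity for template matching and weighted concatenation for fusion. The library contents, the template names, and the `local_weight` parameter are hypothetical values chosen for illustration.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: a zero vector matches nothing
    return dot / (norm_a * norm_b)


def best_match(vector, library):
    """Return the name of the library template most similar to the vector."""
    return max(library, key=lambda name: cosine_similarity(vector, library[name]))


def fuse(local, global_context, local_weight=0.5):
    """Weighted concatenation of local and global feature vectors."""
    return ([local_weight * v for v in local]
            + [(1.0 - local_weight) * v for v in global_context])


# Hypothetical feature library with two templates.
library = {"normal": [1.0, 0.0, 0.0], "micro_crack": [0.0, 1.0, 1.0]}
local = [0.1, 0.9, 0.8]
print(best_match(local, library))
print(fuse(local, [0.3, 0.7], local_weight=0.6))
```

Concatenation keeps the local and global components separable for later stages; a weighted sum would require both vectors to share the same dimension.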

[0063] Based on the purified feature information, road surface conditions are classified and mapped. A Support Vector Machine (SVM) algorithm is used to divide the conditions into categories, resulting in specific condition labels. Specifically, the purified high-dimensional feature vectors are used as input to the SVM algorithm. The SVM kernel function maps the feature vectors to a high-dimensional space to find the hyperplane that best distinguishes different road surface conditions. Based on the distribution of feature vectors on both sides or in different regions of the hyperplane, the data is divided into specific condition categories (e.g., normal road surface, micro-crack area, pothole interference area, etc.), and a unique condition label is assigned to each category.

[0064] The system compares the status identifiers with preset status description templates to generate the final road surface status description information, thus determining the complete analysis results. The preset status description templates are a set of predefined standard texts or parameter sets used to transform abstract status identifiers into readable descriptions. The system searches and compares the status identifiers in a template library. When an identifier matches a template, it automatically fills in relevant parameters (such as location, confidence level, etc.) to generate the final road surface status description information. An example of the road surface status description information is: "A micro-crack was detected at coordinates (X,Y) on a remote rural road, with a confidence level of 85% and moderate interference from light and shadow noise."

[0065] If classification bias exists in the final road surface condition description, local correction is performed by backtracking the feature vector data to obtain the adjusted condition description. Specifically, if the generated description does not match the contextual time-series analysis results, the confidence level of the classification result is at a critical value, or the probability difference between multiple categories is too small, local correction is performed by backtracking the feature vector data. For example, when classification bias is detected, the system pauses output and returns to the original feature vector data level that generated the classification. The local features causing the bias (such as texture or edge vectors of a specific frequency) are reweighted or fine-tuned, and the corrected vectors are used to reclassify via SVM until an accurate and logically consistent condition description is obtained. This local correction by backtracking the original feature vectors significantly improves the accuracy of the final road surface condition description.
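
One way to detect the two numeric triggers for backtracking (a confidence near the critical value, or too small a gap between categories) is sketched below. The function name and both thresholds are assumptions for illustration; the description does not give concrete values.

```python
def needs_backtracking(probabilities, min_confidence=0.6, min_margin=0.15):
    """Flag a classification whose top class is weak or barely ahead.

    probabilities: mapping from condition label to class probability.
    min_confidence and min_margin are illustrative thresholds.
    """
    ranked = sorted(probabilities.values(), reverse=True)
    top = ranked[0]
    runner_up = ranked[1] if len(ranked) > 1 else 0.0
    return top < min_confidence or (top - runner_up) < min_margin


ambiguous = {"normal": 0.48, "micro_crack": 0.45, "pothole": 0.07}
clear = {"normal": 0.90, "micro_crack": 0.08, "pothole": 0.02}
print(needs_backtracking(ambiguous), needs_backtracking(clear))
```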

[0066] S105: For the updated road surface condition description, the deviation between the defect boundary and the actual location is calculated by evaluating the positioning accuracy, and the resource consumption is calculated by combining the resource occupancy ratio and resource optimization constraints to obtain positive and negative reward feedback values. If the reward feedback value is positive and exceeds the dynamically adjusted threshold, the current area is determined to be a suspected defect area.

[0067] Specifically, updated road surface condition data is acquired, and preliminary processing is performed on the collected images and sensor information. Potential defect areas are separated using segmentation techniques to obtain preliminary defect boundary information. Image information refers to the original images acquired by the road surface acquisition equipment and the feature images after S104 purification processing. Sensor information mainly includes vehicle speed information and data used for positioning and environmental perception (such as timestamps). Based on the preliminary defect boundary information, a positioning accuracy evaluation method is used to calculate the deviation between the defect boundary and the actual location. By comparing this deviation with a preset deviation range, it is determined whether the deviation is within an acceptable range.

[0068] In one possible implementation, for example, a U-Net-based semantic segmentation network is used to initially segment the crack region in the image, obtaining the defect boundary. Subsequently, a high-precision 3D laser scanner is used to acquire the actual defect boundary location on the corresponding road surface. The mean intersection over union (mIoU) is used as the positioning accuracy evaluation index to calculate the deviation between the detected boundary and the actual boundary. This deviation can be defined as 1 - IoU, where IoU is the area of the intersection of the detected region and the actual region divided by the area of their union. The system presets a deviation threshold T = 0.2 (i.e., IoU ≥ 0.8 is considered acceptable). If the calculated deviation is ≤ 0.2, the deviation is considered within the acceptable range; otherwise, re-acquisition or manual review is triggered.
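
The deviation check can be written out as a short sketch. It assumes both regions are given as sets of `(row, col)` pixel coordinates; treating two empty regions as a perfect match is also an assumption of this example.

```python
def iou(detected, actual):
    """Intersection over union of two pixel sets."""
    union = detected | actual
    if not union:
        return 1.0  # assumption: two empty regions agree perfectly
    return len(detected & actual) / len(union)


def check_deviation(detected, actual, threshold=0.2):
    """Return (deviation, acceptable) with deviation defined as 1 - IoU."""
    deviation = 1.0 - iou(detected, actual)
    return deviation, deviation <= threshold


# A 10-pixel detected crack versus a ground-truth crack shifted by one pixel.
detected = {(0, c) for c in range(10)}
actual = {(0, c) for c in range(1, 10)}
print(check_deviation(detected, actual))
```

Here the intersection has 9 pixels and the union 10, so IoU is 0.9 and the deviation of about 0.1 falls within the T = 0.2 threshold.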

[0069] Based on the deviation calculation results and resource utilization ratio data, the resource consumption of the current area is analyzed. Resource allocation is restricted by optimization constraints to obtain an assessment of resource consumption. According to the resource consumption assessment results, positive and negative reward feedback values are calculated. If the reward feedback value is greater than zero and exceeds the dynamically adjusted threshold, the current area is identified as a suspected defect area. The resource utilization ratio data refers to the percentage of hardware resources consumed by the system when performing the current image processing and defect detection tasks. This data is typically obtained in real-time through the system monitoring module and includes indicators such as processor (CPU / GPU) utilization, memory utilization, and data transmission bandwidth. Optimization constraints refer to pre-set upper limits or efficiency thresholds for resource usage (e.g., specifying that the processing time for a single frame must not exceed a certain number of milliseconds, or that memory usage must not exceed a preset threshold).

[0070] In one possible implementation, for example, the processor utilization rate, memory usage, and data transmission bandwidth utilization of the current image processing task can be obtained in real time through a system monitoring module; together they define a resource utilization ratio vector. The optimization constraint is set to a preset upper limit, that is, none of the indicators may exceed this threshold. The reward feedback value is calculated from the resource utilization ratio vector and the upper limit, using a weighting coefficient for each indicator. If the reward feedback value is greater than zero, the overall resource usage is below the upper limit, which is positive feedback; if it is less than zero, it is negative feedback. The dynamic threshold can be updated with an exponentially weighted moving average, using a smoothing factor (e.g., 0.8) and starting from an initial threshold. The current area is identified as a suspected defect area when both of the following conditions are met: (1) the reward feedback value is greater than zero; (2) the reward feedback value exceeds the dynamic threshold. The technical rationale is that experimental statistics show that image segmentation of defective regions typically requires more computational resources, so the reward feedback value, although still positive, is relatively small, and the area can therefore be considered abnormal when the value exceeds the dynamic historical baseline.
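
The reward and threshold logic can be sketched under explicit assumptions. The exact formulas are not available in this text, so the sketch assumes one plausible form: the reward is a weighted sum of each indicator's margin below a shared upper limit, and the threshold follows a standard exponentially weighted moving average. The function names, the weights, the upper limit of 0.8, and the initial threshold are all illustrative.

```python
def resource_reward(utilization, upper_limit, weights):
    """Assumed form: weighted sum of margins below the upper limit.

    utilization: (cpu, memory, bandwidth) ratios in [0, 1].
    Positive when usage is, on balance, below the limit.
    """
    return sum(w * (upper_limit - u) for w, u in zip(weights, utilization))


def update_threshold(previous, reward, alpha=0.8):
    """Assumed EWMA form: alpha weights the history, (1 - alpha) the new reward."""
    return alpha * previous + (1.0 - alpha) * reward


def is_suspected_defect(reward, threshold):
    """Both stated conditions: reward is positive and exceeds the threshold."""
    return reward > 0 and reward > threshold


threshold = 0.1  # hypothetical initial threshold
reward = resource_reward((0.5, 0.6, 0.4), upper_limit=0.8, weights=(0.5, 0.3, 0.2))
print(reward, is_suspected_defect(reward, threshold))
threshold = update_threshold(threshold, reward)
print(threshold)
```

With a smoothing factor of 0.8, the threshold moves only one fifth of the way toward each new reward, so a single unusual frame does not shift the baseline much.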

[0071] For suspected defect areas, more detailed road surface condition data is obtained. A convolutional neural network (CNN) in a deep learning model is used for secondary verification of these areas to determine if they are genuine defect areas. Specifically, a local image patch (high resolution) of the suspected area is input into a specially trained CNN. The CNN extracts deep semantic features from this area, identifying key criteria such as the microscopic direction of cracks and the depth and shadow of potholes. The probability score of this area belonging to a genuine defect is calculated. If the score exceeds the secondary verification threshold, noise interference is excluded, and the area is officially marked as a genuine defect area. The more detailed road surface condition data includes: 1. Multi-scale details: High-resolution local features obtained after window scaling in S103. 2. Spatiotemporal correlation: Time-series analysis results combining timestamps and speed information, which can distinguish between permanent road surface defects and transient light and shadow interference. 3. Historical comparison values: Includes the state evolution of this coordinate point in historical records.

[0072] Based on the results of the secondary confirmation, the classification status of suspected defect areas is updated, and priority tags are generated for actual defect areas. The system automatically records and updates the database to determine the final list of defect areas. The system prioritizes defects based on their severity and impact on traffic safety. Specific criteria for prioritization include: 1. Geometric dimensions: the length and width of cracks or the area of potholes. 2. Location criticality: whether the defect is located in the center of a main road or on a curve. 3. Confidence score: the higher the score in the secondary confirmation, the more certain the authenticity, and the higher the priority. Tagging method: The system will attach a priority vector or numerical weight to each entry in the final list of defect areas as an instruction identifier for subsequent automatic system scheduling.
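
A numerical priority weight of the kind described can be sketched as a weighted sum of the three criteria. The sketch assumes each criterion has already been normalized to the range [0, 1]; the field names and the weights are hypothetical.

```python
def priority_score(defect, weights=(0.4, 0.3, 0.3)):
    """Weighted sum of size, location criticality, and confidence.

    All three inputs are assumed to be normalized to [0, 1];
    the weights are illustrative, not taken from the description.
    """
    size_w, location_w, confidence_w = weights
    return (size_w * defect["size"]
            + location_w * defect["location_criticality"]
            + confidence_w * defect["confidence"])


def rank_defects(defects):
    """Order the final defect list from highest to lowest priority."""
    return sorted(defects, key=priority_score, reverse=True)


defects = [
    {"id": "d1", "size": 0.2, "location_criticality": 0.1, "confidence": 0.9},
    {"id": "d2", "size": 0.8, "location_criticality": 0.9, "confidence": 0.7},
]
for d in rank_defects(defects):
    print(d["id"], round(priority_score(d), 2))
```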

[0073] Based on the final list of defective regions, the resource allocation strategy is adjusted, with resources tilted towards priority-marked regions. The system automatically schedules and optimizes resource allocation to obtain an updated resource allocation scheme. Resource tilting refers to prioritizing in-depth analysis of high-priority regions when computing resources are limited. This resource tilting for priority-marked regions is reflected in: 1. Computing power allocation: allocating more computing cores to perform more complex image enhancement and 3D modeling on high-priority regions. 2. Frequency control: increasing the feature extraction frequency (i.e., denser sampling) of high-priority regions in the continuous detection sequence (S107). 3. Storage priority: prioritizing the saving of original high-resolution data from high-priority regions to the database.

[0074] In one possible implementation, the system sorts all detected defective areas in real time according to priority. Based on the resource occupancy ratio and optimization constraints analyzed in S105, it calculates the remaining capacity of the current system. A scheduling algorithm (typically based on heuristic rules or reinforcement learning strategies) allocates computing power and bandwidth according to priority based on resource availability, generating an updated resource allocation scheme. This guides the system to automatically reduce the computational overhead of low-priority or normal road segments when processing subsequent frame images.

[0075] S106: Based on the positive and negative reward feedback values and the reward weight allocation rules, adjust the parameters of the action selection rules, and at the same time use environmental adaptability to update the reward feedback cycle to obtain the optimized window adjustment mechanism.

[0076] The system acquires the reward feedback data, classifies the positive and negative feedback values, and determines the degree of deviation by comparing them with preset threshold ranges, resulting in a categorized set of feedback values. Specifically, it first divides the feedback into a positive set and a negative set. Based on this classification, the system further subdivides the feedback values according to their absolute values, for example into "significant positive feedback" (high accuracy and low resource consumption), "weak positive feedback," "general negative feedback," and "severe negative feedback" (inaccurate positioning or severe resource overload). The system then calculates the distance between the current feedback value and the threshold boundary. If the feedback value falls within the threshold range, the current action (translation or scaling) is robust, with a low degree of deviation. If the feedback value far exceeds the upper limit (extremely high positive reward), the action has captured extremely high-quality features, and the deviation is defined as positively significant. If the feedback value is far below the lower limit (extremely low negative reward), the current action has led to severe detection failure or resource waste, and the deviation is defined as negatively significant. The preset threshold range is a standard feedback interval (e.g., [-10, 10]) that represents the adjustment fluctuation range the system considers normal and acceptable.
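
The four-way classification can be sketched with the example interval [-10, 10]. As a simplification, the sketch treats any value outside the interval as "significant" or "severe" rather than modelling how far is "far"; grouping a reward of exactly zero with the negative side is also an assumption.

```python
def classify_feedback(value, lower=-10.0, upper=10.0):
    """Return (label, deviation) for one reward feedback value.

    deviation is the distance beyond the nearest threshold boundary,
    or 0.0 when the value lies inside [lower, upper].
    """
    if value > upper:
        return "significant positive", value - upper
    if value > 0:
        return "weak positive", 0.0
    if value >= lower:
        # Assumption: a reward of exactly zero is not counted as positive.
        return "general negative", 0.0
    return "severe negative", lower - value


for reward in (14, 3, -4, -25):
    print(reward, classify_feedback(reward))
```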

[0077] Through the above processing, the system obtains a set of classified feedback values. This set is not a simple accumulation of numerical values, but rather instruction data with label attributes: 1. Guided Update: The classification result directly determines how to adjust the parameters in the action selection rules. 2. Dynamic Adaptation: If the feedback value set shows that most actions are significantly negatively deviated, the system will quickly trigger the reconstruction of the action rules, thereby adopting a completely different window adjustment strategy in the next round of image processing (S107).

[0078] Based on the categorized set of feedback values and the weight allocation rules, the weight ratio corresponding to each feedback value is calculated. Through weighted processing, the initial adjustment direction of the action selection parameters is determined. The weight allocation rules are determined based on the deviation degree and feedback type (positive / negative) of the feedback values. The system performs a significance analysis on the categorized set of feedback values. Feedback values with higher deviation degrees (i.e., more extreme rewards or punishments) receive a larger weight ratio, because extreme feedback usually indicates that the action has produced an extremely good or bad effect and therefore has higher learning value. The system multiplies each feedback value by its corresponding weight ratio and determines a gradient direction through weighted summation. For example, if positive feedback is mainly concentrated on the "enlarge window" action, the initial adjustment direction will point to "increase the scaling factor," ultimately outputting the initial adjustment vector of the action selection parameters. The action selection parameters are core parameters of the deep reinforcement learning network (such as a Q-learning or policy gradient network); they directly determine the probability distribution over performing a "window translation (offset)" or a "window scaling (s)" action in a specific state. For the initial adjustment direction, the system analyzes the environmental adaptability data, extracts the changing characteristics of the current environment, and applies a conditional judgment: if the environmental change exceeds a preset range, the adjustment direction is corrected to obtain a corrected parameter adjustment scheme.
The environmental adaptability data includes vehicle speed and timestamps from S101, and a comprehensive environmental state description generated by S104 (such as changes in light intensity and road surface material switching). The system analyzes the stability of the current environment by comparing environmental description data from adjacent detection cycles, and extracts the slope of change or variance of fluctuation from the adaptability data. For example, if the vehicle speed suddenly increases or the light intensity changes drastically, these features will be identified as environmental abrupt changes. If the environmental change exceeds a preset range (unstable environment), the system will determine that the current initial adjustment direction may be misled by environmental noise. In this case, the system will reduce the adjustment step size or introduce a smoothing factor to counteract drastic fluctuations, thereby obtaining a corrected parameter adjustment scheme. The preset range refers to the upper limit of environmental fluctuations that the system can stably handle (e.g., speed change rate within ±10%, light intensity change within a specific lumen range).
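
The weighting and environmental correction can be illustrated with a minimal sketch. It assumes a weight of `1 + deviation` for each feedback value, a single environmental signal (the speed change rate, with the ±10% example as the preset range), and a fixed damping factor; these choices, like the action names, are illustrative only.

```python
def adjustment_direction(feedback):
    """Weighted sum of feedback values per action.

    feedback: list of (action, value, deviation) tuples.
    Assumption: weight = 1 + deviation, so extreme feedback counts more.
    """
    totals = {}
    for action, value, deviation in feedback:
        weight = 1.0 + deviation
        totals[action] = totals.get(action, 0.0) + weight * value
    return totals


def correct_for_environment(direction, speed_change_rate,
                            max_rate=0.10, damping=0.5):
    """Shrink the adjustment step when the environment is unstable."""
    factor = damping if abs(speed_change_rate) > max_rate else 1.0
    return {action: value * factor for action, value in direction.items()}


feedback = [
    ("enlarge", 14.0, 4.0),      # significant positive
    ("enlarge", 3.0, 0.0),       # weak positive
    ("shift_left", -25.0, 15.0), # severe negative
]
direction = adjustment_direction(feedback)
print(direction)
print(correct_for_environment(direction, speed_change_rate=0.25))
```

In this example the positive total for `enlarge` points toward increasing the scaling factor, and the sudden 25% speed change halves every step so that the update is less likely to be driven by environmental noise.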

[0079] The action selection parameters are updated according to the corrected parameter adjustment scheme, and the updated parameter state is recorded synchronously. The system response data after the parameter update is then obtained, and the stability index of the response data is determined. Specifically, the system determines the stability index by analyzing the variance or rate of change of the response data. For example, the standard deviation of the detection confidence within N consecutive frames is calculated; the smaller the standard deviation, the higher the stability index. After the action parameters are adjusted, the system observes whether the detection window quickly locks onto the target and no longer oscillates violently. If the window position fluctuates within a small range and maintains a high score, the stability is considered good. The system response data refers to the dynamic performance indicators exhibited by the system under the new action selection parameters, specifically including: 1. Detection convergence speed: the number of frames required for the window to translate or scale to a high-confidence region. 2. Confidence fluctuation value: the degree of fluctuation in the score of the same potential defect region in continuous frame detection. 3. Processing latency: the change in the time consumed by the system to process a single frame image after updating the parameters. Based on the stability index of the response data, the applicability of the reward feedback cycle is evaluated. If the stability index is lower than the preset standard, the feedback cycle is shortened; otherwise, the cycle is extended, resulting in the adjusted feedback cycle configuration. The system balances detection sensitivity and computational overhead by adjusting the feedback cycle. The applicability assessment determines whether the current feedback step size matches the pace of environmental changes. 
A high stability index indicates that the current parameters are highly effective, so frequent reward calculations and adjustments are unnecessary and a longer cycle is suitable. Conversely, a stability index below the preset standard indicates that the system is struggling in the current environment (e.g., repeated window jumps or a sudden drop in confidence); the current feedback cycle is then too long, the system responds sluggishly, and the cycle is not applicable.

[0080] In one possible implementation, the preset standard is typically a stability threshold (e.g., confidence fluctuation variance not exceeding 0.05). If the stability index is lower than the preset standard, the cycle is shortened (feedback frequency increased). For example, when the environment changes drastically (e.g., sudden speed changes) or detection is unstable, the system reduces the feedback cycle from N frames to N / M frames. This means the system will calculate the positive and negative rewards in S105 more frequently, achieving fast-in-fast-out action correction and improving the accuracy of capturing minute cracks. Conversely, if the stability index is higher than the preset standard, the cycle is extended (feedback frequency decreased). For example, when the environment is stable and the stability index meets the standard, the system extends the cycle. This reduces the number of complex weighted calculations and environmental adaptability analyses in S106, reduces CPU load, and allocates resources to higher priority areas (i.e., the resource tilt mentioned in S105). Through the above process, the system finally determines the state update mechanism: injecting the corrected parameters into the neural network, reconstructing the action selection logic, and obtaining the adjusted feedback cycle configuration. This mechanism will directly guide step S107, controlling the pace of feature extraction and fusion, ensuring that in a continuously acquired sequence, defects can be quickly located and location information can be smoothly output.
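
The cycle adjustment can be sketched as follows. The mapping from standard deviation to a stability index (`1 / (1 + std)`), the stability standard of 0.9, the factor M = 2, and the upper cap on the cycle length are assumptions for illustration; the description only fixes the direction of the adjustment (shorten from N to N / M when unstable, extend when stable).

```python
from statistics import pstdev


def stability_index(confidences):
    """Higher when confidence over N consecutive frames fluctuates less."""
    return 1.0 / (1.0 + pstdev(confidences))


def next_feedback_cycle(cycle, index, standard=0.9, m=2, max_cycle=64):
    """Shorten the cycle to N / M when unstable, otherwise extend it."""
    if index < standard:
        return max(1, cycle // m)
    # Assumption: extension multiplies by the same factor, up to a cap.
    return min(max_cycle, cycle * m)


stable = [0.90, 0.91, 0.90, 0.91]
unstable = [0.90, 0.30, 0.80, 0.20]
print(next_feedback_cycle(8, stability_index(stable)))
print(next_feedback_cycle(8, stability_index(unstable)))
```

Flooring the cycle at one frame keeps the feedback loop well defined even after repeated shortening.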

[0081] In scenarios with varying road conditions, the system can cope with bumps and sudden changes in light and shadow by shortening the cycle and save computing power on smooth road sections by extending the cycle, thereby achieving the most accurate automated detection with limited hardware resources and realizing the self-adjustment of the detection algorithm.

[0082] An adjusted feedback cycle configuration, combined with a window adjustment mechanism, dynamically updates the window size. By monitoring the system's operational status in real time, it determines whether the window adjustment meets expectations, resulting in an optimized window adjustment outcome. The window adjustment mechanism refers to an execution logic based on action selection rules (defined in S103 and reconstructed in S106). It controls the translation (coordinate displacement) and scaling (scale change) of the detection window in the image coordinate system to achieve focused observation of suspected road surface defect areas. The expected outcome refers to whether the defect confidence score (S103) improves and the positioning deviation (S105) decreases after window adjustment. Simply put, the expected outcome is that the adjusted window can more accurately and clearly capture the defect.

[0083] In one possible implementation, the system uses the revised parameter adjustment scheme obtained in S106 to change the window translation step size (speed) and scaling ratio in real time. Within each feedback cycle, the system determines the direction and distance the window should move, or the scale it should take, in the next image based on the latest parameter instructions. The system then verifies the effectiveness of the action through real-time monitoring. If the window adjustment meets expectations, the system maintains the current parameter adjustment direction and records this successful "action-state" mapping as experience for subsequent state update mechanisms. If the adjustment does not meet expectations (e.g., confidence decreases after window shifting, or the window oscillates violently and cannot lock on), the system will: 1. Trigger backtracking: refer to the mechanism in S104 to check whether environmental noise was left unfiltered. 2. Readjust parameters: immediately use negative reward feedback (S105) to penalize the action selection parameters. 3. Shorten the feedback cycle: immediately increase the feedback frequency in the next frame to accelerate the search for the correct window configuration. Finally, after the above fine-tuning and verification, the final window coordinates and size parameters are determined.
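
The expectation check and the three corrective responses can be sketched as below. This is a hedged illustration only. The function name, the dictionary keys, the reward values of +1 and -1, and halving the cycle are assumptions introduced for the example. The text specifies only that confidence should rise, deviation should fall, and that failure triggers backtracking, a negative reward, and a shorter feedback cycle.

```python
def evaluate_window_adjustment(conf_before, conf_after,
                               dev_before, dev_after, cycle_frames):
    """Decide how to react to one window translation or scaling action."""
    # Expectation: confidence improves AND positioning deviation decreases
    meets_expectation = conf_after > conf_before and dev_after < dev_before
    if meets_expectation:
        return {
            "keep_direction": True,       # keep the current adjustment direction
            "reward": 1.0,                # assumed positive reward value
            "cycle_frames": cycle_frames,
            "recheck_noise": False,
        }
    return {
        "keep_direction": False,
        "reward": -1.0,                   # assumed penalty (negative reward, S105)
        "cycle_frames": max(1, cycle_frames // 2),  # shorten the feedback cycle
        "recheck_noise": True,            # backtrack to the noise filter of S104
    }
```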

[0084] Based on the optimized window adjustment results, new reward feedback value data is continuously collected, and the feedback value analysis and parameter adjustment process is executed cyclically to determine the system's adaptive performance under different environments. The system evaluates its adaptive performance by "continuously collecting new rewards" and "cyclically executing analysis," specifically quantified through the following dimensions: 1. Stability index (S106): Whether the system response data can converge quickly under different vehicle speeds and lighting conditions (S101-S102 data). 2. Average reward value: The total cumulative reward obtained by the system under specific environments (such as rainy days or bumpy road sections); the higher the reward, the stronger the adaptability. 3. Resource utilization efficiency: Whether the resource usage after automatic scheduling (S105) is in an optimized state, provided that the accuracy standard is met.

[0085] Through this cycle, the system eventually generates an adaptive assessment profile for different environments. This profile not only characterizes the current performance of the system but also provides an environmental awareness basis for the rhythm control strategy of S107.

[0086] S107 employs an optimized window adjustment mechanism to perform window translation or scaling operations on subsequently acquired road surface image data. The rhythm of feature extraction and fusion is controlled by the state update frequency to obtain a continuous defect localization sequence.

[0087] Raw image data is acquired from road image acquisition equipment. Preprocessing operations are performed on each acquired frame, such as median filtering, to remove noise interference and obtain clear initial image data. An optimized window adjustment mechanism is used to perform window translation operations on the initial image data, covering different regions of the image and determining the local detail information within each window region. Multi-scale analysis of the local detail information is then performed through window scaling operations to extract image features at different scales, resulting in a multi-level feature set.

[0088] In one possible implementation, the window size parameters, such as W×H pixels, are determined according to the optimized window adjustment mechanism. For example, the translation step size in the horizontal and vertical directions is S pixels, and it satisfies ...; preferably, $S = W/2$, which means that adjacent windows overlap by 50% in the horizontal direction to balance computational efficiency and feature continuity. The step size S can also be configured to other values depending on computational resources or analysis accuracy requirements, for example ... or .... Starting from the top left corner of the image, the window is translated horizontally from left to right and vertically from top to bottom with a step size S until the entire image area is covered. At each translated window position, the grayscale values of all pixels within the window are extracted. Let N be the number of valid pixels (pixels located within the image boundary) within the window, and let the grayscale values of these pixels be $g_1, g_2, \dots, g_N$. Then the local mean $\mu$ and the local variance $\sigma^2$ are calculated using the following unbiased estimation formulas, respectively: $\mu = \frac{1}{N}\sum_{i=1}^{N} g_i$, $\sigma^2 = \frac{1}{N-1}\sum_{i=1}^{N} (g_i - \mu)^2$. $\mu$ and $\sigma^2$ serve as the local detail information for the image region corresponding to the current window.

[0089] When a window is shifted and part of its area extends beyond the image boundary, only the valid pixel region within the image is processed. Specifically: if the right boundary of the window extends beyond the right boundary of the image, the pixel columns from the left edge of the window to the right boundary of the image are selected as the valid region; if the lower boundary of the window extends beyond the lower boundary of the image, the pixel rows from the top edge of the window to the lower boundary of the image are selected. If the number of valid pixels N at this time is less than 1/4 of the window area, the window is discarded, the mean and variance are no longer calculated, and subsequent translation along that row or column is terminated.
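
The sliding-window traversal, boundary clipping, and discard rule described in [0088] and [0089] can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions. The function name `window_features`, the list-of-rows image representation, and the extra guard requiring at least two pixels (needed for the unbiased variance) are not from the source. The local mean and the N-1 variance follow the unbiased estimators named in the text.

```python
from statistics import mean, variance


def window_features(image, w, h, s):
    """Return [(local_mean, local_variance), ...] in scan order.

    image : list of rows of grayscale values (assumed representation)
    w, h  : window width and height in pixels
    s     : translation step in pixels (w // 2 gives 50% overlap)
    """
    rows, cols = len(image), len(image[0])
    features = []
    for top in range(0, rows, s):
        for left in range(0, cols, s):
            # Clip the window to the image so only valid pixels are used
            bottom = min(top + h, rows)
            right = min(left + w, cols)
            pixels = [float(image[r][c])
                      for r in range(top, bottom)
                      for c in range(left, right)]
            n = len(pixels)
            if n < (w * h) / 4 or n < 2:
                # Fewer than 1/4 of the window area is valid: discard the
                # window and stop translating along this row
                break
            # statistics.variance uses the unbiased (N - 1) denominator
            features.append((mean(pixels), variance(pixels)))
    return features
```

For a 2×2 image `[[1, 2], [3, 4]]` scanned with a single 2×2 window, the sketch yields a mean of 2.5 and a variance of 5/3.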

[0090] After completing a single-scale sliding window traversal, a window scaling operation is performed to enable multi-scale analysis. The width and height of the current window are both multiplied by a scaling factor $\alpha$ (for example, ...), that is, the new window size is $\alpha W \times \alpha H$ pixels. During scaling, bilinear interpolation is used to resample the grayscale values of the original image, generating a scaled image at the corresponding scale. At the new scale, the window translation (with the step size still half the new window size) and the calculation of the local mean and variance are repeated. This scaling is performed multiple times, for a total of K scales (e.g., K=3, resulting in three sets of local features at three different scales).
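
The bilinear resampling step mentioned above can be illustrated with a small pure-Python routine. This is only a sketch. The function name `resize_bilinear` and the corner-aligned coordinate mapping are assumptions, since the source does not state which mapping convention is used. A production system would more likely call an image library.

```python
def resize_bilinear(image, new_rows, new_cols):
    """Resample a grayscale image (list of rows) with bilinear interpolation."""
    rows, cols = len(image), len(image[0])
    out = []
    for i in range(new_rows):
        # Map the output row to a fractional source row (corners aligned)
        y = i * (rows - 1) / (new_rows - 1) if new_rows > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, rows - 1)
        fy = y - y0
        row = []
        for j in range(new_cols):
            x = j * (cols - 1) / (new_cols - 1) if new_cols > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, cols - 1)
            fx = x - x0
            # Interpolate horizontally on the two neighbouring rows,
            # then vertically between those two results
            upper = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
            lower = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
            row.append(upper * (1 - fy) + lower * fy)
        out.append(row)
    return out
```

Upsampling `[[0, 10], [20, 30]]` to 3×3 keeps the four corner values and places 15 at the center.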

[0091] Finally, the mean and variance of all windows at each scale are expanded into feature vectors according to the window scanning order, and the feature vectors of different scales are sequentially concatenated to form a multi-level feature set describing the entire image, which can be used for subsequent image analysis or recognition tasks.

[0092] The feature set is periodically integrated based on the state update frequency to control the rhythm of feature extraction and fusion. It is determined whether the feature set meets the preset integrity conditions. If it does, it proceeds to the next step.

[0093] In one possible implementation, for example, the state update frequency is $f = 5$ Hz, that is, the feature set is integrated once every $T = 1/f = 0.2$ seconds over the consecutive image frames acquired in that period. For example, if 6 images (at 30 FPS) are acquired in the last 0.2 seconds, the element-wise arithmetic mean of the feature vectors of these 6 frames is calculated to obtain the fused feature vector for that time period. Meanwhile, the rhythm of feature extraction and fusion is controlled by adjusting the step size S of the sliding window and the number of image scaling operations. For example, when the instantaneous vehicle speed is higher than 60 km/h, to reduce the computational load and ensure real-time performance, the step size is increased from 32 pixels to 48 pixels, and the number of multi-scale levels is reduced from 3 to 2 (i.e., only the 64×64 and 32×32 scales are calculated). When the speed is lower than 20 km/h, the original step size and 3-scale settings are restored to extract finer-grained defect features. This achieves dynamic rhythm control.
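
The periodic fusion and the speed-dependent rhythm control in this paragraph can be sketched as follows. The function names and the tuple layout `(step_px, num_scales)` are assumptions. The behaviour between 20 km/h and 60 km/h is not specified in the text, so this sketch simply keeps the current settings in that range.

```python
def fuse_features(frame_vectors):
    """Element-wise arithmetic mean of the per-frame feature vectors."""
    count = len(frame_vectors)
    return [sum(values) / count for values in zip(*frame_vectors)]


def rhythm_settings(speed_kmh, current=(32, 3)):
    """Return (step_px, num_scales) for the given vehicle speed."""
    if speed_kmh > 60:
        return (48, 2)   # coarser step, fewer scales for real-time performance
    if speed_kmh < 20:
        return (32, 3)   # restore the finer original settings
    return current       # assumption: keep current settings in between
```

At 30 FPS and a 0.2-second period, `fuse_features` would receive 6 vectors per call.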

[0094] After each timed integration, it is determined whether the fused feature vector meets the preset integrity condition. The integrity condition is that the following two sub-conditions are met simultaneously: Condition A (effective frame rate): the ratio of the number of effective frames $N_{\text{valid}}$ that actually successfully extracted features in the most recent 0.2-second time period to the total number of frames $N_{\text{total}}$ must be ≥0.8 (i.e., at least 5 frames are valid). Condition B (feature stability): the Euclidean distance $d$ between the current fused feature vector and the fused feature vector of the previous time period must be less than the preset threshold $\tau$ (this threshold is obtained through statistical analysis of pre-experiments on defect-free pavements), indicating that the pavement defect features have not undergone drastic changes and the feature set has stabilized. If both of the above conditions are met, the current fused feature vector is deemed fully constructed and proceeds to subsequent processing. If the integrity condition is not met (e.g., insufficient effective frames or excessive feature distance), the system automatically extends the integration time, for example by temporarily reducing the state update frequency to 2 Hz (i.e., integrating once every 0.5 seconds) and continuing to accumulate image frames until the condition is met; if the condition is not met for three consecutive time periods, the system issues an "abnormal detection area" alarm and skips the current road segment. For the integrated feature set, a convolutional neural network algorithm is applied for feature fusion processing to generate a unified feature representation and determine the preliminary location result of the defect area. The system transforms the dispersed multi-scale features into specific spatial coordinates. Specifically, the multi-level feature set is input into a dedicated convolutional neural network (CNN). 
The CNN uses multiple convolutional and fully connected layers to nonlinearly combine feature vectors of different scales (macro background and micro details), eliminating redundancy between features of different scales and mapping them to a unified feature space. The network outputs a fused high-dimensional vector, a unified feature representation, containing a numerical matrix or vector of multi-dimensional attributes, such as: [defect type identifier, geometric center coordinates (x, y), width feature value, length feature value, texture explicitness score, and environmental noise interference coefficient]. This representation integrates visual features and spatial information. Using this fused feature representation, the system generates a preliminary localization result in the current image coordinate system, typically represented as a candidate bounding box closely adhering to the suspected defect edge, thus determining the pixel range of the defect within a single image.
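
The two-part integrity condition in [0094] (effective frame ratio and feature stability) can be checked with a few lines of Python. This is a sketch. The function and parameter names are hypothetical, and the distance threshold is left as an input because the source says only that it comes from pre-experiments on defect-free pavements.

```python
import math


def is_complete(valid_frames, total_frames, fused, previous_fused,
                distance_limit, min_ratio=0.8):
    """Return True when both integrity sub-conditions hold."""
    if total_frames == 0:
        return False
    # Condition A: share of frames that successfully produced features
    ratio_ok = valid_frames / total_frames >= min_ratio
    # Condition B: Euclidean distance to the previous fused vector
    stable = math.dist(fused, previous_fused) < distance_limit
    return ratio_ok and stable
```

With 6 frames per period, 5 valid frames give a ratio of about 0.83 and pass Condition A, while 4 valid frames give about 0.67 and fail.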

[0095] By employing a continuous sequence generation mechanism, the preliminary localization results are arranged chronologically, and combined with a rhythm control strategy, a smooth and continuous defect localization sequence is obtained. The continuous sequence generation mechanism is a time-correlation-based tracking mechanism. It doesn't just consider a single frame image; instead, it arranges the preliminary localization results from multiple consecutive frames according to the acquisition time sequence and uses filtering algorithms (such as Kalman filtering) to predict the location of the defect in the next frame, thus connecting discrete localization points into a logically coherent motion trajectory. The rhythm control strategy dynamically adjusts the sequence sampling density and smoothing step size based on the state update frequency output by S106. For example, in stable environments, the calculation frequency is reduced to maintain sequence smoothness; in environments with drastic changes (such as high-speed bumps), the rhythm is accelerated, using high-frequency capture to prevent defects from being lost in the sequence, thereby obtaining a smooth and continuous defect localization sequence.
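
The text names Kalman filtering as one possible smoothing algorithm for the localization sequence. The sketch below is a deliberately simplified scalar Kalman filter applied to one coordinate, assuming a random-walk motion model. The function name and the noise variances are illustrative assumptions, and a real tracker would typically use a multi-dimensional state with velocity.

```python
def kalman_smooth(measurements, process_var=0.01, measurement_var=1.0):
    """Smooth a sequence of one coordinate (e.g. x of the defect center)."""
    estimate = float(measurements[0])
    error = 1.0                      # assumed initial estimate uncertainty
    smoothed = [estimate]
    for z in measurements[1:]:
        # Predict: the position is assumed unchanged, uncertainty grows
        error += process_var
        # Update: blend prediction and measurement by the Kalman gain
        gain = error / (error + measurement_var)
        estimate += gain * (z - estimate)
        error *= (1.0 - gain)
        smoothed.append(estimate)
    return smoothed
```

Raising `process_var` makes the filter follow new measurements more closely, which loosely corresponds to the faster rhythm described for rapidly changing environments.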

[0096] Based on the defect location sequence, the image data is labeled to generate structured location information output. The location information is then assessed to determine if it meets a preset accuracy standard. If it does, the processing is complete. This preset accuracy standard is a multi-dimensional quantitative indicator, typically including: 1. Location error limit: The deviation between the location coordinates and the actual trajectory must be less than a preset pixel value (corresponding to the deviation range in S105). 2. Confidence threshold: The defect confidence score of most nodes in the sequence must be higher than a specific score. 3. Sequence continuity: The location trajectory cannot have abrupt interruptions.
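
The three-part accuracy standard can be expressed as a simple predicate. This is a sketch under assumptions. The record fields `frame`, `error_px`, and `confidence`, the reading of "most nodes" as more than half, and the allowed frame gap of 1 are all hypothetical choices made for illustration.

```python
def meets_accuracy_standard(track, max_error_px, min_confidence,
                            max_gap_frames=1, majority=0.5):
    """Check location error, confidence, and continuity for one sequence."""
    if not track:
        return False
    # 1. Location error limit: every node must be within the pixel limit
    errors_ok = all(p["error_px"] < max_error_px for p in track)
    # 2. Confidence threshold: most nodes must exceed the score
    confident = sum(1 for p in track if p["confidence"] > min_confidence)
    confidence_ok = confident / len(track) > majority
    # 3. Sequence continuity: no abrupt interruption between frames
    frames = [p["frame"] for p in track]
    continuous = all(b - a <= max_gap_frames
                     for a, b in zip(frames, frames[1:]))
    return errors_ok and confidence_ok and continuous
```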

[0097] In one possible implementation, if the positioning information fails to meet the accuracy standard, the system will not directly complete the process. Instead, it will trigger the following feedback logic: 1. Backtracking and reprocessing: The system will backtrack to the starting point of S106 / S107 with the current failure signal. 2. Adjusting action parameters: Decrease the window translation step size or increase the scaling factor, and re-extract features. 3. Increasing the sampling frequency: During the period when the standard is not met, the state update frequency will be forcibly increased until the generated sequence can smoothly pass the accuracy check.

[0098] The core of step S107 lies in solving the problem of inaccurate single-point identification through the "fusion power" of CNN and solving the jitter problem under dynamic interference through the "temporal power" of the sequence mechanism. The final accuracy standard verification ensures that the data output to the user is high-quality, structured data that has undergone multiple verifications. In S108, defect confidence scores and edge contour information of suspected defect areas are extracted from the continuous defect localization sequence to determine the specific location and type of the final crack, outputting the detection results of micro-cracks on the road surface. Defect localization data is obtained from the continuous sequence, and preliminary screening of suspected areas is performed. Image processing techniques are used to separate areas where cracks may exist, resulting in initially defined candidate defect areas. Specifically, all candidate coordinates and corresponding confidence data are extracted from the continuous defect localization sequence generated in S107. Localization points with similar physical locations in the time series are clustered to eliminate redundant points caused by repeated detections or jitter. It is checked whether the area is marked as "suspected" in multiple consecutive frames of images, and transient interference points are removed. Then, for the initially identified areas, image processing techniques such as adaptive threshold segmentation or morphological processing techniques (such as closing operations) are used to enhance the contrast between the cracks and the background road surface. The set of dark pixels that may contain cracks is separated through pixel-level segmentation, thereby obtaining the initially defined defect candidate areas.
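
The clustering of nearby localization points and the removal of transient interference points in S108 can be sketched as follows. This is an illustrative greedy clustering, not necessarily the method used by the inventors. The function name, the `(frame, x, y, confidence)` tuple layout, the radius, and the minimum frame count are assumptions, and the sketch counts distinct frames rather than strictly consecutive ones.

```python
import math


def persistent_clusters(detections, radius=10.0, min_frames=3):
    """Group detections by position and keep clusters seen in enough frames.

    detections : iterable of (frame, x, y, confidence)
    """
    clusters = []
    for frame, x, y, conf in detections:
        for cluster in clusters:
            if math.hypot(x - cluster["cx"], y - cluster["cy"]) <= radius:
                cluster["points"].append((frame, x, y, conf))
                n = len(cluster["points"])
                # Update the running mean of the cluster center
                cluster["cx"] += (x - cluster["cx"]) / n
                cluster["cy"] += (y - cluster["cy"]) / n
                break
        else:
            clusters.append({"cx": float(x), "cy": float(y),
                             "points": [(frame, x, y, conf)]})
    # Drop transient points that do not persist across several frames
    return [c for c in clusters
            if len({p[0] for p in c["points"]}) >= min_frames]
```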

[0099] For the initially identified defect candidate regions, confidence score data is extracted and filtered using a preset threshold to eliminate regions with low scores, thus identifying high-confidence suspected crack regions. The preset threshold serves as a scoring filtering threshold, retaining only regions whose scores, after multi-scale and multi-level feature fusion in steps S103 and S107, exceed this threshold. Any low-confidence score regions caused by light and shadow interference, road debris, etc., are completely removed in this step to reduce the false alarm rate of the detection results.

[0100] Edge contour information is extracted from high-confidence suspected crack areas, and geometric analysis methods are used to classify the contour shapes and determine the specific morphological characteristics of the cracks. Specifically, the system uses a geometric shape feature extraction algorithm, mainly analyzing the aspect ratio, i.e., calculating the aspect ratio of the bounding rectangle of the candidate region; analyzing curvature and continuity, i.e., evaluating the linearity of pixel distribution; and calculating the area-to-perimeter ratio to distinguish between isolated point-like pits and linear cracks. Based on the results of the geometric analysis, the system classifies cracks into the following specific forms: 1. Transverse cracks: large aspect ratio, running basically perpendicular to the direction of travel. 2. Longitudinal cracks: large aspect ratio, running basically parallel to the direction of travel. 3. Mesh cracks (crazing): geometrically characterized by multiple closed or semi-closed small block combinations with complex contour topology. 4. Pit / block defects: small aspect ratio, with a circular or irregular block shape.
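
The aspect-ratio part of this classification can be illustrated with a short function. This sketch is intentionally partial. It assumes the bounding rectangle is axis-aligned, that the direction of travel runs along the image y axis, and that a ratio of 3 separates elongated from block-like shapes. None of these values come from the source. Mesh cracks are not handled because they require contour topology analysis.

```python
def classify_defect(width, height, travel_axis="y", elongated_ratio=3.0):
    """Classify a defect from the width and height of its bounding rectangle."""
    long_side = max(width, height)
    short_side = max(min(width, height), 1e-9)   # avoid division by zero
    aspect = long_side / short_side
    if aspect < elongated_ratio:
        return "pit/block"          # small aspect ratio: block-like shape
    if travel_axis == "y":
        along_travel = height >= width
    else:
        along_travel = width >= height
    # Elongated and parallel to travel: longitudinal, otherwise transverse
    return "longitudinal" if along_travel else "transverse"
```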

[0101] Based on the specific morphological characteristics of the cracks and combined with crack location data, the distribution of cracks on the road surface is analyzed to obtain the spatial distribution pattern of the cracks. Specifically, the continuous defect location sequence obtained in S107 is mapped to the global road surface coordinate system, and spatial statistical methods (such as point pattern analysis or kernel density estimation) are used to analyze the distribution density of crack coordinate points on the road surface. By analyzing the correlation of crack location data, it is identified whether it belongs to random distribution, linear strip distribution, or regional clustering distribution, thereby obtaining the spatial distribution pattern. The spatial distribution pattern refers to the organization structure and evolution trend of cracks in the three-dimensional space or two-dimensional plane of the road surface. It is not only an isolated point, but also includes the density of cracks, their arrangement direction (such as parallel, intersecting, diverging), and their relative relationship with the lane lines.

[0102] Regarding the spatial distribution pattern of cracks, if the cracks are concentrated in a specific area, further crack type information is extracted to determine whether they are micro-cracks or other categories. The specific area refers to critical stress areas of the pavement (such as wheel tracks and longitudinal joints) or areas of severe local damage caused by construction quality issues or foundation settlement. By integrating the crack type information and location analysis results with the micro-crack detection data, the final detection conclusion is determined.

[0103] In one possible implementation, the system sets a density threshold for the spatial distribution pattern of cracks. When the total length of cracks or the number of crack points per unit area in a specific region exceeds the preset threshold, the distribution is determined to be concentrated. Further subdivision is then performed based on the morphological classification: crack width features and contrast are extracted to determine whether the crack is a micro-crack (typically extremely narrow and with indistinct features) or belongs to another category such as conventional cracks or pits. Using the crack type information and location analysis results, the system integrates discrete data points about micro-cracks into structured information through multi-source information fusion technology. Specifically, the texture details (S104), positioning accuracy (S105), and multi-scale features (S107) are uniformly numbered. Width, direction, and confidence data of the same micro-crack collected at different timestamps and in different detection windows are integrated so that the observations complement one another. The spatial distribution pattern is used to ensure that the same crack appears only once in the final conclusion, resulting in a structured inspection conclusion that typically includes: a defect profile, i.e., a unique number for each defect; location coordinates, i.e., the GPS coordinates or mileage offset of the crack's start and end points on the road surface; type and morphology, such as "micro-longitudinal crack"; geometric indicators, i.e., the estimated length, maximum width, and damaged area; confidence level evaluation, i.e., the reliability of the inspection result (based on the statistical results of positive and negative reward values); and environmental background description, i.e., the vehicle speed, lighting conditions, and road surface material during the inspection, used to assist manual verification.
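
The density test and the de-duplication of repeated observations can be sketched as follows. This is a simplified illustration. The `Observation` fields, the function names, and the rule of keeping the highest-confidence observation per crack are assumptions. The source describes only that observations of the same micro-crack are integrated and that each crack appears once in the conclusion.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    crack_id: str        # assumed unique number assigned to each defect
    width_mm: float
    confidence: float


def is_concentrated(crack_count, area_m2, density_limit):
    """True when the number of crack points per unit area exceeds the limit."""
    return crack_count / area_m2 > density_limit


def merge_observations(observations):
    """Keep one record per crack (here: the most confident observation)."""
    best = {}
    for obs in observations:
        current = best.get(obs.crack_id)
        if current is None or obs.confidence > current.confidence:
            best[obs.crack_id] = obs
    return list(best.values())
```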

[0104] As shown in Figure 3, the present invention provides an automated road surface defect detection system based on deep reinforcement learning, which mainly includes:

[0105] The multi-source data acquisition and fusion module is used to acquire the original image data collected on the current road surface, vehicle speed information, and historical defect detection records. It also uses multi-source data fusion technology to initially integrate the grayscale distribution features of the images with speed information and historical records to obtain an initial road surface environment description.

[0106] The feature extraction and environment analysis module is used to extract local texture details and edge contour information from the original image using image processing technology, based on the initial road surface environment description, and to analyze the impact of dynamic environment by combining speed information to obtain a comprehensive environmental feature representation.

[0107] The window adjustment module is used to analyze pixel spatial correlation and regional contrast differences based on the comprehensive environmental feature representation. When the defect confidence score of the detection window coverage area is lower than the preset threshold, the window translation or scaling operation is performed through the pre-established action selection rules to obtain the adjusted local image area.

[0108] The state update module is used to extract new edge contour information and local texture details from the adjusted local image region, merge them with the comprehensive environmental feature representation through feature vector fusion technology, and filter environmental noise to obtain the updated road surface state description.

[0109] The reward evaluation and judgment module is used to evaluate the deviation between the defect boundary and the actual location based on the updated road surface condition description, and to analyze and calculate resource consumption by combining the resource occupancy ratio and resource optimization constraints to obtain positive and negative reward feedback values. When the reward feedback value is positive and exceeds the dynamically adjusted threshold, the current area is determined to be a suspected defect area.

[0110] The learning optimization module is used to adjust the parameters of the action selection rule based on the positive and negative reward feedback values and the reward weight allocation rules, and to update the reward feedback cycle using environmental adaptability correction to obtain the optimized window adjustment mechanism.

[0111] The defect location sequence generation module is used to perform window translation or scaling operations on the subsequently acquired road surface image data using an optimized window adjustment mechanism. The rhythm of feature extraction and fusion is controlled by the state update frequency to obtain a continuous defect location sequence.

[0112] The defect identification and output module is used to extract the defect confidence score and edge contour information of suspected defect areas from a continuous defect location sequence, determine the specific location and type of the final crack, and output the detection results of micro-cracks on the road surface.

[0113] The above description is merely a preferred embodiment of the present invention and is not intended to limit the present invention. The scope of patent protection of the present invention shall be determined by the claims. Similarly, any equivalent structural changes made based on the description and drawings of the present invention shall also be included within the scope of protection of the present invention.

Claims

1. An automated road surface defect detection method based on deep reinforcement learning, characterized in that, The method includes: The system acquires raw image data of the current road surface, vehicle speed information, and historical defect detection records. It then fuses the grayscale distribution characteristics, speed information, and historical records of the data to generate an initial road surface environment description. Based on the initial road surface environment description, local texture details and edge contour information are extracted from the original image, and the dynamic environment influence is analyzed by combining speed information to obtain a comprehensive environmental feature representation; Based on the comprehensive environmental features, the spatial correlation of pixels and the difference in regional contrast are analyzed. When the confidence score of the defect in the detection window is lower than the preset threshold, the window is translated or scaled according to the pre-established action selection rules to obtain the adjusted local image region. New edge contour information and local texture details are extracted from the adjusted local image region, fused with the comprehensive environmental feature representation and filtered for environmental noise to obtain an updated road surface state description. The deviation between the defect boundary and the actual location is calculated by evaluating the positioning accuracy. Resource consumption is analyzed by combining the resource occupancy ratio and resource optimization constraints to obtain positive and negative reward feedback values. If the reward feedback value is positive and exceeds the dynamic adjustment threshold, the current area is determined to be a suspected defect area. 
Based on the positive and negative reward feedback values and the reward weight allocation rules, the parameters of the action selection rules are adjusted, and the reward feedback cycle is updated using environmental adaptability correction to obtain the optimized window adjustment mechanism. The window adjustment mechanism is used to perform window translation or scaling operations on subsequent road surface image data. The rhythm of feature extraction and fusion is controlled by the state update frequency to obtain a continuous defect localization sequence. The confidence score and edge contour information of the suspected defect area are extracted from the defect location sequence to determine the specific location and type of the crack, and the detection results of the micro-cracks on the road surface are output.

2. The automated road surface defect detection method based on deep reinforcement learning according to claim 1, characterized in that, The process of acquiring raw image data of the current road surface, vehicle speed information, and historical defect detection records, fusing the grayscale distribution features, speed information, and historical records of the data to generate an initial road surface environment description includes: Grayscale features are extracted from the original image data and standardized to obtain standardized image feature data. Combined with driving speed information, if the speed information exceeds the preset speed range, the image feature data is dynamically corrected to determine the corrected feature dataset. The feature dataset is matched with historical defect detection records. If similar defect patterns exist in the historical records, the corresponding defect features are extracted to identify potential road surface problem areas. Based on the potential road surface problem areas, the support vector machine algorithm is used to classify the road surface environment according to the matching results, and the classified environmental state description is obtained. By combining the environmental state description with the timestamps and speed information from image acquisition, a time-series analysis of the dynamic changes in the road surface environment is performed to determine the final comprehensive description of the road surface environment. Obtain the final comprehensive description of the road surface environment, and construct a multi-dimensional road surface condition profile based on the classification results and time series analysis data to obtain complete road surface environment assessment data.

3. The automated road surface defect detection method based on deep reinforcement learning according to claim 1, characterized in that, The process involves extracting local texture details and edge contour information from the original image based on the initial road surface environment description, combining this with speed information to analyze the impact of the dynamic environment, and obtaining a comprehensive environmental feature representation, including: The original image data of the road surface environment is acquired, and the original image is divided into regions using a preset image segmentation method to obtain multiple local image regions; For each local image region, edge contour information is extracted to determine the region with high edge clarity where the gradient changes drastically and the contour is obvious. Based on the analysis of texture distribution in this region, regions with fluctuation amplitude greater than a preset threshold and non-smoothness are extracted as local image patches with high texture complexity containing details of fine cracks or pits. The speed data of the vehicle during operation is acquired. If the speed data exceeds a preset threshold range, the local image block is dynamically corrected to obtain the corrected image block. By extracting features from the corrected image blocks and analyzing the impact of dynamic environmental changes in conjunction with velocity data, if the degree of environmental change is higher than the preset standard, a secondary feature enhancement process is performed on the corrected image blocks to obtain the final environmental feature data.

4. The automated road surface defect detection method based on deep reinforcement learning according to claim 1, characterized in that, The process involves analyzing pixel spatial correlation and regional contrast differences based on comprehensive environmental features. When the defect confidence score within the detection window is lower than a preset threshold, the window is translated or scaled according to pre-established action selection rules to obtain an adjusted local image region, including: By extracting environmental features from the input image data, and using a convolutional neural network to analyze the overall structure of the image, a preliminary environmental feature distribution map is generated. For the aforementioned environmental feature distribution map, the correlation degree within the pixel space is analyzed, the correlation strength between adjacent pixels is calculated, and the correlation distribution result of the pixel space is determined. Based on the correlation distribution results in pixel space and combined with the differences in regional contrast, multiple detection windows are divided, and contrast difference data within each detection window is obtained. For the contrast difference data within each detection window, calculate the defect confidence score for the covered area. If the defect confidence score is lower than the preset score threshold, trigger the subsequent action selection rule to determine whether to perform an adjustment operation. According to the action selection rules, window translation or scaling operations are performed on the detection windows that meet the conditions to adjust the coverage area and obtain the adjusted local image region.

5. The automated road surface defect detection method based on deep reinforcement learning according to claim 1, characterized in that extracting new edge contour information and local texture details from the adjusted local image region, fusing them with the comprehensive environmental feature representation, and filtering out environmental noise to obtain an updated road surface state description, comprises: obtaining initial data from the adjusted local image region and separating edge contours from texture details to obtain preliminary image structure information; based on the image structure information, converting the edge contours and texture details into computable feature vector data using a feature vector construction method, and determining a quantized representation for subsequent processing; matching the feature vector data against a pre-established environmental feature library and fusing the local information with the overall environmental representation to obtain a comprehensive feature description; if environmental noise interference is detected in the comprehensive feature description, cleaning the data according to preset noise filtering rules to obtain purified feature information; based on the purified feature information, classifying and mapping the road surface state, using a support vector machine algorithm to divide the state categories and obtain specific state labels; comparing the state labels with preset state description templates to generate the final road surface state description information and determine the complete analysis result; and, if a classification bias exists in the final road surface state description information, backtracking to the feature vector data for local correction to obtain adjusted state description content.

6. The automated road surface defect detection method based on deep reinforcement learning according to claim 1, characterized in that evaluating the deviation between the defect boundary and the actual location using positioning accuracy assessment, analyzing resource consumption in combination with the resource occupancy ratio and resource optimization constraints to obtain positive and negative reward feedback values, and determining the current area to be a suspected defect area if the reward feedback value is positive and exceeds the dynamically adjusted threshold, comprises: acquiring the updated road surface state data, performing preliminary processing on the collected images and sensor information, separating potential defect areas using segmentation techniques, and obtaining preliminary defect boundary information; based on the defect boundary information, calculating the deviation between the defect boundary and the actual location using positioning accuracy assessment, and comparing it with a preset deviation range to determine whether the deviation is acceptable; based on the deviation calculation result combined with the resource occupancy ratio data, analyzing the current resource consumption in the area and limiting resource allocation through the resource optimization constraints to obtain a resource consumption assessment result; and calculating positive and negative reward feedback values based on the resource consumption assessment result, wherein, if the reward feedback value is greater than zero and exceeds the dynamically adjusted threshold, the current area is determined to be a suspected defect area.
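
One way to picture the reward feedback of claim 6 is the sketch below. The claims do not give a formula, so the linear accuracy term, the penalty applied only above a resource limit, and every parameter (`max_deviation`, `resource_limit`, `resource_weight`) are assumptions chosen for illustration.

```python
def boundary_deviation(predicted, actual):
    """Mean absolute offset between corresponding box edges (top, left, bottom, right)."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def reward(predicted, actual, resource_ratio,
           max_deviation=8.0, resource_limit=0.8, resource_weight=0.5):
    """Positive when localization is accurate and resources stay within the assumed limit."""
    accuracy_term = 1.0 - boundary_deviation(predicted, actual) / max_deviation
    resource_penalty = resource_weight * max(0.0, resource_ratio - resource_limit)
    return accuracy_term - resource_penalty


def is_suspected_defect(reward_value, dynamic_threshold):
    """The area counts as suspected only if the reward is positive and above the threshold."""
    return reward_value > 0 and reward_value > dynamic_threshold


close = reward((10, 10, 30, 30), (12, 10, 30, 34), resource_ratio=0.9)
far = reward((0, 0, 10, 10), (20, 20, 40, 40), resource_ratio=0.5)
print(is_suspected_defect(close, dynamic_threshold=0.5))  # True
print(is_suspected_defect(far, dynamic_threshold=0.5))    # False
```

The first box deviates by 1.5 pixels on average and slightly exceeds the resource limit, giving a reward of about 0.76; the second box is far from the reference location, so its reward is negative.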

7. The automated road surface defect detection method based on deep reinforcement learning according to claim 1, characterized in that adjusting the parameters of the action selection rules based on the positive and negative reward feedback values and the reward weight allocation rules, and updating the reward feedback cycle using environmental adaptability correction to obtain the optimized window adjustment mechanism, comprises: acquiring the reward feedback value data, classifying the feedback values as positive or negative, and determining the degree of deviation of each feedback value by comparing it with a preset threshold range, to obtain a classified set of feedback values; based on the set of feedback values and the weight allocation rules, calculating the weight ratio corresponding to each feedback value and determining the initial adjustment direction of the action selection parameters through weighting; based on the initial adjustment direction, analyzing the environmental adaptability data and extracting the current environmental change characteristics and, if the environmental change exceeds a preset range, correcting the adjustment direction to obtain a corrected parameter adjustment scheme; updating the action selection parameters according to the corrected parameter adjustment scheme, synchronously recording the updated parameter state, obtaining the system response data after the parameter update, and determining a stability index of the response data; evaluating the applicability of the reward feedback cycle based on the stability index, wherein the feedback cycle is shortened if the stability index is lower than a preset standard and is otherwise extended, to obtain an adjusted feedback cycle configuration; dynamically updating the window size using the feedback cycle configuration in combination with the window adjustment mechanism; monitoring the operating state of the system in real time to determine whether the window adjustment meets expectations, and obtaining the optimized window adjustment result; and, based on the optimized window adjustment result, continuously collecting new reward feedback value data and cyclically executing the feedback value analysis and parameter adjustment process, to determine the adaptability of the system in different environments.
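
The weighting and feedback-cycle steps of claim 7 can be sketched as follows. The claims do not specify the weight allocation rules or how far the cycle is shortened or extended, so the heavier weight on negative feedback, the halving and doubling of the cycle, and the bounds `min_cycle` and `max_cycle` are all assumptions.

```python
def adjustment_direction(feedback, positive_weight=1.0, negative_weight=2.0):
    """Weighted mean of the feedback values; the sign gives the initial adjustment direction."""
    weights = [positive_weight if value > 0 else negative_weight for value in feedback]
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w * v for w, v in zip(weights, feedback)) / total


def update_cycle(cycle, stability_index, standard=0.7, factor=2, min_cycle=1, max_cycle=64):
    """Shorten the feedback cycle when the response is unstable, otherwise extend it."""
    if stability_index < standard:
        return max(min_cycle, cycle // factor)
    return min(max_cycle, cycle * factor)


print(adjustment_direction([0.5, -0.25, 0.75]))  # 0.1875
print(update_cycle(8, stability_index=0.5))      # 4
print(update_cycle(8, stability_index=0.9))      # 16
```

A shorter cycle means feedback is evaluated more often while the system is unstable; once the stability index recovers, the cycle grows again up to the assumed upper bound.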

8. The automated road surface defect detection method based on deep reinforcement learning according to claim 1, characterized in that performing window translation or scaling operations on subsequent road surface image data using the optimized window adjustment mechanism, and controlling the rhythm of feature extraction and fusion by the state update frequency to obtain a continuous defect localization sequence, comprises: acquiring raw image data from the road image acquisition device and preprocessing each image frame to remove noise interference and obtain clear initial image data; performing window translation operations on the initial image data using the optimized window adjustment mechanism to cover different regions of the image, and determining the local detail information within each window region; performing multi-scale analysis of the local detail information through window scaling operations, and extracting image features at different scales to obtain a multi-level feature set; integrating the feature set at intervals set by the state update frequency, so as to control the rhythm of feature extraction and fusion; if the feature set meets a preset integrity condition, applying a convolutional neural network algorithm to fuse the integrated feature set, generating a unified feature expression and determining a preliminary localization result for the defect area; arranging the preliminary localization results in chronological order using a continuous sequence generation mechanism and, in combination with a rhythm control strategy, obtaining a smooth and continuous defect localization sequence; and labeling the image data based on the defect localization sequence to generate structured localization information as output, wherein the processing flow is completed if the localization information meets a preset accuracy standard.
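
The chronological ordering and smoothing at the end of claim 8 could be approximated as below. The claims do not define the continuous sequence generation mechanism, so a trailing moving average over window positions is used here purely as an assumed stand-in; the tuple layout `(timestamp, y, x)` and the `span` parameter are hypothetical.

```python
def smooth_sequence(locations, span=2):
    """Sort (timestamp, y, x) results by time and average each with its preceding results."""
    ordered = sorted(locations, key=lambda item: item[0])
    smoothed = []
    for i, (timestamp, _, _) in enumerate(ordered):
        recent = ordered[max(0, i - span + 1):i + 1]
        mean_y = sum(y for _, y, _ in recent) / len(recent)
        mean_x = sum(x for _, _, x in recent) / len(recent)
        smoothed.append((timestamp, mean_y, mean_x))
    return smoothed


# Preliminary localization results arriving out of order.
results = [(2, 12, 30), (0, 10, 30), (1, 14, 30)]
print(smooth_sequence(results))
# [(0, 10.0, 30.0), (1, 12.0, 30.0), (2, 13.0, 30.0)]
```

The averaging damps frame-to-frame jitter in the reported position while preserving the time order of the sequence.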

9. The automated road surface defect detection method based on deep reinforcement learning according to claim 1, characterized in that extracting the defect confidence scores and edge contour information of suspected defect areas from the defect localization sequence, determining the specific location and type of the cracks, and outputting the detection results for micro-cracks on the road surface, comprises: obtaining defect localization data from the continuous sequence, initially screening the suspected areas, and using image processing techniques to separate the areas that may contain cracks, to obtain preliminary defect candidate areas; for the preliminary defect candidate areas, extracting confidence score data, filtering by a preset threshold to remove low-scoring areas, and identifying high-confidence suspected crack areas; extracting edge contour information from the high-confidence suspected crack areas and classifying the contour shapes using geometric analysis methods to determine the specific morphological characteristics of the cracks; based on the morphological characteristics of the cracks combined with the crack location data, analyzing the distribution of cracks on the road surface to obtain their spatial distribution pattern; if the spatial distribution pattern is concentrated in a specific area, further extracting crack type information to determine whether the cracks are micro-cracks or belong to another category; and integrating the micro-crack detection data based on the crack type information and the location analysis results to determine the final detection conclusion.
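
The screening and shape classification in claim 9 can be sketched as follows. The geometric analysis method is not specified in the claims, so bounding-box elongation is assumed here as the shape measure, and the labels, the dictionary layout of a candidate, and the thresholds (`score_threshold`, `elongation_threshold`, `micro_width`) are hypothetical.

```python
def classify_candidates(candidates, score_threshold=0.6,
                        elongation_threshold=3.0, micro_width=2):
    """Drop low-confidence candidates, then label the rest by bounding-box shape."""
    results = []
    for candidate in candidates:
        if candidate["score"] < score_threshold:
            continue  # low-scoring areas are removed
        ys = [y for y, _ in candidate["contour"]]
        xs = [x for _, x in candidate["contour"]]
        height = max(ys) - min(ys) + 1
        width = max(xs) - min(xs) + 1
        short_side, long_side = sorted((height, width))
        if long_side / short_side >= elongation_threshold:
            # Elongated contours are treated as cracks; thin ones as micro-cracks.
            kind = "micro-crack" if short_side <= micro_width else "crack"
        else:
            kind = "other"
        results.append({
            "location": (min(ys), min(xs)),
            "type": kind,
            "score": candidate["score"],
        })
    return results


candidates = [
    {"score": 0.9, "contour": [(5, 0), (5, 9), (6, 4)]},  # thin and long
    {"score": 0.8, "contour": [(0, 0), (4, 5)]},          # roughly square
    {"score": 0.3, "contour": [(7, 7), (7, 20)]},         # below the score threshold
]
for item in classify_candidates(candidates):
    print(item["location"], item["type"])
# (5, 0) micro-crack
# (0, 0) other
```

The contour points are given as `(y, x)` pairs, and the reported location is the top-left corner of the bounding box.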

10. An automated road surface defect detection system based on deep reinforcement learning, characterized in that the system comprises: a multi-source data acquisition and fusion module, used to acquire the original image data collected on the current road surface, the vehicle speed information, and the historical defect detection records, and to preliminarily integrate the grayscale distribution features of the images with the speed information and the historical records using multi-source data fusion technology, to obtain an initial road surface environment description; a feature extraction and environment analysis module, used to extract local texture details and edge contour information from the original image using image processing technology, based on the initial road surface environment description, and to analyze the impact of the dynamic environment in combination with the speed information, to obtain a comprehensive environmental feature representation; a window adjustment module, used to analyze pixel spatial correlation and regional contrast differences based on the comprehensive environmental feature representation and, when the defect confidence score of the area covered by the detection window is lower than a preset threshold, to perform window translation or scaling operations according to pre-established action selection rules, to obtain an adjusted local image region; a state update module, used to extract new edge contour information and local texture details from the adjusted local image region, fuse them with the comprehensive environmental feature representation through feature vector fusion technology, and filter out environmental noise, to obtain an updated road surface state description; a reward evaluation and judgment module, used to evaluate the deviation between the defect boundary and the actual location based on the updated road surface state description, and to analyze resource consumption in combination with the resource occupancy ratio and resource optimization constraints to obtain positive and negative reward feedback values, wherein the current area is determined to be a suspected defect area when the reward feedback value is positive and exceeds the dynamically adjusted threshold; a learning optimization module, used to adjust the parameters of the action selection rules based on the positive and negative reward feedback values and the reward weight allocation rules, and to update the reward feedback cycle using environmental adaptability correction, to obtain an optimized window adjustment mechanism; a defect localization sequence generation module, used to perform window translation or scaling operations on subsequently acquired road surface image data using the optimized window adjustment mechanism, and to control the rhythm of feature extraction and fusion by the state update frequency, to obtain a continuous defect localization sequence; and a defect identification and output module, used to extract the defect confidence scores and edge contour information of suspected defect areas from the continuous defect localization sequence, determine the specific location and type of the cracks, and output the detection results for micro-cracks on the road surface.