Intelligent road recognition system based on deep learning
By deploying image acquisition devices and a dual-branch decoupled neural network on the roadside, road damage identification and speed limit control are performed directly at the roadside edge computing nodes, solving the problem of traffic control lag caused by network latency in existing technologies and realizing real-time speed limit control and safety linkage.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- CHONGQING UNIV
- Filing Date
- 2026-05-21
- Publication Date
- 2026-06-19
Figure CN122244776A_ABST
Description
Technical Field
[0001] This invention relates to the field of traffic control technology and discloses an intelligent road recognition system based on deep learning.
Background Technology
[0002] Existing intelligent road recognition and traffic control systems generally adopt a cloud-based centralized processing architecture. After roadside sensing devices acquire images of the road surface, they transmit the complete image data back to a remote cloud server via wired or wireless networks. The cloud server deploys a single fully convolutional neural network model to perform end-to-end inference calculations on the entire road image, simultaneously outputting the pixel coordinates of road structure boundaries and classification results for abnormal road surface damage areas. Based on the anomaly classification results, the cloud server matches a preset speed limit strategy, generates traffic control commands, and sends the control commands to roadside traffic signal controllers and variable message signs via downlink communication links. Upon receiving the commands, the roadside execution devices change the displayed speed limit values, completing the closed loop of road anomaly recognition and traffic control.
[0003] In the existing technical solutions described above, road image data is transmitted uplink from the roadside acquisition nodes to the cloud server, and the cloud server then transmits control commands downlink to the roadside execution nodes. This round-trip data link is limited by network bandwidth fluctuations and by the processing of the transmission protocol stack, resulting in unavoidable transmission delays. When abnormal conditions such as potholes or water accumulation occur on the road, there is a time difference between the roadside sensing device capturing the abnormal image and the roadside execution device updating the speed limit value. As a result, the roadside traffic control commands lag behind the actual occurrence of the abnormal road condition, so that real-time synchronization of control commands and dynamic speed limit linkage cannot be completed before vehicles reach the abnormal road section.
Summary of the Invention
[0004] The purpose of this invention is to provide an intelligent road recognition system based on deep learning, which can effectively solve the problems mentioned in the background art.
[0005] To achieve the above objective, the technical solution adopted by the present invention is as follows: a deep learning-based intelligent road recognition system, including an image acquisition device deployed at a roadside edge computing node, a dual-branch decoupled neural network, a roadside traffic control signal controller, a roadside variable message sign, and a roadside communication unit. The image acquisition device is used to acquire road surface image sequences and input them into the dual-branch decoupled neural network. The dual-branch decoupled neural network includes a structural feature extraction path and an anomaly feature extraction path. The structural feature extraction path uses a spatial attention mechanism to extract structural feature vectors of road boundaries and lane markings. The anomaly feature extraction path uses a frequency domain filtering convolutional layer to extract anomaly feature vectors of cracks, potholes, and water accumulation areas. The roadside edge computing node compares the anomaly feature vector with a preset road damage level mapping table to generate road damage state features. The roadside traffic control signal controller receives the road damage state features, determines the speed limit threshold for the corresponding road segment based on them, and generates a speed limit control signal. The roadside variable message sign receives the speed limit control signal and displays the corresponding speed limit value, and the roadside communication unit broadcasts the speed limit control signal and the road damage state features to connected vehicles entering the corresponding road section.
[0006] Preferably, the image acquisition device includes a line scan camera and a fill light triggering circuit, wherein the fill light triggering circuit synchronously triggers a strobe fill light according to the line frequency signal of the line scan camera; The line scan camera acquires continuous rows of images of the road surface under the illumination of a strobe lamp and stitches them together to form the road surface image sequence. After receiving the road surface image sequence, the roadside edge computing node performs image preprocessing operations, which include performing affine transformation correction on the road surface image sequence based on the road surface texture direction and removing motion blur regions based on the difference between adjacent frames to obtain a denoised and corrected image sequence. The denoised and corrected image sequence is input into the dual-branch decoupled neural network for feature extraction, so as to eliminate the interference of acquisition distortion and motion blur caused by the high speed of the vehicle on the subsequent abnormal feature extraction.
[0007] Preferably, the structural feature extraction path includes a backbone feature extraction layer, a spatial attention generation layer, and a feature fusion output layer connected in sequence; The backbone feature extraction layer performs multi-scale convolution operations on the road surface image sequence to extract a primary feature map containing road geometric information. The spatial attention generation layer performs max pooling and average pooling on the primary feature map in the channel dimension, concatenates them, and generates a spatial attention weight matrix through a convolution kernel. The feature fusion output layer multiplies the spatial attention weight matrix element-wise with the primary feature map, highlighting the response values of the pixel regions where the road edges and lane markings are located, suppressing the response values of the unstructured background regions of the road surface, and outputting the structural feature vector, thereby maintaining a stable representation of the road topology in scenarios with weak texture or worn lane markings.
[0008] Preferably, the abnormal feature extraction path includes a frequency domain transformation convolutional layer and a spatial deconvolutional layer; The frequency domain transformation convolutional layer has built-in learnable frequency domain filtering parameters. The frequency domain transformation convolutional layer transforms the road surface image sequence from the spatial domain to the frequency domain. It uses the frequency domain filtering parameters to suppress the low-frequency road surface base signal in the frequency domain feature map, enhance the high-frequency abnormal damage signal, and transforms the processed frequency domain feature map back to the spatial domain to obtain the high-frequency abnormal response map. The spatial deconvolution layer upsamples the high-frequency anomaly response map to restore it to the same resolution as the road surface image sequence, outputs the anomaly feature vector through pixel-level classification, and filters out pseudo-anomaly interference from normal road surface texture through a frequency domain filtering mechanism.
[0009] Preferably, the road damage level mapping table stored inside the roadside edge computing node includes a multi-dimensional feature space partition boundary and a corresponding damage level label; During the process of generating the road damage state features, the roadside edge computing node calculates the Mahalanobis distance of the abnormal feature vector in the multidimensional feature space, and determines the initial damage level label based on the division boundary into which the Mahalanobis distance falls. Meanwhile, the roadside edge computing node extracts the lane centerline offset from the structural feature vector. When the lane centerline offset exceeds the preset safe offset range, the initial damage level label is upgraded across levels. The spatial structural deformation index is then fused to generate the final road damage state feature, thus solving the problem of misjudgment of damage level caused by relying solely on abnormal appearance features.
[0010] Preferably, the roadside traffic control signal controller embeds speed limit lookup table logic and smooth transition control logic based on a vehicle dynamics model. The speed limit lookup table logic matches the corresponding basic speed limit threshold based on the damage level label in the received road damage state features. The smooth transition control logic acquires the historical speed limit control signal of the current road segment and the current average vehicle speed, and calculates the speed difference between the basic speed limit threshold and the current average vehicle speed. When the speed difference is greater than the preset deceleration margin, the roadside traffic control signal controller generates a speed limit control signal containing multiple levels of transition speed gradients, avoiding the risk of loss of control of the vehicle braking system caused by sudden changes in the speed limit threshold.
[0011] Preferably, the image preprocessing operation further includes a shadow removal step based on local contrast adaptation. In the shadow removal step, the roadside edge computing node converts the denoised and corrected image sequence to the HSV color space and extracts the brightness channel component. For the brightness channel component, it calculates the mean and variance of the gray levels within a local window and constructs a local adaptive gain coefficient based on the gray-level variance. The local adaptive gain coefficient is used to perform a nonlinear stretching transformation on the brightness channel component to suppress non-uniform shadow areas caused by tree occlusion or bridge projection, and an illumination-equalized image sequence is output. The illumination-equalized image sequence serves as the final input to the dual-branch decoupled neural network, addressing the technical problem that the features of abnormal regions are swamped by abrupt changes in ambient illumination.
[0012] Preferably, the feature fusion output layer is further connected to a temporal consistency constraint. During the extraction of the structural feature vector, the temporal consistency constraint takes the spatial attention weight matrix of the current frame and the historical spatial attention weight matrix of the previous frame, and calculates the optical-flow-guided displacement field between the two. The temporal consistency constraint warps the historical spatial attention weight matrix into alignment with the current frame based on the optical-flow-guided displacement field, and calculates the Euclidean distance loss between the warped result and the spatial attention weight matrix of the current frame. The Euclidean distance loss is backpropagated as a penalty term to the backbone feature extraction layer, constraining the temporal stability of road edge and lane marking features across consecutive frames and reducing the interference of flicker noise with road structure contour recognition.
[0013] Preferably, the learnable frequency domain filtering parameters inside the frequency domain transformation convolutional layer are updated end-to-end using a two-dimensional Fourier transform kernel and a two-dimensional inverse Fourier transform kernel. During the forward propagation of the frequency domain transformation convolutional layer, the two-dimensional Fourier transform kernel performs spectral decomposition on the input feature map to generate amplitude spectrum and phase spectrum; The learnable frequency domain filtering parameters are constructed as a ring bandpass filter bank in polar coordinates. The ring bandpass filter bank performs weighted suppression or pass operations on different frequency bands of the amplitude spectrum based on the radial frequency difference between the cracks and pits in the frequency domain. The processed amplitude spectrum is merged with the original phase spectrum and then input into the two-dimensional inverse Fourier transform kernel to restore the spatial domain features, thereby realizing directional frequency domain selection and feature enhancement of road damage at different physical scales.
[0014] Preferably, the speed limit control signal generated by the roadside traffic control signal controller, which includes a multi-level transition speed gradient, is sent to the roadside communication unit via a V2X direct communication interface, and further broadcast by the roadside communication unit to the connected vehicles in the form of a roadside coordination message; After parsing the roadside coordination message, the on-board terminal of the connected vehicle extracts the multi-level transition speed gradient and the corresponding spatial position coordinates. Combined with the current longitudinal acceleration and braking distance of the connected vehicle, it plans the optimal deceleration curve that follows the multi-level transition speed gradient. When the braking distance required for the calculation of the optimal deceleration curve exceeds the remaining distance from the current location of the connected vehicle to the abnormal road section, the on-board terminal triggers an emergency braking takeover strategy to compensate for the safety loophole of the driver's delayed response to multi-level transition speed gradients.
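The remaining-distance check performed by the on-board terminal can be illustrated with a minimal sketch. It assumes constant deceleration, so the braking distance from speed v0 down to v1 is (v0^2 - v1^2) / (2a). The function names and the 2.0 m/s^2 default deceleration are hypothetical and are not taken from this application.

```python
def braking_distance_m(speed_kmh, target_kmh, decel_mps2):
    """Distance needed to slow from speed_kmh to target_kmh.

    Assumption: constant deceleration decel_mps2 (m/s^2).
    """
    v0 = speed_kmh / 3.6   # km/h -> m/s
    v1 = target_kmh / 3.6
    return max(v0 * v0 - v1 * v1, 0.0) / (2.0 * decel_mps2)


def needs_emergency_takeover(speed_kmh, target_kmh, remaining_m,
                             comfort_decel_mps2=2.0):
    """True when the planned deceleration cannot finish within remaining_m.

    comfort_decel_mps2 is a hypothetical planning parameter.
    """
    required = braking_distance_m(speed_kmh, target_kmh, comfort_decel_mps2)
    return required > remaining_m


# A vehicle at 72 km/h that must stop needs about 100 m at 2 m/s^2.
takeover = needs_emergency_takeover(72, 0, remaining_m=80)
```

With 80 m remaining, the check returns True and the takeover strategy would be triggered; with 150 m remaining it returns False.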
[0015] Compared with the prior art, the beneficial effects of the present invention are as follows: 1. This invention deploys a dual-branch decoupled neural network at roadside edge computing nodes, enabling both road anomaly identification and dynamic speed limit coordination control to be completed locally on the roadside. This eliminates the need for round-trip transmission links between the roadside and the cloud for image data and control commands, thus eliminating transmission delays caused by network bandwidth and protocol stack processing. The roadside edge computing nodes directly generate road damage state features based on the mapping table between anomaly feature vectors and road damage levels. The roadside traffic control signal controller determines the speed limit threshold based on these features and generates a speed limit control signal containing multi-level transition speed gradients. This achieves physical location isomorphism between local road damage feature extraction and roadside speed limit control command generation, allowing roadside variable message signs and connected vehicles to instantly obtain speed limit control signals synchronized with the actual road anomaly state.
[0016] 2. This invention decouples the structural feature extraction path from the anomaly feature extraction path, utilizes a spatial attention mechanism to suppress unstructured background interference, and employs a frequency domain filtering convolutional layer to suppress low-frequency road surface base signals and enhance high-frequency abnormal damage signals. This separates the feature representation of road geometry and surface defects, reducing the computational power consumption of roadside edge computing nodes. By combining line scan camera synchronous triggering, affine transformation correction, and shadow removal preprocessing with a temporal consistency constraint and a ring bandpass filter bank, interference from motion blur, sudden illumination changes, and flicker noise is eliminated, maintaining the stability of feature extraction across multiple consecutive frames. By introducing the lane centerline offset to adjust the initial damage level across levels, and by combining the connected vehicle's optimal deceleration curve planning based on remaining distance with the emergency braking takeover strategy, this invention compensates for misjudgment based on appearance features alone and for the safety loophole of driver response lag.
Attached Figure Description
[0017] Figure 1 is a flowchart of the overall system operation provided in an embodiment of the present invention.
[0018] Figure 2 is a flowchart of image acquisition and preprocessing provided in an embodiment of the present invention.
[0019] Figure 3 is a flowchart of the structural feature extraction path provided in an embodiment of the present invention.
[0020] Figure 4 is a flowchart of the anomaly feature extraction path provided in an embodiment of the present invention.
[0021] Figure 5 is a flowchart of the damage state feature generation process provided in an embodiment of the present invention.
Detailed Implementation
[0022] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0023] Referring to Figure 1, this embodiment provides a deep learning-based intelligent road recognition system deployed on the roadside infrastructure of an urban expressway. The system includes an image acquisition device deployed at a roadside edge computing node, a dual-branch decoupled neural network, a roadside traffic control signal controller, a roadside variable message sign, and a roadside communication unit. The image acquisition device is installed on a roadside pole and captures 25 frames per second. Each frame has a resolution of 4096×2160 pixels in 8-bit single-channel grayscale format. The road surface coverage areas of adjacent frames overlap by no less than 35%, ensuring continuity of road surface information between consecutive frames. The image acquisition device transmits the acquired road surface image sequence to the memory of the roadside edge computing node in real time via a Gigabit Ethernet interface. The roadside edge computing node uses an embedded computing platform equipped with an NVIDIA Jetson AGX Orin module to meet the computing power requirements of real-time image processing and neural network inference.
[0024] The input of the dual-branch decoupled neural network is a road surface image of size $H \times W \times C$, where $H$ is the image height, $W$ is the image width, and $C$ is the number of channels of the input image. In this embodiment, $H = 2160$, $W = 4096$, and $C = 1$. The structural feature extraction path and the anomaly feature extraction path share the input layer. The input layer normalizes the pixel values of the input image, mapping grayscale pixel values of 0-255 to the interval $[-1, 1]$. The normalization formula is:
$$I_{norm}(x, y) = \frac{I(x, y)}{127.5} - 1$$
[0025] where $I(x, y)$ is the original pixel value at coordinates $(x, y)$, and $I_{norm}(x, y)$ is the pixel value at the corresponding coordinates after normalization. The range of $x$ is $[0, H-1]$, and the range of $y$ is $[0, W-1]$.
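As a minimal sketch of the input-layer normalization, assuming the mapping from 0-255 to [-1, 1] is linear (the function name is illustrative only):

```python
import numpy as np


def normalize_frame(frame_u8):
    """Map 8-bit grayscale values 0..255 linearly onto [-1, 1]."""
    return frame_u8.astype(np.float32) / 127.5 - 1.0


# Black maps to -1, white maps to +1, mid-gray lands near 0.
frame = np.array([[0, 128, 255]], dtype=np.uint8)
normalized = normalize_frame(frame)
```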
[0026] Referring to Figure 3, the structural feature extraction path employs a spatial attention mechanism to extract structural feature vectors of road boundaries and lane markings. First, it performs multi-scale convolution operations on the normalized input image $I_{norm}$ to generate a primary feature map. The multi-scale convolution operation is implemented in parallel using three sets of dilated convolutional layers with dilation coefficients of 1, 3, and 5. All convolutional layers have a kernel size of 3×3, same padding (the output keeps the input size), a stride of 1, and use the ReLU activation function. The output of the dilated convolution is calculated as follows:
$$F_d(x, y) = \sum_{m=-1}^{1} \sum_{n=-1}^{1} w(m, n) \cdot I_{norm}(x + d \cdot m,\ y + d \cdot n)$$
[0027] where $F_d(x, y)$ is the response value of the dilated convolution at coordinates $(x, y)$ of the output feature map, $w(m, n)$ is the weight value of the 3×3 convolution kernel at coordinates $(m, n)$, and $d$ is the dilation coefficient of the dilated convolution. The range of $m$ is $[-1, 1]$, and the range of $n$ is $[-1, 1]$.
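The three parallel dilated convolutions can be sketched in NumPy as follows. This is a single-channel illustration only: zero padding, one kernel per branch, and the helper names are assumptions, and a real deployment would use a deep learning framework with many output channels per branch.

```python
import numpy as np


def dilated_conv3x3(image, kernel, dilation):
    """Single-channel 3x3 dilated convolution, stride 1, zero 'same' padding, ReLU."""
    h, w = image.shape
    pad = dilation  # a 3x3 kernel with dilation d reaches d pixels each way
    padded = np.pad(image, pad, mode="constant")
    out = np.zeros((h, w), dtype=np.float64)
    for m in (-1, 0, 1):
        for n in (-1, 0, 1):
            # Shifted view that samples image(x + d*m, y + d*n)
            r0 = pad + dilation * m
            c0 = pad + dilation * n
            out += kernel[m + 1, n + 1] * padded[r0:r0 + h, c0:c0 + w]
    return np.maximum(out, 0.0)  # ReLU


def multi_scale_features(image, kernels, dilations=(1, 3, 5)):
    """Run the branches in parallel and concatenate along the channel axis."""
    maps = [dilated_conv3x3(image, k, d) for k, d in zip(kernels, dilations)]
    return np.stack(maps, axis=-1)


image = np.arange(25, dtype=float).reshape(5, 5)
ones = np.ones((3, 3))
features = multi_scale_features(image, [ones, ones, ones])  # shape (5, 5, 3)
```

The larger the dilation, the wider the neighborhood each output pixel sees, without adding kernel weights.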
[0028] The output feature maps of the three dilated convolutional layers with different dilation coefficients are concatenated along the channel dimension to generate a feature map with 768 channels. A 1×1 convolutional layer then performs channel dimensionality reduction to generate the primary feature map $F \in \mathbb{R}^{H \times W \times C_f}$ with $C_f = 256$ channels, where $H$ and $W$ are the image height and width and $\mathbb{R}$ is the set of real numbers. The spatial attention generation layer performs global max pooling and global average pooling on the primary feature map $F$ along the channel dimension, generating two two-dimensional feature maps of size $H \times W$: the max pooling feature map $F_{max}$ and the average pooling feature map $F_{avg}$.
[0029] The formula for global max pooling along the channel dimension is:
$$F_{max}(x, y) = \max_{0 \le c \le C_f - 1} F(x, y, c)$$
[0030] where $F_{max}(x, y)$ is the response value of the max pooling feature map at coordinates $(x, y)$, the range of $x$ is $[0, H-1]$, the range of $y$ is $[0, W-1]$, and $c$ is the channel index of the feature map.
[0031] The formula for global average pooling along the channel dimension is:
$$F_{avg}(x, y) = \frac{1}{C_f} \sum_{c=0}^{C_f - 1} F(x, y, c)$$
[0032] where $F_{avg}(x, y)$ is the response value of the average pooling feature map at coordinates $(x, y)$.
[0033] $F_{max}$ and $F_{avg}$ are concatenated along the channel dimension to generate the concatenated feature map $F_{cat}$ of size $H \times W \times 2$. The concatenation operation is:
$$F_{cat}(x, y) = \left[ F_{max}(x, y),\ F_{avg}(x, y) \right]$$
[0034] where $F_{cat}(x, y)$ is the channel-dimension vector of the concatenated feature map at coordinates $(x, y)$.
[0035] The concatenated feature map $F_{cat}$ is fed into a 7×7 convolutional layer with 1 output channel, same padding, and a stride of 1, generating the spatial attention weight matrix $M_s$ of size $H \times W$. The output of the convolutional layer is normalized by the Sigmoid activation function, which maps the weight values to the interval $[0, 1]$. The Sigmoid activation function is:
$$M_s(x, y) = \frac{1}{1 + e^{-z(x, y)}}$$
[0036] where $z(x, y)$ is the output of the linear operation of the 7×7 convolutional layer at coordinates $(x, y)$, and $M_s(x, y)$ is the weight value of the spatial attention weight matrix at coordinates $(x, y)$.
[0037] The feature fusion output layer multiplies the spatial attention weight matrix $M_s$ element-wise with the primary feature map $F$, highlighting the response values of the pixel regions containing road edges and lane markings while suppressing the response values of unstructured background regions of the road surface. The element-wise multiplication is:
$$F_w(x, y, c) = M_s(x, y) \cdot F(x, y, c)$$
[0038] where $F_w(x, y, c)$ is the response value of the spatial-attention-weighted feature map at coordinates $(x, y)$ in channel $c$.
[0039] Global average pooling is applied to the weighted feature map $F_w$, averaging the response values of all spatial locations of each channel over the spatial dimensions and generating a one-dimensional structural feature vector $V_s$ of dimension $C_f$. The global average pooling formula is:
$$V_s(c) = \frac{1}{H \cdot W} \sum_{x=0}^{H-1} \sum_{y=0}^{W-1} F_w(x, y, c)$$
[0040] where $V_s(c)$ is the value of the structural feature vector at the $c$-th channel, and the range of $c$ is $[0, C_f - 1]$.
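The spatial attention steps above (channel-wise max and average pooling, concatenation, 7×7 convolution, Sigmoid, element-wise weighting, and global average pooling) can be sketched in NumPy. The zero padding, the bias term, and the function name are assumptions made for illustration; the kernel would be learned in practice.

```python
import numpy as np


def spatial_attention(features, kernel, bias=0.0):
    """features: (H, W, C) primary feature map; kernel: (7, 7, 2) conv weights.

    Returns the (H, W) attention weights and the (C,) structural vector.
    """
    h, w, _ = features.shape
    f_max = features.max(axis=2)                      # pool over channels
    f_avg = features.mean(axis=2)
    f_cat = np.stack([f_max, f_avg], axis=2)          # (H, W, 2)
    padded = np.pad(f_cat, ((3, 3), (3, 3), (0, 0)))  # zero 'same' padding
    z = np.full((h, w), bias, dtype=np.float64)
    for m in range(7):
        for n in range(7):
            z += (padded[m:m + h, n:n + w, :] * kernel[m, n, :]).sum(axis=2)
    weights = 1.0 / (1.0 + np.exp(-z))                # Sigmoid -> [0, 1]
    weighted = features * weights[:, :, None]         # broadcast over channels
    vector = weighted.mean(axis=(0, 1))               # global average pooling
    return weights, vector


rng = np.random.default_rng(0)
features = rng.random((8, 8, 4))
kernel = rng.normal(size=(7, 7, 2))
weights, structural_vector = spatial_attention(features, kernel)
```

Because the same weight is applied to every channel at a given pixel, the attention map decides where to look rather than which channel to trust.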
[0041] Referring to Figure 4, the anomaly feature extraction path uses a frequency domain filtering convolutional layer to extract anomaly feature vectors for cracks, potholes, and water accumulation areas. The input to the anomaly feature extraction path is also the normalized input image $I_{norm}$ of size $H \times W$. The input first enters the frequency domain transformation convolutional layer. This layer has built-in learnable frequency domain filtering parameters and transforms the input image from the spatial domain to the frequency domain. The two-dimensional discrete Fourier transform is:
$$X(u, v) = \sum_{x=0}^{H-1} \sum_{y=0}^{W-1} I_{norm}(x, y)\, e^{-j 2\pi \left( \frac{u x}{H} + \frac{v y}{W} \right)}$$
[0042] where $X(u, v)$ is the complex value of the frequency domain feature map at frequency coordinates $(u, v)$, $u$ is the frequency index in the vertical direction of the frequency domain with a value range of $[0, H-1]$, $v$ is the frequency index in the horizontal direction with a value range of $[0, W-1]$, and $j$ is the imaginary unit.
[0043] The frequency domain transformation convolutional layer converts the input spatial domain image into a frequency domain feature map, and then decomposes the complex-valued frequency domain feature map into an amplitude spectrum $A(u, v)$ and a phase spectrum $\Phi(u, v)$:
[0044] $$A(u, v) = \sqrt{\mathrm{Re}\left(X(u, v)\right)^2 + \mathrm{Im}\left(X(u, v)\right)^2}$$
[0045] $$\Phi(u, v) = \operatorname{atan2}\left( \mathrm{Im}\left(X(u, v)\right),\ \mathrm{Re}\left(X(u, v)\right) \right)$$
[0046] where $\mathrm{Re}(X(u, v))$ is the real part of the complex value of the frequency domain feature map, $\mathrm{Im}(X(u, v))$ is the imaginary part, $A(u, v)$ is the amplitude at $(u, v)$, and $\Phi(u, v)$ is the phase value at that point.
[0047] The frequency domain transformation convolutional layer uses the learnable frequency domain filtering parameters to weight the amplitude spectrum. These parameters form a two-dimensional weight matrix $W_f$ whose size is consistent with the amplitude spectrum, namely $H \times W$. The initial values of the weight matrix are set as follows, taking the center of the frequency domain $(H/2, W/2)$ as the center of a circle: the region with radius $r \le 64$ has an initial weight of 0.2; the region with $64 < r \le 512$ has an initial weight of 1.8; and the region with $r > 512$ has an initial weight of 0.5. The radius $r$ is calculated as:
$$r(u, v) = \sqrt{\left(u - \tfrac{H}{2}\right)^2 + \left(v - \tfrac{W}{2}\right)^2}$$
[0048] The weight matrix $W_f$ is updated end-to-end by the backpropagation algorithm during the training of the neural network. The gradient comes from the cross-entropy loss function of the subsequent pixel-level classification task.
[0049] The amplitude spectrum is weighted to obtain the weighted amplitude spectrum $A_w(u, v)$:
$$A_w(u, v) = W_f(u, v) \cdot A(u, v)$$
[0050] The weighted amplitude spectrum is merged with the original phase spectrum $\Phi(u, v)$ to generate the processed frequency domain feature map $X_w(u, v)$:
$$X_w(u, v) = A_w(u, v)\, e^{j \Phi(u, v)}$$
[0051] The frequency domain transformation convolutional layer performs a two-dimensional inverse discrete Fourier transform on the processed frequency domain feature map, transforming it back to the spatial domain to obtain the high-frequency anomaly response map. The two-dimensional inverse discrete Fourier transform is:
$$R(x, y) = \mathrm{Re}\left( \frac{1}{H \cdot W} \sum_{u=0}^{H-1} \sum_{v=0}^{W-1} X_w(u, v)\, e^{j 2\pi \left( \frac{u x}{H} + \frac{v y}{W} \right)} \right)$$
[0052] where $R(x, y)$ is the response value of the high-frequency anomaly response map at spatial coordinates $(x, y)$; the real part of the complex value is taken as the final spatial domain response value.
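The forward pass of the frequency domain filtering step can be sketched with NumPy's FFT routines. One assumption is made explicit here: because the initial weights are defined around the center of the frequency plane, the sketch shifts the spectrum with `np.fft.fftshift` so that the zero-frequency component sits at the center before weighting. The function names are illustrative, and the weights are held fixed rather than learned.

```python
import numpy as np


def ring_weights(h, w, r_low=64, r_high=512):
    """Initial weights on a centered spectrum: 0.2 / 1.8 / 0.5 by radius."""
    u = np.arange(h)[:, None] - h // 2
    v = np.arange(w)[None, :] - w // 2
    r = np.sqrt(u ** 2 + v ** 2)
    return np.where(r <= r_low, 0.2, np.where(r <= r_high, 1.8, 0.5))


def frequency_filter(image, weights):
    """Weight the amplitude spectrum, keep the phase, return the real response."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))  # DC moved to the center
    amplitude = np.abs(spectrum)
    phase = np.angle(spectrum)
    filtered = weights * amplitude * np.exp(1j * phase)
    return np.real(np.fft.ifft2(np.fft.ifftshift(filtered)))


# Small demo radii so the three bands fit inside a 64x64 spectrum.
rng = np.random.default_rng(0)
image = rng.random((64, 64))
response = frequency_filter(image, ring_weights(64, 64, r_low=4, r_high=16))
```

A uniformly gray image contains only the zero-frequency component, so it is simply scaled by the innermost weight of 0.2; fine texture in the middle band is amplified by 1.8.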
[0053] The spatial deconvolution layer upsamples the high-frequency anomaly response map to restore its resolution to match the input road surface image sequence. The spatial deconvolution layer uses two deconvolution stages. The first stage has a 4×4 kernel, a stride of 2, padding of 1, 128 output channels, and uses ReLU activation. The second stage also has a 4×4 kernel, a stride of 2, and padding of 1, with 4 output channels and Softmax activation. The output feature map has the same size as the input image. The four channels correspond to the pixel-level classification probabilities of four categories: background, cracks, potholes, and water accumulation.
[0054] The output pixel-level classification probability map is processed to extract the probability response of each category and generate the anomaly feature vector. Specifically, for each anomaly category $k$ (cracks, potholes, and water accumulation), the probability map $P_k$ has size $H \times W$ with values in the range $[0, 1]$, where $k=1$ corresponds to cracks, $k=2$ to potholes, and $k=3$ to water accumulation. Four statistics are calculated for each category: the global average probability $\bar{p}_k$, the maximum probability $p_k^{max}$, the proportion of abnormal area $a_k$, and the aspect ratio of the abnormal region $r_k$. The 12 statistics across the three categories form a one-dimensional anomaly feature vector $V_a$ of dimension 12.
[0055] The formula for calculating the global average probability is:
$$\bar{p}_k = \frac{1}{H \cdot W} \sum_{x=0}^{H-1} \sum_{y=0}^{W-1} P_k(x, y)$$
[0056] The formula for calculating the maximum probability is:
$$p_k^{max} = \max_{x, y} P_k(x, y)$$
[0057] The formula for calculating the proportion of abnormal area is:
$$a_k = \frac{1}{H \cdot W} \sum_{x=0}^{H-1} \sum_{y=0}^{W-1} \mathbb{1}\left( P_k(x, y) > 0.5 \right)$$
[0058] where $\mathbb{1}(\cdot)$ is an indicator function that takes the value 1 when the condition within the parentheses is true, and 0 otherwise.
[0059] The aspect ratio of the abnormal region is calculated as follows. The probability map $P_k$ is binarized with a threshold of 0.5 to generate a binary mask image. Connected component analysis is performed on the mask image to extract all connected components with an area greater than 100 pixels. For each connected component, the width $w$ and height $h$ of its minimum bounding rectangle are calculated, and the aspect ratio of this connected component is $w / h$. The maximum aspect ratio over all connected components is taken as the aspect ratio statistic $r_k$ for that category. If no connected component meets the conditions, then $r_k = 0$.
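The four per-category statistics and the resulting 12-dimensional vector can be sketched as follows. Several choices here are assumptions rather than details from this embodiment: 4-connectivity, an axis-aligned bounding box in place of a possibly rotated minimum bounding rectangle, the 0.5 threshold reused for the area proportion, a value of 0 when no component qualifies, and all helper names. The pure-Python flood fill is meant for small illustrative maps, not full-resolution frames.

```python
from collections import deque

import numpy as np


def bounding_boxes(mask, min_area=100):
    """(width, height) of each 4-connected component with area > min_area."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    boxes = []
    for sx in range(h):
        for sy in range(w):
            if not mask[sx, sy] or seen[sx, sy]:
                continue
            queue = deque([(sx, sy)])
            seen[sx, sy] = True
            area, x0, x1, y0, y1 = 0, sx, sx, sy, sy
            while queue:
                x, y = queue.popleft()
                area += 1
                x0, x1 = min(x0, x), max(x1, x)
                y0, y1 = min(y0, y), max(y1, y)
                for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
                    if 0 <= nx < h and 0 <= ny < w and mask[nx, ny] and not seen[nx, ny]:
                        seen[nx, ny] = True
                        queue.append((nx, ny))
            if area > min_area:
                boxes.append((y1 - y0 + 1, x1 - x0 + 1))  # columns = width
    return boxes


def category_statistics(prob, threshold=0.5, min_area=100):
    """Mean probability, max probability, area proportion, max aspect ratio."""
    mask = prob > threshold
    boxes = bounding_boxes(mask, min_area)
    aspect = max((bw / bh for bw, bh in boxes), default=0.0)
    return [float(prob.mean()), float(prob.max()), float(mask.mean()), aspect]


def anomaly_vector(prob_maps):
    """prob_maps: (4, H, W) softmax output; channels 1..3 are the anomalies."""
    return np.array([s for k in (1, 2, 3) for s in category_statistics(prob_maps[k])])


# A 5-row by 40-column crack-like streak in an otherwise clean 40x60 map.
crack = np.zeros((40, 60))
crack[10:15, 5:45] = 0.9
stats = category_statistics(crack)
```

For the streak above the aspect ratio is 40 / 5 = 8, which is the kind of elongated shape that separates cracks from compact potholes.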
[0060] The training process of the dual-branch decoupled neural network adopts a multi-task learning framework. The training dataset contains 1 million frames of road surface images, covering different lighting conditions, road surface types, damage types, and damage levels. Each frame is annotated at the pixel level, including the coordinates of road edges and lane markings, as well as pixel-level masks for cracks, potholes, and water accumulation areas. The main training tasks are road structure segmentation and road surface anomaly segmentation. The Dice loss function is used for road structure segmentation, and the cross-entropy loss function is used for road surface anomaly segmentation. The total loss function is a weighted sum of the two task loss functions, with weights of 0.4 and 0.6, respectively. The AdamW optimizer is used for training, with an initial learning rate of $10^{-4}$, a weight decay of $10^{-5}$, a batch size of 16, and 100 training epochs. A cosine annealing learning rate schedule is used, adjusting the learning rate once per epoch. An early stopping strategy is employed during training: training stops when the loss on the validation set does not decrease for 10 consecutive epochs, and the optimal model weights are saved. The model weights are quantized to INT8 format and deployed on the embedded platform of the roadside edge computing node. The inference time for a single frame does not exceed 50 ms.
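The loss weighting, learning rate schedule, and early stopping rule described above can be sketched in plain Python. The minimum learning rate of 0 and all function and class names are assumptions; the embodiment does not state the floor of the cosine schedule.

```python
import math


def total_loss(dice_loss, ce_loss, w_structure=0.4, w_anomaly=0.6):
    """Weighted sum of the structure (Dice) and anomaly (cross-entropy) losses."""
    return w_structure * dice_loss + w_anomaly * ce_loss


def cosine_lr(epoch, total_epochs=100, lr_init=1e-4, lr_min=0.0):
    """Cosine annealing, evaluated once per epoch. lr_min is an assumption."""
    cos_term = 1.0 + math.cos(math.pi * epoch / total_epochs)
    return lr_min + 0.5 * (lr_init - lr_min) * cos_term


class EarlyStopping:
    """Stop after `patience` consecutive epochs without a lower validation loss."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best = math.inf
        self.stale = 0

    def step(self, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.stale = 0
        else:
            self.stale += 1
        return self.stale >= self.patience


schedule = [cosine_lr(epoch) for epoch in range(100)]
```

In a training loop, `step` would be called once per epoch with the validation loss, and the model weights would be saved whenever `best` improves.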
[0061] Referring to Figure 5, the roadside edge computing node compares the anomaly feature vector with the preset road damage level mapping table to generate road damage state features. The road damage level mapping table is stored in advance in the non-volatile storage medium of the roadside edge computing node. The mapping table contains the multi-dimensional feature space partition boundaries and the corresponding damage level labels. The damage level labels are divided into 5 levels: L0 (no damage), L1 (minor damage), L2 (moderate damage), L3 (severe damage), and L4 (extremely severe damage). The specific partition rules of the mapping table are shown in Table 1.
[0062] Table 1. Multidimensional feature space boundaries of the road damage level mapping table
Table 1 specifies the multidimensional feature space partition boundaries of the road damage level mapping table. For the 12 statistics of the anomaly feature vector, it defines the feature value range corresponding to each damage level. The roadside edge computing node determines which partition the vector falls into by calculating the Mahalanobis distance of the anomaly feature vector in the multidimensional feature space, and matches the corresponding initial damage level label. The Mahalanobis distance measures the distance between the anomaly feature vector and the feature center of each damage level while eliminating the influence of scale differences and correlations between feature dimensions. The formula for calculating the Mahalanobis distance is:
$$D_c = \sqrt{ \left( V_a - \mu_c \right)^{T} \Sigma_c^{-1} \left( V_a - \mu_c \right) }$$
[0063] where $D_c$ is the Mahalanobis distance between the anomaly feature vector $V_a$ and the feature center vector $\mu_c$ of the $c$-th damage level, $\Sigma_c$ is the feature covariance matrix of the $c$-th damage level, and $\Sigma_c^{-1}$ is the inverse of that covariance matrix. The value of $c$ ranges over $[0, 4]$, corresponding to the five damage levels from L0 to L4.
[0064] The roadside edge computing node calculates the Mahalanobis distance between the abnormal feature vector and the feature center of each of the five damage levels and selects the damage level with the smallest Mahalanobis distance as the initial damage level label. At the same time, the roadside edge computing node extracts the lane centerline offset from the structural feature vector. In this calculation, the pixel coordinates of the road edges and lane markings are extracted from the weighted feature map corresponding to the structural feature vector, the lane centerline is fitted by the Hough line transform, and the lateral offset between the fitted lane centerline and the preset standard lane centerline is calculated. When the offset exceeds the preset safe offset range of ±0.5 meters, the initial damage level label is adjusted upwards by one level. The spatial structural deformation index is thereby fused to generate the final road damage state feature. The road damage state feature includes information in four dimensions: the final damage level label, the abnormal feature vector, the lane centerline offset, and the spatial location coordinates of the abnormal area.
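The level selection in this paragraph can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names are hypothetical, the toy feature vectors in the usage note are two-dimensional rather than 12-dimensional, and capping the upgraded label at L4 is an assumption that the text does not state.

```python
import math

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def mahalanobis(f, mu, cov):
    """Mahalanobis distance between feature vector f and class center mu."""
    d = [fi - mi for fi, mi in zip(f, mu)]
    y = solve(cov, d)  # y = cov^{-1} d, without forming the inverse explicitly
    return math.sqrt(sum(di * yi for di, yi in zip(d, y)))

def damage_level(f, centers, covs, lane_offset_m,
                 safe_offset_m=0.5, max_level=4):
    """Pick the nearest level, then raise it by one if the offset is unsafe."""
    dists = [mahalanobis(f, mu, cov) for mu, cov in zip(centers, covs)]
    level = dists.index(min(dists))
    if abs(lane_offset_m) > safe_offset_m:
        level = min(level + 1, max_level)  # cap at L4 is an assumption
    return level
```

For example, with five toy centers `[[0, 0], [1, 1], [2, 2], [3, 3], [4, 4]]` and identity covariances, the vector `[1.1, 0.9]` maps to level 1 when the offset is within range and to level 2 when the offset is 0.6 m.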
[0065] The roadside traffic control signal controller receives the road damage state features, determines the speed limit threshold for the corresponding road segment based on these features, and generates a speed limit control signal. The roadside traffic control signal controller uses a 32-bit industrial-grade ARM Cortex-R5F processor, supports multiple I/O interfaces and Ethernet communication interfaces, and establishes a real-time communication link with the roadside edge computing node via Gigabit Ethernet, with a communication cycle of 10 ms. The roadside traffic control signal controller embeds speed limit lookup table logic and smooth transition control logic based on a vehicle dynamics model. The speed limit lookup table logic matches the corresponding basic speed limit threshold based on the damage level label in the received road damage state features: L0 corresponds to a basic speed limit threshold of 80 km/h, L1 to 70 km/h, L2 to 60 km/h, L3 to 40 km/h, and L4 to 20 km/h. The smooth transition control logic obtains the historical speed limit control signal of the current road segment and the current average vehicle speed, calculates the speed difference between the basic speed limit threshold and the current average vehicle speed, and generates a speed limit control signal containing multiple transition speed gradients when the speed difference is greater than the preset deceleration margin of 20 km/h. The maintenance distance of each gradient is 200 meters.
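The lookup table logic and the transition decision can be sketched in a few lines. The names are hypothetical, and the sign convention for the speed difference (average speed minus basic threshold) is an assumption; the table values and the 20 km/h margin are those given above.

```python
# Basic speed limit thresholds per damage level, in km/h (values from the text).
BASE_LIMIT_KMH = {"L0": 80, "L1": 70, "L2": 60, "L3": 40, "L4": 20}
DECEL_MARGIN_KMH = 20  # preset deceleration margin

def base_speed_limit(level):
    """Return the basic speed limit threshold for a damage level label."""
    return BASE_LIMIT_KMH[level]

def needs_transition(level, avg_speed_kmh):
    """True when multi-level transition gradients should be generated.
    Assumes the speed difference is average speed minus basic threshold."""
    return avg_speed_kmh - base_speed_limit(level) > DECEL_MARGIN_KMH
```

For instance, traffic averaging 75 km/h entering an L3 segment (40 km/h) exceeds the margin and would receive transition gradients, whereas traffic averaging 80 km/h entering an L1 segment (70 km/h) would not.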
[0066] The roadside variable information sign receives speed limit control signals and displays the corresponding speed limit values. The sign uses an LED dot matrix display screen with a pixel pitch of 16 mm and a display size of 3.2 m × 2.4 m, and supports RS485 and Ethernet communication interfaces. It receives the speed limit control signals sent by the roadside traffic control signal controller, parses them, and updates the speed limit values on the display screen, with a refresh cycle of no more than 500 ms. The roadside communication unit broadcasts speed limit control signals and road damage state features to connected vehicles entering the corresponding road segment. The roadside communication unit is a roadside unit based on the C-V2X PC5 direct communication interface, with an operating frequency band of 5905 MHz to 5925 MHz, a communication bandwidth of 20 MHz, a maximum communication distance of 1000 meters, and a broadcast period of 100 ms. The broadcast message format conforms to the roadside coordination message format defined by the 3GPP R16 standard. The message content includes the multi-level transition speed gradients in the speed limit control signal, the damage level labels in the road damage state features, the spatial coordinates of abnormal areas, and the length and width of abnormal road segments. Connected vehicles entering the corresponding road segment receive the broadcast message through the on-board OBU and obtain the relevant information after parsing.
[0067] The technical solution of this embodiment completes the acquisition of road images, feature extraction, and damage level determination at the roadside edge computing node, and generates and executes speed limit control signals locally on the roadside, eliminating the round-trip transmission delay between the roadside and the cloud, and realizing real-time linkage between road damage identification and traffic control.
[0068] In one optional embodiment, the image acquisition device includes a line scan camera and a supplementary lighting trigger circuit. The line scan camera uses an 8K resolution line scan CCD sensor with a pixel size of 7μm×7μm and a maximum line frequency of 180kHz. It is mounted on a roadside pole at a height of 6.5 meters, with a 22-degree depression angle between its optical axis and the road plane. The field of view depth along the road travel direction is 30 meters, and the lateral field of view covers three lanes in one direction, with a total width of 11.25 meters. The supplementary lighting trigger circuit uses an FPGA as its core controller. Its input is the line synchronization signal output from the line scan camera. The frequency of the line synchronization signal is consistent with the line frequency of the line scan camera. When the supplementary lighting trigger circuit receives the rising edge of each line synchronization signal, it generates a 5μs wide TTL trigger pulse and outputs it to a strobe light. The strobe light uses a high-power LED array with an emission wavelength of 650nm, a single pulse luminous flux of 12000lm, and a response time of less than 1μs, achieving complete synchronization of exposure with the line frequency of the line scan camera.
[0069] Referring to Figure 2, under the illumination of the strobe light, the line scan camera acquires continuous row images of the road surface. Each row image has 8192 pixels, and the row acquisition frequency is 120 kHz. At a vehicle speed of 120 km/h, the sampling resolution along the driving direction is 0.28 mm/pixel. The line scan camera stitches the continuously acquired row images according to a preset frame height. Each frame is composed of 2048 consecutive row images, generating a single-frame road surface image with a size of 8192×2048. Adjacent frames overlap by 512 rows. The stitched image sequence is transmitted to the roadside edge computing node at a frame rate of 15 frames per second.
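As a worked check of the figures in this paragraph, the sketch below computes the along-track sampling resolution and the starting row of each stitched frame. The function names are hypothetical; the frame height, overlap, speed, and line rate are those stated above.

```python
def sampling_resolution_mm(speed_kmh, line_rate_hz):
    """Distance travelled between two consecutive row exposures, in mm."""
    speed_m_per_s = speed_kmh / 3.6
    return speed_m_per_s / line_rate_hz * 1000.0

def frame_starts(total_rows, frame_height=2048, overlap=512):
    """Start row of each full frame when adjacent frames share `overlap` rows."""
    step = frame_height - overlap  # 1536 new rows per frame
    return list(range(0, total_rows - frame_height + 1, step))
```

At 120 km/h and 120 kHz the first function returns roughly 0.278 mm, consistent with the stated 0.28 mm/pixel, and 5120 buffered rows yield frames starting at rows 0, 1536, and 3072.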
[0070] After receiving the road surface image sequence, the roadside edge computing node performs image preprocessing operations. These operations include affine transformation correction based on the road surface texture direction and motion blur region removal based on the difference between adjacent frames, resulting in a denoised and corrected image sequence. The affine transformation correction based on the road surface texture direction is used to eliminate the perspective distortion caused by the tilted mounting of the line scan camera. For each acquired single-frame road surface image, the principal direction of the road surface texture is extracted, and its angle in the image is determined through gradient orientation histogram statistics. An affine transformation matrix is then constructed to correct the image. The formula for calculating the affine transformation matrix is:
$$M = \begin{bmatrix} \cos\theta & -\sin\theta & t_x \\ \sin\theta & \cos\theta & t_y \\ 0 & 0 & 1 \end{bmatrix}$$
[0071] where $M$ is the 3×3 affine transformation matrix, $\theta$ is the angle between the principal direction of the road surface texture and the horizontal axis of the image, and $t_x$ and $t_y$ are the translation amounts of the image, used to ensure that the center of the corrected image is aligned with the center of the original image.
[0072] For each pixel coordinate $(x, y)$ in the original image, the corresponding coordinate $(x', y')$ in the corrected image is calculated using the affine transformation matrix. The calculation formula is:
$$\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = M \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$
[0073] The bilinear interpolation algorithm is used to calculate the pixel value of each coordinate in the corrected image, eliminating the image stretching and tilting caused by perspective distortion, and obtaining the affine corrected image.
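The correction and interpolation steps can be sketched as follows. This is a minimal illustration, assuming a counter-clockwise rotation convention, pixel centers at integer coordinates, clamping at the image border, and backward sampling through the inverse transform; the function names are hypothetical.

```python
import math

def rotation_about_center(theta, width, height):
    """3x3 affine matrix rotating by theta (radians) about the image center.
    The translation terms keep the center fixed, as the text requires."""
    c, s = math.cos(theta), math.sin(theta)
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    tx = cx - (c * cx - s * cy)
    ty = cy - (s * cx + c * cy)
    return [[c, -s, tx], [s, c, ty], [0.0, 0.0, 1.0]]

def transform(m, x, y):
    """Apply the affine matrix to a pixel coordinate."""
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])

def bilinear(img, x, y):
    """Bilinear interpolation at a non-integer coordinate (border clamped)."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

def correct(img, theta):
    """Rotate img by theta: each output pixel samples the source image
    through the inverse transform (rotation by -theta about the center)."""
    h, w = len(img), len(img[0])
    inv = rotation_about_center(-theta, w, h)
    return [[bilinear(img, *transform(inv, x, y)) for x in range(w)]
            for y in range(h)]
```

Backward sampling is used so that every output pixel receives a value; forward mapping of source pixels would leave holes after rotation.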
[0074] Motion blur region removal based on the difference between adjacent frames compares the current frame image $I_t$ with the previous frame image $I_{t-1}$. After the grayscale of the two frames is normalized, the inter-frame difference image is calculated. The calculation formula is:
$$D_t(x, y) = \left| I_t(x, y) - I_{t-1}(x, y) \right|$$
[0075] The difference image is binarized with a threshold of 15 to generate a binary mask. Regions with an absolute difference less than the threshold are marked as motion blur candidate regions, while regions with an absolute difference greater than the threshold are marked as sharp regions. Connectivity analysis is performed on the candidate regions to remove scattered regions with an area less than 200 pixels, thus determining the mask for the motion blur region. For the motion blur region, the sharp pixel at the corresponding position in the previous frame is used for replacement, or linear interpolation is performed using pixels from adjacent sharp regions to repair the image, ultimately obtaining the image sequence after removing motion blur.
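The masking procedure in this paragraph can be sketched as below. The function name is hypothetical, images are plain lists of gray values, and the use of 4-connectivity for the connectivity analysis is an assumption; the threshold of 15 and the 200-pixel minimum area are the values given in the text.

```python
from collections import deque

def blur_mask(curr, prev, threshold=15, min_area=200):
    """Mark candidate pixels (absolute difference below threshold) and keep
    only connected regions with at least min_area pixels."""
    h, w = len(curr), len(curr[0])
    cand = [[abs(curr[y][x] - prev[y][x]) < threshold for x in range(w)]
            for y in range(h)]
    seen = [[False] * w for _ in range(h)]
    mask = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not cand[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill over one 4-connected component.
            comp, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                comp.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and cand[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(comp) >= min_area:  # drop scattered small regions
                for cy, cx in comp:
                    mask[cy][cx] = True
    return mask
```

Pixels set in the returned mask would then be replaced from the previous frame or interpolated from neighbouring sharp pixels, as the paragraph describes.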
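[0076] Image preprocessing also includes a shadow removal step based on local contrast adaptation. In this step, the roadside edge computing node converts the denoised and corrected image sequence to the HSV color space and extracts the luminance channel component. For each pixel in the RGB image, the R, G, and B components, each in the range [0, 255], are first normalized to the [0, 1] interval to obtain $R'$, $G'$, and $B'$. The maximum value $C_{\max} = \max(R', G', B')$, the minimum value $C_{\min} = \min(R', G', B')$, and the difference $\Delta = C_{\max} - C_{\min}$ are then calculated.
[0077] The formula for calculating the luminance component V is:
$$V = C_{\max}$$
[0078] The formula for calculating the saturation component S is:
$$S = \begin{cases} 0, & C_{\max} = 0 \\ \dfrac{\Delta}{C_{\max}}, & C_{\max} \neq 0 \end{cases}$$
[0079] The formula for calculating the hue component H is:
$$H = \begin{cases} 0^\circ, & \Delta = 0 \\ 60^\circ \times \left( \dfrac{G' - B'}{\Delta} \bmod 6 \right), & C_{\max} = R' \\ 60^\circ \times \left( \dfrac{B' - R'}{\Delta} + 2 \right), & C_{\max} = G' \\ 60^\circ \times \left( \dfrac{R' - G'}{\Delta} + 4 \right), & C_{\max} = B' \end{cases}$$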
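The conversion can be written out directly. The sketch below follows the standard RGB-to-HSV definition with hue in degrees; the function name is hypothetical.

```python
def rgb_to_hsv(r, g, b):
    """Convert 8-bit RGB to (H in degrees, S in [0, 1], V in [0, 1])."""
    rn, gn, bn = r / 255.0, g / 255.0, b / 255.0
    c_max, c_min = max(rn, gn, bn), min(rn, gn, bn)
    delta = c_max - c_min
    v = c_max                                   # luminance component
    s = 0.0 if c_max == 0 else delta / c_max    # saturation component
    if delta == 0:
        h = 0.0                                 # gray pixel: hue undefined, use 0
    elif c_max == rn:
        h = 60.0 * (((gn - bn) / delta) % 6)
    elif c_max == gn:
        h = 60.0 * ((bn - rn) / delta + 2)
    else:
        h = 60.0 * ((rn - gn) / delta + 4)
    return h, s, v
```

Only the V component is processed in the following shadow removal steps; H and S are kept for the conversion back to RGB.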
[0080] After conversion, the luminance channel component V is extracted, with the same size as the original image. The luminance channel component is then processed with a local window of size 31×31 and a stride of 1. For each pixel coordinate $(x, y)$ in the luminance channel component, the average gray value $\mu(x, y)$ and the gray variance $\sigma^2(x, y)$ within the local window $W(x, y)$ centered at that coordinate are calculated. The calculation formulas are:
$$\mu(x, y) = \frac{1}{N} \sum_{(i, j) \in W(x, y)} V(i, j)$$
[0081]
$$\sigma^2(x, y) = \frac{1}{N} \sum_{(i, j) \in W(x, y)} \left( V(i, j) - \mu(x, y) \right)^2$$
[0082] where $N$ is the total number of pixels within the local window, $N$ = 31×31 = 961.
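The local statistics can be computed as in the following sketch. The function name is hypothetical, and clipping the window at the image border (so that fewer than 961 pixels are averaged near the edges) is an assumption; the text does not state how borders are handled.

```python
def local_stats(v, x, y, half=15):
    """Mean and variance of v inside the (2*half+1)^2 window centered at (x, y).
    half=15 gives the 31x31 window from the text; the window is clipped
    at the image border, which is an assumption."""
    h, w = len(v), len(v[0])
    vals = [v[j][i]
            for j in range(max(0, y - half), min(h, y + half + 1))
            for i in range(max(0, x - half), min(w, x + half + 1))]
    n = len(vals)
    mean = sum(vals) / n
    var = sum((p - mean) ** 2 for p in vals) / n
    return mean, var
```

A uniform region returns a variance of zero, which is the case the following gain step treats as a likely shadow area.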
[0083] A local adaptive gain coefficient is constructed based on the gray-level variance. The smaller the gray-level variance, the more uniform the gray-level distribution in the area, the more likely it is to be a shadow area, and the larger the corresponding gain coefficient. The calculation formula is:
[0084] where the formula uses the global grayscale variance of the luminance channel component V, a gain adjustment factor with a value of 0.3, and an extremely small constant, set to $10^{-8}$, that prevents the denominator from being 0.
[0085] The luminance channel component is subjected to a nonlinear stretching transformation using the local adaptive gain coefficient. The transformation formula is as follows:
[0086] where the value range of the adjusted luminance channel component is limited to the [0, 1] interval.
[0087] The adjusted luminance channel component is merged with the hue and saturation components and converted back to the RGB color space to obtain an illumination-balanced image sequence, which serves as the final input to the dual-branch decoupled neural network. The core parameter settings and processing effect indicators for each step of the image preprocessing in this embodiment are shown in Table 2.
[0088] Table 2. Parameter settings and processing effect indicators for each step of image preprocessing. Table 2 specifies the core parameter settings and corresponding processing effects of each step of the image preprocessing in this embodiment. Through multi-step preprocessing operations, the interference of acquisition distortion, motion blur, and non-uniform shadows is eliminated, and the accuracy of subsequent neural network feature extraction is improved.
[0089] The feature fusion output layer is also connected to a temporal consistency constraint. During the extraction of structural feature vectors, the temporal consistency constraint takes the spatial attention weight matrix of the current frame and the historical spatial attention weight matrix of the previous frame and calculates the optical flow-guided displacement field between them. The inputs of the temporal consistency constraint are the spatial attention weight matrix $A_t$ of the current frame t and the historical spatial attention weight matrix $A_{t-1}$ of the previous frame t-1, both of size $H \times W$. The Farneback optical flow algorithm is used to calculate the optical flow field between the weight matrices of the two frames. Each pixel position (x, y) in the optical flow field corresponds to a two-dimensional displacement vector $(d_x(x, y), d_y(x, y))$, representing the displacement in the current frame of the pixel at coordinates (x, y) in the previous frame. For each local window $\Omega$ in the two frames, assuming the optical flow within the window is constant, the optical flow vector is solved by minimizing the following energy function:
$$E(d_x, d_y) = \sum_{(i, j) \in \Omega} w(i, j) \left[ G_x(i, j)\, d_x + G_y(i, j)\, d_y + A_t(i, j) - A_{t-1}(i, j) \right]^2$$
[0090] where $w(i, j)$ is the window weight function, which uses Gaussian weights, and $G_x$ and $G_y$ are the gradients of the weight matrix of the previous frame in the x and y directions, respectively.
[0091] The obtained optical flow field is used as the optical flow-guided displacement field. The temporal consistency constraint uses this displacement field to perform spatial warp alignment on the historical spatial attention weight matrix $A_{t-1}$ of the previous frame, generating an aligned weight matrix $\tilde{A}_{t-1}$. The formula for calculating the warp alignment is:
$$\tilde{A}_{t-1}(x, y) = A_{t-1}\left( x - d_x(x, y),\; y - d_y(x, y) \right)$$
[0092] The bilinear interpolation algorithm is used to calculate the weight values at non-integer coordinates.
[0093] The temporal consistency constraint calculates the Euclidean distance loss between the warped and aligned result and the spatial attention weight matrix of the current frame. This Euclidean distance loss is then backpropagated as a penalty term to the backbone feature extraction layer, constraining the temporal stability of road edge and lane marking features across multiple consecutive frames. The formula for calculating the Euclidean distance loss $L_{tc}$ is:
$$L_{tc} = \sqrt{ \sum_{x=1}^{W} \sum_{y=1}^{H} \left( A_t(x, y) - \tilde{A}_{t-1}(x, y) \right)^2 }$$
[0094] The Euclidean distance loss, used as a temporal consistency penalty, is added to the main task loss of the neural network to form the total loss function. This function updates all weight parameters of the neural network through the backpropagation algorithm, reducing the flickering noise of the spatial attention weights between consecutive frames.
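The warp-and-compare step of the temporal consistency constraint can be sketched as follows. This is an illustration only: the function names are hypothetical, the displacement field is supplied as two precomputed arrays rather than estimated, and the backward-sampling sign convention and border clamping are assumptions.

```python
import math

def sample(a, x, y):
    """Bilinear sample of matrix a at a non-integer coordinate (border clamped)."""
    h, w = len(a), len(a[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = a[y0][x0] * (1 - fx) + a[y0][x1] * fx
    bottom = a[y1][x0] * (1 - fx) + a[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

def warp(prev, flow_x, flow_y):
    """Align the previous-frame weights to the current frame.
    Assumed convention: output (x, y) samples prev at (x - dx, y - dy)."""
    h, w = len(prev), len(prev[0])
    return [[sample(prev, x - flow_x[y][x], y - flow_y[y][x])
             for x in range(w)] for y in range(h)]

def temporal_loss(curr, aligned):
    """Euclidean distance between current and aligned attention weights."""
    return math.sqrt(sum((c - a) ** 2
                         for rc, ra in zip(curr, aligned)
                         for c, a in zip(rc, ra)))
```

When the attention map simply shifts by the estimated flow, the aligned map matches the current one and the penalty vanishes; residual flicker between frames shows up as a positive loss.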
[0095] This embodiment achieves clear road surface imaging under high-speed motion by using a line scan camera and a synchronous supplementary lighting trigger circuit. Combined with multi-step image preprocessing operations, it eliminates interference from acquisition distortion, motion blur, and non-uniform shadows. The temporal consistency constraint improves the temporal stability of continuous frame feature extraction and reduces the impact of environmental interference on road recognition and damage assessment.
[0096] In another alternative embodiment, the learnable frequency domain filtering parameters inside the frequency domain transformation convolutional layer are updated end-to-end through a two-dimensional Fourier transform kernel and a two-dimensional inverse Fourier transform kernel. The two-dimensional Fourier transform kernel and the two-dimensional inverse Fourier transform kernel are implemented as differentiable operators of the neural network, supporting end-to-end gradient backpropagation. In the deep learning framework, this is achieved through a differentiable Fourier transform function, which can transfer the gradient from the frequency domain to the input feature map in the spatial domain.
[0097] During forward propagation in the frequency domain transformation convolutional layer, the two-dimensional Fourier transform kernel performs spectral decomposition on the input feature map, generating the amplitude spectrum and the phase spectrum. The learnable frequency domain filtering parameters are constructed as a ring bandpass filter bank in polar coordinates. The polar coordinate system takes the frequency domain center $(u_0, v_0)$ as the pole, with $r$ as the radial coordinate and $\varphi$ as the angular coordinate. The formula for calculating the radial coordinate $r$ of a frequency domain coordinate $(u, v)$ is:
$$r = \sqrt{(u - u_0)^2 + (v - v_0)^2}$$
[0098] The formula for calculating the angular coordinate $\varphi$ is:
$$\varphi = \operatorname{atan2}(v - v_0,\; u - u_0)$$
[0099] The ring bandpass filter bank contains 6 ring filters, each corresponding to a radial frequency band. Based on the difference in radial frequency between cracks and potholes in the frequency domain, different frequency bands of the amplitude spectrum are either suppressed or passed through weighting. The frequency band division and initial weighting parameters of the ring bandpass filter bank are shown in Table 3.
[0100] Table 3. Frequency band division and initial weighting parameters of the ring bandpass filter bank. Table 3 specifies the frequency band division rules, the physical scales of the corresponding damage, and the initial weighting parameters of the ring bandpass filter bank in this embodiment. Each ring filter corresponds to a learnable weighting coefficient $w_k$, k = 1 to 6, which is updated end-to-end by gradient backpropagation during neural network training. The gradients are derived from the cross-entropy loss function of the subsequent pixel-level classification task. By weighting different frequency bands selectively, feature enhancement and background suppression are achieved for road damage at different physical scales.
[0101] For each coordinate $(u, v)$ in the frequency domain, its radial coordinate r is calculated, the frequency band k to which it belongs is determined, and the corresponding weighting coefficient $w_k$ is obtained. The amplitude spectrum $|F(u, v)|$ is weighted to obtain the weighted amplitude spectrum $|F'(u, v)|$. The calculation formula is:
$$|F'(u, v)| = w_k \cdot |F(u, v)|$$
[0102] where k is the frequency band number to which the coordinate (u, v) belongs. The processed amplitude spectrum is merged with the original phase spectrum and then input into the two-dimensional inverse Fourier transform kernel to recover the spatial domain features, realizing directional frequency domain selection and feature enhancement for road damage at different physical scales.
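The ring-weighted filtering can be illustrated on a tiny feature map. The sketch below uses a naive DFT so that it stays dependency-free, and it is not a differentiable layer. The equal-width band edges are an assumption, since the actual band division is defined in Table 3; the function names are hypothetical. The spectrum is left unshifted, so radial distance is measured from the wrapped zero-frequency position, which is equivalent to measuring from the center of a shifted spectrum.

```python
import cmath
import math

def dft2(x, inverse=False):
    """Naive 2-D DFT (O(n^4)); suitable only for very small illustrative inputs."""
    h, w = len(x), len(x[0])
    sign = 1 if inverse else -1
    out = [[0j] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            s = 0j
            for m in range(h):
                for n in range(w):
                    angle = 2 * math.pi * (u * m / h + v * n / w)
                    s += x[m][n] * cmath.exp(sign * 1j * angle)
            out[u][v] = s / (h * w) if inverse else s
    return out

def band_index(u, v, h, w, num_bands=6):
    """Ring index of frequency (u, v); equal-width rings are an assumption."""
    fu = u if u <= h // 2 else u - h   # wrap to signed frequency offsets
    fv = v if v <= w // 2 else v - w
    r = math.hypot(fu, fv)
    r_max = math.hypot(h // 2, w // 2)
    if r_max == 0:
        return 0
    return min(int(r / r_max * num_bands), num_bands - 1)

def ring_filter(x, weights):
    """Scale the amplitude spectrum per ring, keep the phase, transform back."""
    h, w = len(x), len(x[0])
    spec = dft2(x)
    out = [[0j] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            amp, phase = abs(spec[u][v]), cmath.phase(spec[u][v])
            k = band_index(u, v, h, w, len(weights))
            out[u][v] = cmath.rect(weights[k] * amp, phase)
    rec = dft2(out, inverse=True)
    return [[z.real for z in row] for row in rec]
```

With all six weights equal to 1 the input is recovered unchanged; setting the innermost weight to 0 removes the lowest-frequency ring, which on a 4×4 input is just the mean level.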
[0103] The spatial deconvolutional layer upsamples the high-frequency anomaly response map to restore it to the same resolution as the road surface image sequence. The spatial deconvolutional layer employs an encoder-decoder structure. The encoder part uses three sets of convolutional layers to downsample the high-frequency anomaly response map and extract multi-scale anomaly features. Each set of convolutional layers contains two 3×3 convolutional layers, one BatchNorm layer, and one ReLU activation layer, with a stride of 2 and channel numbers of 64, 128, and 256, respectively. The decoder part uses three sets of deconvolutional layers to upsample the feature map output by the encoder. Each set of deconvolutional layers contains one 4×4 deconvolutional layer, one BatchNorm layer, and one ReLU activation layer, with a stride of 2 and channel numbers of 128, 64, and 4, respectively. The decoder also fuses each feature map with the encoder feature map of the corresponding scale through skip connections to recover detailed information. The final output feature map is normalized using the Softmax activation function to generate a pixel-level classification probability map, in which each pixel has a probability for each of four categories: background, cracks, potholes, and water accumulation. The calculation formula for the Softmax activation function is as follows:
$$P_k(x, y) = \frac{\exp\left( z_k(x, y) \right)}{\sum_{j=0}^{3} \exp\left( z_j(x, y) \right)}$$
[0104] where $z_k(x, y)$ is the linear response value of the k-th channel of the deconvolutional layer output at coordinate $(x, y)$, with k=0 for background, k=1 for cracks, k=2 for potholes, and k=3 for water accumulation, and $P_k(x, y)$ is the probability value of the corresponding category. Pseudo-anomalies caused by normal road surface textures are filtered out through the frequency domain filtering mechanism.
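The per-pixel Softmax normalization over the four output channels can be sketched as follows. The function names and the English class strings are illustrative; subtracting the maximum response before exponentiation is a standard numerical-stability step and does not change the result.

```python
import math

# Channel order follows the text: k=0 background, 1 cracks, 2 potholes, 3 water.
CLASSES = ("background", "crack", "pothole", "water")

def softmax(z):
    """Softmax over the channel responses of one pixel."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def classify(z):
    """Return the most probable class name for one pixel."""
    p = softmax(z)
    return CLASSES[p.index(max(p))]
```

For equal responses every class receives probability 0.25, and a pixel whose crack channel has the largest response is labelled as a crack.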
[0105] The roadside traffic control signal controller embeds speed limit lookup table logic and smooth transition control logic based on a vehicle dynamics model. The speed limit lookup table logic matches the basic speed limit threshold corresponding to the damage level label in the received road damage state features. The smooth transition control logic obtains the historical speed limit control signal and the current average vehicle speed for the current road segment and calculates the speed difference between the basic speed limit threshold and the current average vehicle speed. The roadside traffic control signal controller also obtains the speeds of all vehicles in the current road segment through roadside radar equipment and calculates their average, with a sampling period of 1 second and a sliding window length of 5 seconds. The roadside traffic control signal controller reads the currently effective historical speed limit threshold from local storage, takes the basic speed limit threshold matched from the road damage state features, and calculates the speed difference, which is compared with a preset deceleration margin.
[0106] When the speed difference does not exceed the preset deceleration margin, the roadside traffic control signal controller directly generates a speed limit control signal that sets the speed limit threshold to the basic speed limit threshold, effective immediately. When the speed difference exceeds the preset deceleration margin, the roadside traffic control signal controller generates a speed limit control signal containing multiple levels of transition speed gradients. The formula for calculating the number of levels n of the transition gradient is:
[0107] The speed value of each transition gradient is calculated by the following formula:
[0108] in, .
[0109] The maintenance distance of each transition gradient is calculated by the following formula:
[0110] where the maximum comfortable deceleration of the vehicle is a preset value, which ensures that the vehicle can smoothly decelerate to the corresponding speed value within the maintenance distance.
[0111] The speed limit control signal generated by the roadside traffic control signal controller, which includes multi-level transition speed gradients, is sent to the roadside communication unit via the V2X direct communication interface. The roadside communication unit then broadcasts this signal to connected vehicles in the form of a roadside coordination message. After parsing the roadside coordination message, the onboard terminal of the connected vehicle extracts the multi-level transition speed gradients and their corresponding spatial coordinates. Combining this with the connected vehicle's current longitudinal acceleration and braking distance, it plans the optimal deceleration curve that follows the multi-level transition speed gradients.
[0112] The on-board terminal of a connected vehicle receives the roadside coordination messages broadcast by the roadside communication unit via the on-board OBU, parses the message content, and extracts the multi-level transition speed gradients, the spatial coordinates corresponding to each gradient, the starting coordinates of the abnormal region, and the length of the abnormal region. The on-board terminal obtains the vehicle's current location coordinates, current driving speed, and current longitudinal acceleration through on-board sensors. The maximum braking deceleration of the vehicle is a preset value.
[0113] The remaining distance from the vehicle's current position to the starting position of the abnormal area is then calculated:
[0114] The on-board terminal plans the optimal deceleration curve that follows the multi-level transition speed gradients using a model predictive control algorithm. The optimization aims to minimize the tracking error between the vehicle speed and the transition speed gradient while also minimizing the rate of change of deceleration. The objective function for optimization is:
$$J = \sum_{k=1}^{N_p} \left[ w_1 \left( v_k - v_{\mathrm{ref},k} \right)^2 + w_2\, a_k^2 + w_3\, j_k^2 \right]$$
[0115] where $N_p$ is the prediction horizon length, with a value of 20; $v_k$ is the vehicle speed at the k-th step in the prediction horizon; $v_{\mathrm{ref},k}$ is the reference transition speed for the corresponding step; $a_k$ is the vehicle acceleration for the corresponding step; $j_k$ is the rate of change of acceleration; and $w_1$, $w_2$, and $w_3$ are weighting coefficients, with values of 1.0, 0.5, and 0.1, respectively.
[0116] The constraints are:
[0117] where the maximum acceleration of the vehicle and the upper and lower limits of the rate of change of acceleration are preset values.
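To make the three weighted terms of the planning objective concrete, the sketch below scores a candidate speed profile. It only evaluates a cost and is not a solver. The quadratic penalty form, the finite-difference definitions of acceleration and rate of change of acceleration, the time step, and the function names are assumptions; the weights 1.0, 0.5, and 0.1 are those stated in the text. The constraint check takes its bounds as parameters because the bound values are not given.

```python
def mpc_cost(v, v_ref, dt, w_track=1.0, w_acc=0.5, w_jerk=0.1):
    """Cost of a candidate speed profile v (m/s) against reference v_ref.
    Acceleration and jerk are finite differences of v (an assumption)."""
    acc = [(v[k + 1] - v[k]) / dt for k in range(len(v) - 1)]
    jerk = [(acc[k + 1] - acc[k]) / dt for k in range(len(acc) - 1)]
    tracking = sum((a - b) ** 2 for a, b in zip(v, v_ref))
    return (w_track * tracking
            + w_acc * sum(a ** 2 for a in acc)
            + w_jerk * sum(j ** 2 for j in jerk))

def within_limits(v, dt, a_min, a_max, j_min, j_max):
    """Check acceleration and jerk bounds; the bound values are caller-supplied."""
    acc = [(v[k + 1] - v[k]) / dt for k in range(len(v) - 1)]
    jerk = [(acc[k + 1] - acc[k]) / dt for k in range(len(acc) - 1)]
    return (all(a_min <= a <= a_max for a in acc)
            and all(j_min <= j <= j_max for j in jerk))
```

A profile that tracks the reference exactly at constant speed costs nothing, while a profile that tracks a steadily falling reference still pays the acceleration term, which is what drives the optimizer toward gentle deceleration.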
[0118] By solving the above optimization problem, the optimal deceleration curve is obtained. The vehicle's electronic stability control system and engine management system control the vehicle's acceleration and braking based on this deceleration curve, achieving smooth tracking of multi-level transition speed gradients. The onboard terminal calculates the braking distance required to follow the optimal deceleration curve. The calculation formula is:
[0119] where the average deceleration of the optimal deceleration curve takes a negative value.
[0120] When the braking distance calculated for the optimal deceleration curve exceeds the remaining distance from the current location of the connected vehicle to the abnormal road section, the on-board terminal triggers an emergency braking takeover strategy. It sends a braking command to the vehicle's automatic emergency braking system to brake the vehicle at the maximum braking deceleration, and simultaneously issues an audible and visual alarm to the driver via the in-vehicle human-machine interface, reminding the driver to take over the vehicle. This compensates for the safety gap caused by delayed driver response to the multi-level transition speed gradients.
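The takeover decision reduces to comparing two distances. In the sketch below the braking distance uses the constant-deceleration kinematic relation, which is an assumption standing in for the embodiment's own formula; the function names and units (m/s, m/s², m) are likewise illustrative.

```python
def braking_distance_m(v0_mps, v_target_mps, avg_decel_mps2):
    """Distance needed to slow from v0 to v_target at a constant (negative)
    average deceleration; constant-deceleration kinematics is an assumption."""
    if avg_decel_mps2 >= 0:
        raise ValueError("average deceleration must be negative")
    return (v_target_mps ** 2 - v0_mps ** 2) / (2.0 * avg_decel_mps2)

def needs_emergency_takeover(v0_mps, v_target_mps, avg_decel_mps2,
                             remaining_distance_m):
    """True when the planned braking distance exceeds the remaining distance."""
    required = braking_distance_m(v0_mps, v_target_mps, avg_decel_mps2)
    return required > remaining_distance_m
```

Slowing from 20 m/s to 10 m/s at an average of -2.5 m/s² needs 60 m under this relation, so a vehicle only 50 m from the abnormal section would trigger the takeover while one 80 m away would not.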
[0121] This embodiment achieves directional feature enhancement of road damage at different scales through a ring bandpass filter bank in polar coordinates, improving the accuracy of abnormal feature extraction. It achieves smooth vehicle deceleration through a speed limit control signal with multi-level transition speed gradients. Combined with the optimal deceleration curve planning of connected vehicles and emergency braking takeover strategy, it improves driving safety in abnormal road scenarios.
Claims
1. A deep learning-based intelligent road recognition system, characterized in that it includes an image acquisition device and a dual-branch decoupled neural network deployed at a roadside edge computing node, a roadside traffic control signal controller, a roadside variable information sign, and a roadside communication unit; the image acquisition device is used to acquire road surface image sequences and input the road surface image sequences into the dual-branch decoupled neural network; the dual-branch decoupled neural network includes a structural feature extraction path and an anomaly feature extraction path, wherein the structural feature extraction path uses a spatial attention mechanism to extract structural feature vectors of road boundaries and lane markings, and the anomaly feature extraction path uses a frequency domain filtering convolutional layer to extract abnormal feature vectors of cracks, potholes, and water accumulation areas; the roadside edge computing node compares the abnormal feature vector with a preset road damage level mapping table to generate road damage state features; the roadside traffic control signal controller receives the road damage state features, determines the speed limit threshold for the corresponding road segment based on the road damage state features, and generates a speed limit control signal; the roadside variable information sign receives the speed limit control signal and displays the corresponding speed limit value, and the roadside communication unit broadcasts the speed limit control signal and the road damage state features to connected vehicles entering the corresponding road section.
2. The intelligent road recognition system based on deep learning according to claim 1, characterized in that the image acquisition device includes a line scan camera and a fill light triggering circuit; the fill light triggering circuit synchronously triggers a strobe fill light based on the line frequency signal of the line scan camera; the line scan camera acquires continuous row images of the road surface under the illumination of the strobe fill light and stitches them together to form the road surface image sequence; after receiving the road surface image sequence, the roadside edge computing node performs image preprocessing operations, which include performing affine transformation correction on the road surface image sequence based on the road surface texture direction and removing motion blur regions based on the difference between adjacent frames to obtain a denoised and corrected image sequence; the denoised and corrected image sequence is input into the dual-branch decoupled neural network for feature extraction, so as to eliminate the interference of acquisition distortion and motion blur caused by high vehicle speed on the subsequent abnormal feature extraction.
3. The intelligent road recognition system based on deep learning according to claim 1, characterized in that the structural feature extraction path includes a backbone feature extraction layer, a spatial attention generation layer, and a feature fusion output layer connected in sequence; the backbone feature extraction layer performs multi-scale convolution operations on the road surface image sequence to extract a primary feature map containing road geometric information; the spatial attention generation layer performs max pooling and average pooling on the primary feature map in the channel dimension, concatenates the results, and generates a spatial attention weight matrix through a convolution kernel; the feature fusion output layer multiplies the spatial attention weight matrix element-wise with the primary feature map, highlighting the response values of the pixel regions where the road edges and lane markings are located, suppressing the response values of the unstructured background regions of the road surface, and outputting the structural feature vector, thereby maintaining a stable representation of the road topology in scenarios with weak texture or worn lane markings.
4. The intelligent road recognition system based on deep learning according to claim 1, characterized in that the abnormal feature extraction path includes a frequency domain transformation convolutional layer and a spatial deconvolutional layer; the frequency domain transformation convolutional layer has built-in learnable frequency domain filtering parameters; the frequency domain transformation convolutional layer transforms the road surface image sequence from the spatial domain to the frequency domain, uses the frequency domain filtering parameters to suppress the low-frequency road surface base signal in the frequency domain feature map and enhance the high-frequency abnormal damage signal, and transforms the processed frequency domain feature map back to the spatial domain to obtain the high-frequency anomaly response map; the spatial deconvolutional layer upsamples the high-frequency anomaly response map to restore it to the same resolution as the road surface image sequence and outputs the abnormal feature vector through pixel-level classification, with pseudo-anomaly interference from normal road surface texture filtered out through the frequency domain filtering mechanism.
5. The intelligent road recognition system based on deep learning according to claim 1, characterized in that the road damage level mapping table stored inside the roadside edge computing node contains multi-dimensional feature space partition boundaries and corresponding damage level labels. During the process of generating the road damage state features, the roadside edge computing node calculates the Mahalanobis distance of the abnormal feature vector in the multi-dimensional feature space, and determines the initial damage level label based on the partition boundary into which the Mahalanobis distance falls. Meanwhile, the roadside edge computing node extracts the lane centerline offset from the structural feature vector. When the lane centerline offset exceeds the preset safe offset range, the initial damage level label is upgraded across levels. The spatial structural deformation index is then fused to generate the final road damage state feature, thus solving the problem of misjudgment of damage level caused by relying solely on abnormal appearance features.
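The level assignment described in claim 5 can be sketched as follows. This is only an illustration under stated assumptions: the reference mean and covariance, the ascending list of distance boundaries, the function names, and in particular the one-level escalation are hypothetical, since the claim does not say how many levels the label is upgraded by. The fusion of the spatial structural deformation index is not modeled.

```python
import numpy as np

def mahalanobis(x, mean, cov):
    """Mahalanobis distance of feature vector x from a reference distribution."""
    d = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    return float(np.sqrt(d @ np.linalg.solve(cov, d)))

def initial_level(distance, boundaries):
    """boundaries: ascending distances; level = number of boundaries reached."""
    return int(np.searchsorted(boundaries, distance, side="right"))

def final_level(x, mean, cov, boundaries, centerline_offset, safe_offset):
    level = initial_level(mahalanobis(x, mean, cov), boundaries)
    if abs(centerline_offset) > safe_offset:
        # Escalating by exactly one level is an assumption of this sketch.
        level = min(level + 1, len(boundaries))
    return level
```

For example, with a covariance of `4 * I` the vector `[3, 0]` lies at distance 1.5 from the origin, which falls between the first two boundaries of `[1.0, 2.0, 3.0]` and yields level 1, or level 2 once the centerline offset leaves the safe range.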
6. The intelligent road recognition system based on deep learning according to claim 1, characterized in that the roadside traffic control signal controller is embedded with speed limit lookup table logic and smooth transition control logic based on a vehicle dynamics model. The speed limit lookup table logic matches the corresponding basic speed limit threshold based on the damage level label in the received road damage state features. The smooth transition control logic acquires the historical speed limit control signal of the current road segment and the current average vehicle speed, and calculates the speed difference between the basic speed limit threshold and the current average vehicle speed. When the speed difference is greater than the preset deceleration margin, the roadside traffic control signal controller generates the speed limit control signal containing multiple levels of transition speed gradients, to avoid the risk of loss of control of the vehicle braking system caused by sudden changes in the speed limit threshold.
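A minimal sketch of the lookup and smooth-transition logic in claim 6 is given below. The table values, the 20 km/h margin and step size, and the name `transition_speeds` are hypothetical; the sketch also starts stepping from the current average speed and does not model the historical speed limit control signal or the vehicle dynamics model.

```python
# Hypothetical lookup table: damage level label -> basic speed limit threshold (km/h).
BASE_LIMIT_BY_LEVEL = {0: 100, 1: 80, 2: 60, 3: 40}

def transition_speeds(level, current_avg_speed, decel_margin=20, step=20):
    """Return the speed limits to post in order, ending at the basic threshold."""
    base_limit = BASE_LIMIT_BY_LEVEL[level]
    # Small speed difference: post the basic threshold directly.
    if current_avg_speed - base_limit <= decel_margin:
        return [base_limit]
    # Large speed difference: step down through intermediate limits.
    speeds = []
    v = current_avg_speed - step
    while v > base_limit:
        speeds.append(v)
        v -= step
    speeds.append(base_limit)
    return speeds

print(transition_speeds(3, 100))  # [80, 60, 40]
print(transition_speeds(2, 70))   # [60]
```

A fixed step is the simplest possible gradient; a controller built on a vehicle dynamics model would more plausibly derive each step from a comfortable deceleration and the spacing between signs.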
7. The intelligent road recognition system based on deep learning according to claim 2, characterized in that the image preprocessing operation also includes a shadow removal step based on local contrast adaptation. In the shadow removal step, the roadside edge computing node converts the denoised and corrected image sequence to the HSV color space and extracts the brightness channel component. For the brightness channel component, the roadside edge computing node calculates the mean and variance of the gray levels within a local window, and constructs a local adaptive gain coefficient based on the variance of the gray levels. The local adaptive gain coefficient is used to perform a nonlinear stretching transformation on the brightness channel component to suppress non-uniform shadow areas caused by tree occlusion or bridge shadows, and to output an illumination-equalized image sequence. The illumination-equalized image sequence serves as the final input to the dual-branch decoupled neural network, addressing the technical problem of features in abnormal regions being overwhelmed by abrupt changes in ambient illumination.
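One way to realize a variance-driven local gain of the kind described in claim 7 is local mean and variance normalization of the brightness channel, sketched below. The claim does not give the gain formula, so the form `target_std / local_std`, the gain cap, the reflect padding, and every parameter value here are assumptions. The input is the HSV brightness (V) channel, which equals `max(R, G, B)` per pixel, scaled to `[0, 1]`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def equalize_illumination(v, window=15, target_std=0.1, max_gain=3.0, eps=1e-6):
    """v: (H, W) brightness channel in [0, 1]; window must be odd and smaller than the image."""
    pad = window // 2
    padded = np.pad(v, pad, mode="reflect")
    windows = sliding_window_view(padded, (window, window))  # shape (H, W, window, window)
    local_mean = windows.mean(axis=(-2, -1))
    local_var = windows.var(axis=(-2, -1))
    # Gain derived from the local variance: flat regions are amplified, up to max_gain.
    gain = np.minimum(target_std / np.sqrt(local_var + eps), max_gain)
    # Replace the local mean by the global mean, so shadowed regions are lifted
    # and bright regions are lowered toward a common illumination level.
    out = v.mean() + gain * (v - local_mean)
    return np.clip(out, 0.0, 1.0)
```

In this sketch a uniformly dark shadowed patch and a uniformly bright sunlit patch both map to the global mean brightness, while texture inside each patch is preserved and rescaled by the local gain.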
8. The intelligent road recognition system based on deep learning according to claim 3, characterized in that the feature fusion output layer is also connected to a temporal consistency constraint. During the extraction of the structural feature vector, the temporal consistency constraint takes the spatial attention weight matrix of the current frame and the historical spatial attention weight matrix of the previous frame, and calculates the optical-flow-guided displacement field between the two. The temporal consistency constraint performs spatial warping alignment on the historical spatial attention weight matrix based on the optical-flow-guided displacement field, and calculates the Euclidean distance loss between the warped result and the spatial attention weight matrix of the current frame. The Euclidean distance loss is backpropagated as a penalty term to the backbone feature extraction layer, constraining the temporal stability of road edge and lane marking features in consecutive multi-frame images, and reducing the interference of flicker noise on road structure contour recognition.
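The warping and loss computation of claim 8 can be sketched in NumPy as below. The displacement field is taken as an input, since optical flow estimation is outside the scope of the sketch; the `(dy, dx)` convention, the nearest-neighbour sampling, the border clamping, and the function names are assumptions. Only the loss value is computed here; backpropagating it to the backbone would require an automatic differentiation framework.

```python
import numpy as np

def warp(prev, flow):
    """Align the previous-frame attention map to the current frame.

    prev: (H, W); flow: (H, W, 2) holding (dy, dx) from each current-frame
    pixel to its source location in the previous frame.
    """
    h, w = prev.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Nearest-neighbour sampling with border clamping (both are assumptions).
    src_y = np.clip(np.rint(ys + flow[..., 0]).astype(int), 0, h - 1)
    src_x = np.clip(np.rint(xs + flow[..., 1]).astype(int), 0, w - 1)
    return prev[src_y, src_x]

def temporal_consistency_loss(current, prev, flow):
    """Euclidean distance between the warped previous map and the current map."""
    return float(np.sqrt(np.sum((warp(prev, flow) - current) ** 2)))
```

When the scene content moves exactly as the displacement field predicts, the warped map matches the current one and the loss is zero; flicker between frames shows up as a positive penalty.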
9. The intelligent road recognition system based on deep learning according to claim 4, characterized in that the learnable frequency domain filtering parameters inside the frequency domain transformation convolutional layer are updated end-to-end through a two-dimensional Fourier transform kernel and a two-dimensional inverse Fourier transform kernel. During the forward propagation of the frequency domain transformation convolutional layer, the two-dimensional Fourier transform kernel performs spectral decomposition on the input feature map to generate an amplitude spectrum and a phase spectrum. The learnable frequency domain filtering parameters are constructed as a ring bandpass filter bank in polar coordinates. The ring bandpass filter bank performs weighted suppression or pass-through operations on different frequency bands of the amplitude spectrum, based on the difference in radial frequency between cracks and pits in the frequency domain. The processed amplitude spectrum is merged with the original phase spectrum and then input into the two-dimensional inverse Fourier transform kernel to restore the spatial domain features, thereby realizing directional frequency domain selection and feature enhancement of road damage at different physical scales.
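The forward pass described in claim 9 can be sketched with NumPy's FFT routines. The band edges (in cycles per pixel), the per-band weights, and the function name `ring_bandpass` are hypothetical, and the weights are fixed numbers here rather than parameters learned end-to-end; frequencies outside every band are suppressed completely, which is also an assumption.

```python
import numpy as np

def ring_bandpass(feature, band_edges, band_weights):
    """feature: (H, W); band_edges: ascending radii, one more entry than band_weights."""
    h, w = feature.shape
    spectrum = np.fft.fft2(feature)
    amplitude, phase = np.abs(spectrum), np.angle(spectrum)
    # Radial frequency of every FFT bin (the polar radius in the frequency plane).
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.hypot(fy, fx)
    # One weight per ring [lo, hi); bins outside all rings keep a gain of zero.
    gain = np.zeros((h, w))
    for lo, hi, weight in zip(band_edges[:-1], band_edges[1:], band_weights):
        gain[(radius >= lo) & (radius < hi)] = weight
    # Recombine the weighted amplitude with the original phase and invert.
    filtered = amplitude * gain * np.exp(1j * phase)
    return np.real(np.fft.ifft2(filtered))
```

A single ring covering all radii with weight 1 reproduces the input, while zeroing only the innermost ring removes the mean, which corresponds to suppressing the low-frequency road surface base signal.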
10. The intelligent road recognition system based on deep learning according to claim 6, characterized in that the speed limit control signal generated by the roadside traffic control signal controller, which includes a multi-level transition speed gradient, is sent to the roadside communication unit via a V2X direct communication interface, and is further broadcast by the roadside communication unit to the connected vehicles in the form of a roadside coordination message. After parsing the roadside coordination message, the on-board terminal of the connected vehicle extracts the multi-level transition speed gradient and the corresponding spatial position coordinates. Combining these with the current longitudinal acceleration and braking distance of the connected vehicle, the on-board terminal plans the optimal deceleration curve that follows the multi-level transition speed gradient. When the braking distance required by the optimal deceleration curve exceeds the remaining distance from the current location of the connected vehicle to the abnormal road section, the on-board terminal triggers an emergency braking takeover strategy to compensate for the safety gap left by the driver's delayed response to the multi-level transition speed gradient.
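The takeover condition in claim 10 can be illustrated with constant-deceleration kinematics. This is a simplified sketch and not the planning method of the claim: the constant deceleration, the SI units, and the function names are assumptions, and the spatial position coordinates attached to each gradient level are not modeled.

```python
def braking_distance(speed, target_speed, decel):
    """Distance (m) needed to slow from speed to target_speed (m/s) at decel (m/s^2)."""
    if speed <= target_speed:
        return 0.0
    # From v_target^2 = v^2 - 2 * a * s under constant deceleration.
    return (speed ** 2 - target_speed ** 2) / (2.0 * decel)

def needs_emergency_takeover(speed, gradient, decel, remaining_distance):
    """gradient: descending target speeds (m/s) parsed from the roadside message."""
    total = 0.0
    v = speed
    for target in gradient:
        total += braking_distance(v, target, decel)
        v = min(v, target)
    # Take over when the required braking distance exceeds the distance left.
    return total > remaining_distance

print(needs_emergency_takeover(30.0, [20.0, 10.0], 4.0, 120.0))  # False
print(needs_emergency_takeover(30.0, [20.0, 10.0], 4.0, 80.0))   # True
```

With a single constant deceleration the total distance depends only on the initial and final speeds, so the intermediate levels matter only once each level is tied to a position or to a different deceleration, as a fuller planner would do.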