Road condition detection method and device, storage medium and program product

By combining the overall analysis model and the local analysis model, road images are segmented and local and overall features are fused, which solves the problem of insufficient adaptability of traditional methods in complex environments and achieves more accurate road condition detection.

CN122244816A · Pending · Publication Date: 2026-06-19 · BEIJING XIAOSHI TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
BEIJING XIAOSHI TECH CO LTD
Filing Date
2026-03-25
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Traditional road condition detection methods are not adaptable to complex environments and are unable to fully capture local and overall features, resulting in poor detection results.

Method used

By combining a global analysis model and a local analysis model, the road image is segmented into sub-images, and local and global features are extracted respectively. The first fused feature is generated through feature fusion and then a classification model is used for detection.

Benefits of technology

It improves the accuracy and reliability of road condition detection, enhances adaptability to complex environments, and provides more accurate road anomaly detection results.

✦ Generated by Eureka AI based on patent content.


Abstract

This disclosure provides a road condition detection method, apparatus, storage medium, and program product. The method includes: extracting overall features of a road image using a holistic analysis model; segmenting the road image into multiple sub-images; extracting local features of the multiple sub-images using a local analysis model; obtaining a first fusion feature of the road image based on the local features and the overall features; and detecting the road based on a classification model and the first fusion feature. This method can provide accurate and comprehensive road conditions by comprehensively considering both local and overall features of the road image, helping the classification model accurately detect abnormal road conditions, and is applicable to various complex road environments.

Description

Technical Field

[0001] This disclosure relates to the field of image processing technology, and in particular to a road condition detection method, apparatus, storage medium, and program product.

Background Technology

[0002] In traditional road condition detection methods, rule-based approaches are poorly adapted to complex environments and struggle to cope with changing road conditions. This is because rule-based methods typically rely on manually designed rule sets, which often fail to cover all possible road anomalies, resulting in insufficient adaptability. Although deep learning methods improve the ability to handle complex situations by automatically learning features, these methods often focus on only the local features or only the global features of the image, which limits their representational capabilities.

Summary of the Invention

[0003] In view of this, the present disclosure provides a road condition detection method, apparatus, storage medium, and program product, which can provide accurate and comprehensive road conditions by jointly considering the local and overall features of road images, and help classification models accurately detect abnormal road conditions.

[0004] In a first aspect, embodiments of this disclosure provide a road condition detection method, employing the following technical solution: a holistic analysis model is used to extract the overall features of the road image; the road image is segmented into multiple sub-images; local features of the multiple sub-images are extracted using a local analysis model; based on the local features and the overall features, a first fusion feature of the road image is obtained; and the road is detected based on a classification model and the first fusion feature.

[0005] Optionally, obtaining the first fusion feature of the road image based on the local features and the global features includes: The local features and the overall features are concatenated and fused according to the first preset feature dimension direction to obtain the first fused feature; and / or, The local features and the overall features are fused based on the constructed first feature fusion layer to obtain the first fused feature; and / or, The effective features of the road image are extracted using the constructed first feature extraction model; The effective feature, the local feature, and the overall feature are fused to obtain the first fused feature.

[0006] Optionally, the road detection based on the classification model and the first fused feature includes: Based on the constructed second feature extraction model, deep features are extracted from the first fused features; The dimensions of the deep features are adjusted. The dimensionally adjusted deep features are fused with the first fusion feature to obtain the second fusion feature; Based on the constructed feature optimization model, feature learning and feature representation are performed on the second fused feature to obtain the third fused feature; The classification model is used to analyze the third fusion feature to detect road conditions.

[0007] Optionally, fusing the dimension-adjusted deep features with the first fusion feature to obtain the second fusion feature includes: The deep-level features, adjusted according to the second preset feature dimension direction, are concatenated and fused with the first fusion feature to obtain the second fusion feature; and / or, The constructed third feature fusion layer fuses the dimension-adjusted deep features with the first fusion feature to obtain the second fusion feature; and / or, The weight coefficients of the dimension-adjusted deep features and the first fusion feature are determined respectively. Based on the weight coefficients, the dimension-adjusted deep features and the first fusion feature are weighted and concatenated to obtain the second fusion feature.

[0008] Optionally, fusing the dimension-adjusted deep features with the first fusion feature further includes: during the fusion process of the dimension-adjusted deep features and the first fusion feature, a non-linear activation function is used for non-linear feature mapping; and the non-linear feature mapping process is optimized using a mechanism such as attention.

[0009] Optionally, the feature optimization model adopts an encoder-decoder structure; The encoder obtains the feature map by performing multi-layer nonlinear mapping on the second fused feature; The decoder restores the dimension of the feature map to the dimension of the second fused feature by progressive upsampling.

[0010] Optionally, the local features include at least one of potholes, cracks, water stains, oil stains, road material, road surface color, and road surface texture; The overall features include at least one of the following: road structure, traffic signs, environmental elements, scene type, and image style; Wherein, the road structure refers to the geometric and cross-sectional shape of the road; the traffic signs refer to signs and markings used to indicate traffic rules, warn of road conditions, and guide the direction of vehicle travel; the environmental elements refer to factors used to provide semantic information about the area surrounding the road and traffic participants on the road; the scene type refers to the context of the road's location; and the image style refers to the visual attributes of the road image.

[0011] In a second aspect, this disclosure also provides a road condition detection system, which adopts the following technical solution: a first extraction module, used to extract the overall features of the road image using a holistic analysis model; an image segmentation module, used to segment the road image into multiple sub-images; a second extraction module, used to extract local features of the multiple sub-images using a local analysis model; a feature acquisition module, used to obtain a first fusion feature of the road image based on the local features and the overall features; and a road detection module, used to detect the road based on the classification model and the first fusion feature.

[0012] In a third aspect, this disclosure also provides a computer device, which adopts the following technical solution: the computer device includes at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions that can be executed by the at least one processor to enable the at least one processor to perform any of the road condition detection methods described above.

[0013] In a fourth aspect, embodiments of this disclosure also provide a computer-readable storage medium storing computer instructions for causing a computer to perform any of the road condition detection methods described above.

[0014] In a fifth aspect, embodiments of this disclosure also provide a computer program product, including a computer program / instructions that, when executed by a processor, implement the steps of any of the methods described above.

[0015] The road condition detection method provided in this disclosure segments a road image into multiple sub-images and uses a local analysis model to extract local features, thereby capturing local details of the road. It also uses a global analysis model to extract global features, thereby capturing global information of the road image. The first fusion feature generated from the local and global features takes both kinds of information into account, so it describes the road condition more accurately and comprehensively. This makes it easier for the classification model to analyze the road condition, improves road anomaly detection, makes the detection results more reliable, and improves adaptability to complex road environments.

[0016] The above description is merely an overview of the technical solution disclosed herein. In order to better understand the technical means of this disclosure and to implement it in accordance with the contents of the specification, and to make the above and other objects, features and advantages of this disclosure more apparent and understandable, preferred embodiments are described below in detail with reference to the accompanying drawings.

Brief Description of the Drawings

[0017] To more clearly illustrate the technical solutions of the embodiments of this disclosure, the drawings used in the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this disclosure. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0018] Figure 1 is a schematic flowchart of the road condition detection method provided in an embodiment of this disclosure; Figure 2 is a schematic diagram of the feature optimization process provided in an embodiment of this disclosure; Figure 3 is a schematic block diagram of the road condition detection system provided in an embodiment of this disclosure; Figure 4 is a schematic diagram of the structure of the computer device provided in an embodiment of this disclosure.

Detailed Implementation

[0019] The embodiments of this disclosure will now be described in detail with reference to the accompanying drawings.

[0020] It should be understood that the following specific examples illustrate the implementation of this disclosure, and those skilled in the art can easily understand other advantages and effects of this disclosure from the content disclosed in this specification. Obviously, the described embodiments are only a part of the embodiments of this disclosure, and not all of them. This disclosure can also be implemented or applied through other different specific implementation methods, and the details in this specification can also be modified or changed based on different viewpoints and applications without departing from the spirit of this disclosure. It should be noted that, in the absence of conflict, the following embodiments and features in the embodiments can be combined with each other. Based on the embodiments in this disclosure, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this disclosure.

[0021] It should be noted that various aspects of embodiments within the scope of the appended claims are described below. It will be apparent that the aspects described herein can be embodied in a wide variety of forms, and any particular structure and / or function described herein is merely illustrative. Based on this disclosure, those skilled in the art will understand that one aspect described herein can be implemented independently of any other aspect, and two or more of these aspects can be combined in various ways. For example, any number of aspects set forth herein can be used to implement the device and / or practice the method. Additionally, this device and / or method can be implemented using structures and / or functionalities other than one or more of the aspects set forth herein.

[0022] It should also be noted that the illustrations provided in the following embodiments are only schematic representations of the basic concept of this disclosure. The drawings only show the components related to this disclosure and are not drawn according to the number, shape and size of the components in actual implementation. In actual implementation, the form, quantity and proportion of each component can be arbitrarily changed, and the layout of the components may also be more complex.

[0023] Furthermore, specific details are provided in the following description to facilitate a thorough understanding of the examples. However, those skilled in the art will understand that the described aspects can be practiced without these specific details.

[0024] Referring to Figure 1, this disclosure provides a road condition detection method, including the following steps: S1: Use a holistic analysis model to extract the overall features of the road image.

[0025] Remote sensing images of the target area taken by satellite are selected as road images. The road images are then standardized, including size adjustment and normalization, to adapt to subsequent model processing.
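The standardization step in [0025] can be sketched as follows. This is only an illustration: the disclosure does not say how images are resized or normalized, so nearest-neighbour resizing and scaling 8-bit pixel values into [0, 1] are assumptions, and the function names `resize_nearest`, `normalize`, and `standardize` are hypothetical.

```python
def resize_nearest(image, out_h, out_w):
    """Resize a 2-D grayscale image (list of rows) by nearest-neighbour sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def normalize(image, max_value=255.0):
    """Scale pixel values into [0, 1]; max_value=255 assumes 8-bit input."""
    return [[pixel / max_value for pixel in row] for row in image]


def standardize(image, out_h, out_w):
    """Size adjustment followed by normalization."""
    return normalize(resize_nearest(image, out_h, out_w))


standardized = standardize([[0, 255], [255, 0]], 4, 4)
```

A real pipeline would more likely use an image library with bilinear or bicubic interpolation and per-channel mean and standard deviation normalization matched to the chosen backbone.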

[0026] The overall features include at least one of the following: road structure, traffic signs, environmental elements, scene type, and image style. Road structure refers to the geometric and cross-sectional shape of the road, including road surface smoothness, longitudinal and transverse slopes, road width, and curvature. Traffic signs are signs and markings used to indicate traffic rules, warn of road conditions, and guide vehicle direction, including lane lines, intersection signs, turning signs, and speed limit signs. Environmental elements provide semantic information about the surrounding area and road users, including buildings, trees, pedestrians, and vehicles. Scene type refers to the context of the road's location, including urban centers, suburbs, and highways; the scene type can be identified based on the characteristics of the surrounding environment. Image style refers to the visual attributes of the road image, including overall image color, contrast, and lighting characteristics. Overall image color describes the overall tone and color distribution of the road image, reflecting lighting conditions and environmental characteristics. Contrast and lighting characteristics reflect the brightness and light conditions of the road image, helping to understand the visual quality and scene features of the road image.

[0027] S2: Divide the road image into multiple sub-images.

[0028] S3: Use a local analysis model to extract local features from multiple sub-images.

[0029] The local features include at least one of the following: pothole features, crack features, water stain features, oil stain features, road material, pavement color, and pavement texture. Pothole features include the shape, size, depth, and wear marks of the pothole; crack features include the shape, extent, and texture of the crack; road materials include asphalt pavement, cement pavement, gravel pavement, dirt pavement, composite pavement, brick pavement, sand and gravel pavement, and lawn pavement; water stain features include the shape, size, color, and density of the water stain; and oil stain features include the shape, size, color, and density of the oil stain.

[0030] S4: Based on local and global features, obtain the first fusion feature of the road image.

[0031] S5: Detect roads based on the classification model and the first fusion feature.
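Steps S1 to S5 can be read as a simple pipeline. The skeleton below is a sketch of that control flow only; every component is passed in by the caller, and the `toy_*` stand-ins are hypothetical placeholders that exist only so the skeleton runs. In the disclosure these would be trained networks.

```python
def detect_road_condition(image, overall_model, split, local_model, fuse, classifier):
    """Run steps S1 to S5 with caller-supplied components."""
    overall_features = overall_model(image)                     # S1
    sub_images = split(image)                                   # S2
    local_features = [local_model(sub) for sub in sub_images]   # S3
    fused = fuse(local_features, overall_features)              # S4
    return classifier(fused)                                    # S5


# Toy stand-ins (assumptions, not the models described in the disclosure).
def toy_overall(image):
    return [sum(sum(row) for row in image)]


def toy_split(image):
    return [[row] for row in image]  # one sub-image per image row


def toy_local(sub_image):
    return [max(max(row) for row in sub_image)]


def toy_fuse(local_features, overall_features):
    # Flatten the per-sub-image features, then append the overall features.
    return [v for feature in local_features for v in feature] + overall_features


def toy_classifier(fused):
    # Flag the road as abnormal if any local feature exceeds a toy threshold.
    return "abnormal" if max(fused[:-1]) > 5 else "normal"


result = detect_road_condition(
    [[1, 9], [2, 3]], toy_overall, toy_split, toy_local, toy_fuse, toy_classifier
)
```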

[0032] Traditional deep learning methods improve the ability to handle complex situations by automatically learning features, but these methods often focus only on one of the local or global features of an image, thus limiting their representational capabilities. For example, some methods may focus on local details, such as the shape and texture features of potholes or cracks, while ignoring the impact of the overall road layout on road anomalies; conversely, other methods may focus more on the features of the overall road structure but may fail to capture important information from local details.

[0033] The road condition detection method disclosed herein, by segmenting a road image into multiple sub-images and employing a local analysis model to extract local features, can capture local details and features of the road, including potholes and cracks, thus enabling more refined detection. Employing a holistic analysis model to extract global features captures global information about the road image, including road structure and environmental elements, thereby providing a more comprehensive understanding of the road condition. Then, the first fusion feature generated based on the local and global features comprehensively considers both local and global information, more accurately describing the road condition, facilitating classification model analysis, and enhancing anomaly detection performance.

[0034] In summary, by combining local feature extraction, global feature extraction, and feature fusion, and employing a classification model for analysis, road anomalies can be effectively detected, improving detection accuracy and reliability. Furthermore, the models complement one another, which further enhances detection accuracy and robustness, makes the detection results more reliable, and improves adaptability to complex road environments.

[0035] In S1, the holistic analysis model is obtained by training a fully convolutional network (FCN). A suitable FCN variant, such as FCN-32s, FCN-16s, or FCN-8s, can be selected according to the characteristics of the road image. These variants differ in their upsampling stride, which affects the spatial resolution of the final output.

[0036] In a fully convolutional network (FCN), the convolutional and activation layers encode features from the input road image, extracting fundamental features such as texture and color. Instead of pooling, the FCN maintains spatial continuity through successive convolutional operations, which is crucial for understanding the overall layout of the road image. Transposed convolutions progressively upsample the feature map back to the size of the original input road image, and skip connections combine features from different layers, which helps recover road image details and improve edge prediction. Depthwise separable convolutions are employed to reduce computational complexity while preserving important features, making the model more efficient. By applying convolutional kernels of different sizes at different layers, multi-scale feature learning of the road image is achieved.

[0037] Pixel-level cross-entropy loss or focal loss is chosen as the loss function, and Adam or SGD is selected as the optimizer to guide the training of the fully convolutional network (FCN). Techniques such as Dropout and batch normalization are applied to reduce overfitting in the FCN, while hyperparameters are adjusted to achieve optimal performance. The feature extraction performance of the FCN is evaluated using metrics such as accuracy and IoU (Intersection over Union). If the evaluation results are unsatisfactory, the FCN is fine-tuned to optimize it.
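The two evaluation metrics named in [0037] can be computed on label masks as sketched below. The mask layout (2-D lists of integer class labels) and the function names are assumptions for illustration. Returning 1.0 when a class is absent from both masks is a convention chosen here, not something the disclosure specifies.

```python
def pixel_accuracy(pred, target):
    """Fraction of pixels whose predicted label equals the target label."""
    total = correct = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            total += 1
            correct += (p == t)
    return correct / total


def iou(pred, target, cls=1):
    """Intersection over Union for one class label."""
    intersection = union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            in_pred, in_target = (p == cls), (t == cls)
            intersection += (in_pred and in_target)
            union += (in_pred or in_target)
    # Convention (assumption): a class absent from both masks scores 1.0.
    return intersection / union if union else 1.0


pred_mask = [[1, 1], [0, 0]]
target_mask = [[1, 0], [0, 0]]
accuracy = pixel_accuracy(pred_mask, target_mask)
overlap = iou(pred_mask, target_mask)
```

Here three of four pixels agree, and class 1 covers one shared pixel out of two pixels in the union.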

[0038] Using the methods described above, fully convolutional networks can effectively extract overall features from road images while preserving key spatial information. This is crucial for understanding the state and dynamic changes of the entire road network and provides a comprehensive and accurate foundation for subsequent feature fusion and anomaly detection.

[0039] In S2, a sliding window method is used to segment the road image into multiple sub-images. The size of the sub-images is controlled by adjusting the window size and step size to ensure that each sub-image covers a small portion of the road as much as possible.
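A minimal sketch of the sliding-window segmentation in S2, assuming a square window and a single step size for both directions (the disclosure names a window size and a step size but gives no values). The generator name `sliding_window` is hypothetical.

```python
def sliding_window(image, window, step):
    """Yield (top, left, sub_image) for each window that fits fully inside the image."""
    height, width = len(image), len(image[0])
    for top in range(0, height - window + 1, step):
        for left in range(0, width - window + 1, step):
            sub_image = [row[left:left + window] for row in image[top:top + window]]
            yield top, left, sub_image


# 4x4 toy image whose pixel value encodes its position.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
tiles = list(sliding_window(image, window=2, step=2))
overlapping = list(sliding_window(image, window=2, step=1))
```

A step equal to the window size gives non-overlapping tiles, while a smaller step gives overlapping ones. When the image size is not a multiple of the step, this sketch leaves the right and bottom edges uncovered; padding the image first is one way to handle that.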

[0040] In S3, the local analysis model is obtained by training a deep convolutional neural network, such as a VGG (Visual Geometry Group Network) or ResNet (Residual Network) model.

[0041] In S4, this disclosure provides three possible implementation methods to obtain the first fusion feature. The first possible implementation method is a splicing and fusion based on a first preset feature dimension direction, as follows: Local and global features are represented by feature vectors, which are typically one-dimensional arrays (vectors) or multi-dimensional arrays (matrices). The feature dimension direction includes row and column directions. One direction is pre-selected as the first preset feature dimension direction. The local and global features are then concatenated along this first preset feature dimension direction to achieve the fusion of the two features, resulting in the first fused feature. For example, if the row direction is selected, and both the local and global features are one-dimensional arrays, then the local and global features are concatenated end-to-end to form a longer feature vector.
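The concatenation described in [0041] can be illustrated as below. How "row direction" and "column direction" map onto array axes is not stated in the text, so the mapping used here is an assumption: "row" lengthens each row, and "column" stacks more rows. The function names are hypothetical.

```python
def concat_1d(local, overall):
    """Join two 1-D feature vectors end to end."""
    return list(local) + list(overall)


def concat_2d(local, overall, direction):
    """Concatenate two 2-D feature matrices along the chosen direction."""
    if direction == "row":
        # Lengthen each row: both matrices need the same number of rows.
        if len(local) != len(overall):
            raise ValueError("row-direction concatenation needs equal row counts")
        return [list(a) + list(b) for a, b in zip(local, overall)]
    if direction == "column":
        # Stack rows: both matrices need the same number of columns.
        if len(local[0]) != len(overall[0]):
            raise ValueError("column-direction concatenation needs equal column counts")
        return [list(a) for a in local] + [list(b) for b in overall]
    raise ValueError("direction must be 'row' or 'column'")


fused_vector = concat_1d([1, 2], [3])
fused_rows = concat_2d([[1, 2], [3, 4]], [[5], [6]], "row")
fused_columns = concat_2d([[1, 2]], [[3, 4], [5, 6]], "column")
```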

[0042] The second possible implementation method is feature fusion based on the first feature fusion layer, as detailed below: If the dimensions of both local and global features are less than a preset dimension threshold, a fully connected layer is selected as the first feature fusion layer. If the dimensions of either local or global features are not less than a preset dimension threshold, a convolutional layer is selected as the first feature fusion layer.

[0043] The first feature fusion layer can learn to tightly fuse local and global features from different sources into a new feature representation, forming the first fused feature, thereby improving the model's ability to represent and generalize input data.
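The layer-selection rule in [0042] reduces to a threshold test. The sketch below assumes "dimension" means the length of each feature vector and uses an arbitrary threshold of 512; neither is specified in the disclosure, and `choose_fusion_layer` is a hypothetical name.

```python
def choose_fusion_layer(feature_dims, dim_threshold):
    """Pick the fusion layer type from the dimensions of the features to be fused."""
    if all(dim < dim_threshold for dim in feature_dims):
        return "fully_connected"
    return "convolutional"


small_case = choose_fusion_layer([128, 256], dim_threshold=512)
large_case = choose_fusion_layer([128, 512], dim_threshold=512)
```

Because the function takes a list of dimensions, the same rule covers fusing two features or three.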

[0044] The third possible implementation method is feature fusion based on the first feature extraction model, as detailed below: The first feature extraction model is trained from ResNeXt (Residual Next), an efficient convolutional neural network architecture that improves the scalability and efficiency of the network by reusing the same module structure, making it suitable for handling complex image recognition tasks.

[0045] Effective features are extracted from the road image using a first feature extraction model. These effective features include both fine-grained and global features of the road image. The effective features, local features, and global features are then fused to obtain a first fused feature. This disclosure provides three methods for fusing these three features. The first method is a weighted concatenation fusion, as detailed below: The weight coefficients of effective features, local features and global features are determined by methods such as grid search. Based on the weight coefficients, the effective features, local features and global features are weighted and concatenated to complete the fusion of the three and form the first fused feature.
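Weighted concatenation with grid-searched coefficients, as described in [0045], can be sketched as follows. The scoring function is an assumption: in practice it would be something like validation accuracy of the downstream classifier, which the disclosure does not detail. All names here are hypothetical.

```python
from itertools import product


def weighted_concat(features, weights):
    """Scale each feature vector by its weight, then join them end to end."""
    fused = []
    for feature, weight in zip(features, weights):
        fused.extend(weight * value for value in feature)
    return fused


def grid_search_weights(features, candidates, score_fn):
    """Try every combination of candidate weights and keep the best-scoring one."""
    best_weights, best_score = None, float("-inf")
    for weights in product(candidates, repeat=len(features)):
        score = score_fn(weighted_concat(features, weights))
        if score > best_score:
            best_weights, best_score = weights, score
    return best_weights, best_score


def toy_score(fused):
    # Toy objective (assumption): prefer fused vectors starting at 1.0 and ending at 3.0.
    return -abs(fused[0] - 1.0) - abs(fused[-1] - 3.0)


features = [[1.0, 2.0], [3.0]]
fused = weighted_concat(features, (0.5, 2.0))
best_weights, best_score = grid_search_weights(features, (0.5, 1.0), toy_score)
```

The search cost grows as the number of candidates raised to the number of features, so a coarse grid is the usual starting point.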

[0046] The second method is chain-based splicing and fusion, as follows: The effective features, local features, and global features are chained and fused according to a preset splicing order. Specifically, in this disclosure, the local features and global features are first spliced together to generate a long vector. Then, the long vector is further spliced with the effective features to form a first fused feature with a hierarchical structure. Of course, the splicing order of these three components can be adjusted according to different needs and tasks to achieve the best feature fusion effect.

[0047] The third method is feature fusion based on the second feature fusion layer, as detailed below: If the dimensions of the effective features, local features, and global features are all less than the preset dimension threshold, then a fully connected layer is selected as the second feature fusion layer. If any of the dimensions of the effective features, local features, and global features are not less than the preset dimension threshold, then a convolutional layer is selected as the second feature fusion layer.

[0048] The second feature fusion layer can learn to tightly fuse effective features, local features, and global features from different sources into a new feature representation, forming the first fused feature, thereby improving the model's ability to represent and generalize input data.

[0049] Referring to Figure 2, which is a flowchart of feature optimization, the first fused feature obtained is optimized in step S5 as follows: S21: Based on the constructed second feature extraction model, deep features are extracted from the first fused feature; S22: Adjust the dimensions of the deep features; S23: Fuse the dimension-adjusted deep features with the first fusion feature to obtain the second fusion feature; S24: Based on the constructed feature optimization model, perform feature learning and feature representation on the second fusion feature to obtain the third fusion feature; S25: A classification model is used to analyze the third fusion feature to detect road conditions.

[0050] In S21, a second feature extraction model is constructed to extract deep features from the first fused features. This second feature extraction model is also trained using ResNeXt. Several later modules of ResNeXt can extract deep features; for example, the last module can extract highly abstract global semantic features, including the overall scene representation, and the penultimate module can extract complex local structural features, including the shape of cracks or pits. ResNeXt, through its grouped convolutions, can capture complex feature patterns while maintaining computational efficiency, effectively handling diverse features. Therefore, utilizing ResNeXt's deep network structure for depth optimization is highly advantageous for the first fused features.

[0051] In S22, appropriate network layers (such as convolutional layers and pooling layers) are selected as dimension adjustment layers to adjust the dimensions of deep features to make them suitable for subsequent classification models. Specifically, 1x1 convolutional layers in ResNeXt can be used to reduce the dimensionality of deep features, or global average pooling layers can be used to reduce the width and height of the feature map, thereby obtaining a fixed-length feature vector to effectively adjust the dimensions of deep features.
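The two dimension-adjustment options in [0051] can be illustrated on a small feature map. The layout `[channels][height][width]`, the bias-free 1x1 convolution, and the function names are assumptions for illustration. A trained network would learn the 1x1 weights rather than take them as arguments.

```python
def global_average_pool(feature_map):
    """Collapse each channel's height x width grid to its mean: one value per channel."""
    pooled = []
    for channel in feature_map:
        values = [value for row in channel for value in row]
        pooled.append(sum(values) / len(values))
    return pooled


def conv1x1(feature_map, weights):
    """Mix channels at each pixel; weights is [out_channels][in_channels], no bias."""
    height, width = len(feature_map[0]), len(feature_map[0][0])
    return [
        [
            [
                sum(w * feature_map[c][y][x] for c, w in enumerate(out_weights))
                for x in range(width)
            ]
            for y in range(height)
        ]
        for out_weights in weights
    ]


feature_map = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[10.0, 20.0], [30.0, 40.0]],
]
vector = global_average_pool(feature_map)        # fixed length: one entry per channel
reduced = conv1x1(feature_map, [[1.0, 0.5]])     # 2 channels reduced to 1
```

Global average pooling removes the spatial dimensions entirely, while the 1x1 convolution keeps them and changes only the channel count.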

[0052] In S23, the dimension-adjusted deep features are fused with the first fused feature to obtain the second fused feature, in a manner analogous to the steps of S4 for obtaining the first fused feature. Concatenation fusion based on a second preset feature dimension direction can be used, where one of the row and column directions is selected in advance as the second preset feature dimension direction. Feature fusion based on a third feature fusion layer can be used: if the dimensions of both the dimension-adjusted deep features and the first fused feature are less than a preset dimension threshold, a fully connected layer is selected as the third feature fusion layer; if the dimension of either is not less than the preset dimension threshold, a convolutional layer is selected as the third feature fusion layer. Weight-based concatenation fusion can be used, where the weight coefficients of the dimension-adjusted deep features and the first fused feature are determined using methods such as grid search, and the two are weighted and concatenated based on these weight coefficients to form the second fused feature. Chain-like concatenation fusion can also be used.

[0053] During the fusion process of the deep features after dimensionality adjustment and the first fusion features, non-linear activation functions (such as ReLU or LeakyReLU) can be used to perform non-linear feature mapping to enhance the model's ability to learn complex feature relationships. Attention mechanisms can also be used to further optimize the non-linear feature mapping process, enabling the model to focus on the most important feature parts for road anomaly detection, thereby improving the model's performance and robustness, especially in tasks such as road anomaly detection.

[0054] The attention mechanism can utilize an internal attention module, embedded within the intermediate layers of ResNeXt, to learn the correlation weights between features. This is achieved by inputting the feature map into a convolutional layer to obtain a weight map, and then performing a dot product operation between these weights and the input feature map to generate a new feature map. This process emphasizes the importance of relevant features, enabling the model to more accurately capture the correlation information between features.
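The internal attention step in [0054] (a convolution produces a weight map, which then reweights the input feature map) can be sketched as below. Several details are assumptions: the convolution is taken to be 1x1 with given channel weights, a sigmoid squashes the scores into (0, 1), and the "dot product" is read as element-wise multiplication at each pixel. The name `spatial_attention` is hypothetical.

```python
import math


def spatial_attention(feature_map, channel_weights):
    """Reweight a [channels][height][width] feature map with a per-pixel weight map."""
    channels = len(feature_map)
    height, width = len(feature_map[0]), len(feature_map[0][0])

    # 1x1 convolution across channels, then a sigmoid, gives one weight per pixel.
    weight_map = []
    for y in range(height):
        row = []
        for x in range(width):
            score = sum(channel_weights[c] * feature_map[c][y][x] for c in range(channels))
            row.append(1.0 / (1.0 + math.exp(-score)))
        weight_map.append(row)

    # Element-wise multiplication of every channel by the shared weight map.
    attended = [
        [[feature_map[c][y][x] * weight_map[y][x] for x in range(width)] for y in range(height)]
        for c in range(channels)
    ]
    return attended, weight_map


attended, weight_map = spatial_attention(
    [[[0.0, 2.0]], [[0.0, -2.0]]], channel_weights=[1.0, 1.0]
)
```

In this toy input both pixels score 0, so every weight is 0.5 and each feature value is halved.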

[0055] An attention mechanism can also utilize an external attention module. This external attention mechanism first uses a branch network to learn "saliency" attention weight mappings relevant to anomaly detection. This branch network can employ fully connected layers or convolutional neural networks to capture important anomaly-related information in the feature maps. Then, these "saliency" attention weight mappings are multiplied by the feature maps extracted by a second feature extraction model, thereby emphasizing anomaly-related features. This process allows the model to more accurately focus on anomaly-related features, thus improving the performance and robustness of anomaly detection.

[0056] The attention mechanism can also employ a multi-head attention module. Multi-head attention introduces a more complex mechanism, utilizing different attention heads to learn feature weight mappings under different semantics. Each attention head focuses on capturing specific feature relationships, and combining the outputs of all heads yields a comprehensive attention feature. This multi-head attention mechanism allows the model to more comprehensively understand and utilize the semantic relationships between features.

[0057] The attention mechanism can also employ reinforced attention training, which uses reinforcement learning algorithms to enable the attention module to autonomously explore and learn the optimal attention strategy. In this process, the module gradually adjusts and optimizes its attention weights through interaction with the environment, such as feedback from anomaly detection tasks, to better focus on key features. This adaptive learning approach allows the model to effectively capture key information when faced with different data and tasks.

[0058] Optionally, the second fusion feature can also be validated and adjusted. During the validation phase, the effectiveness of the optimized second fusion feature is verified through experiments, such as cross-validation on a road anomaly detection task. This process helps determine which features are most effective for the task and also evaluates the model's generalization ability across different datasets. Based on the validation results, the optimization strategy for the second fusion feature is further adjusted. This includes adjusting the number of layers or parameters of the second feature extraction model and optimizing the fusion method between deep features and the first fusion feature.

[0059] By leveraging the powerful capabilities of ResNeXt, deeper and more discriminative features can be extracted from the fused features, providing more accurate and effective input for subsequent classification models. Continuous validation and adjustments ensure the optimal effect of the feature optimization process, resulting in a significant improvement in the overall performance of the road condition detection system.

[0060] In S24, the feature optimization model is obtained by training a deep neural network, such as a multilayer perceptron (MLP) or a deep convolutional network, to learn and represent the second fused feature, thereby obtaining the third fused feature. An encoder-decoder structure is used as the macroscopic network structure of the feature optimization model. The encoder abstracts the second fused feature through multiple layers of nonlinear mapping, transforming it into a higher-level representation called a feature map. The encoder contains three convolutional blocks, each containing two convolutional layers, two BatchNorm (Batch Normalization) layers, and two ReLU (Rectified Linear Unit) activations. The first convolutional block has 256 input channels and 512 output channels, and the second convolutional block has 512 input channels and 512 output channels. The third convolutional block has 1024 input channels and 2048 output channels. The decoder restores the dimension of the feature map to the dimension of the second fused feature by progressive upsampling. The decoder contains three transposed convolutional blocks, each containing one transposed convolutional layer, one BatchNorm layer, and one ReLU activation. The first transposed convolutional block has 2048 input channels and 1024 output channels, the second transposed convolutional block has 1024 input channels and 512 output channels, and the third transposed convolutional block has 512 input channels and 256 output channels. Skip connections are added between the encoder layers and the corresponding symmetrical decoder layers to preserve spatial information. By adjusting parameters such as the number of channels, the number of convolutional blocks and transposed convolutional blocks, and the skip connection method, the network's ability to model feature relationships can be controlled. This macroscopic network structure not only ensures strong feature extraction and expression capabilities but also, by preserving spatial information through skip connections, enables the feature optimization model to capture and learn the complex relationships between features in the second fused feature. The learned features are then represented in a form that the subsequent classification model can better understand and utilize.
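The channel bookkeeping of the encoder-decoder can be checked with a small shape-tracing sketch. It makes several assumptions that the text does not state: the second encoder block is taken to output 1024 channels (the text gives 512) so that consecutive blocks chain, each encoder block halves the spatial size, each transposed convolutional block doubles it, and skip connections are additive so that source and target shapes must match. `Block`, `SKIPS`, and `trace_shapes` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Block:
    name: str
    in_ch: int
    out_ch: int
    upsample: bool  # False: halve spatial size, True: double it

# Assumption: enc2 outputs 1024 channels so that enc3 (1024 -> 2048) can follow.
ENCODER = [
    Block("enc1", 256, 512, False),
    Block("enc2", 512, 1024, False),
    Block("enc3", 1024, 2048, False),
]
DECODER = [
    Block("dec1", 2048, 1024, True),
    Block("dec2", 1024, 512, True),
    Block("dec3", 512, 256, True),
]
# Assumed symmetric, additive skip connections (source -> target).
SKIPS = [("enc2", "dec1"), ("enc1", "dec2"), ("input", "dec3")]

def trace_shapes(channels, size):
    """Return {block name: (channels, spatial size)} and validate the plan."""
    shapes = {"input": (channels, size)}
    for block in ENCODER + DECODER:
        if block.in_ch != channels:
            raise ValueError(f"{block.name} expects {block.in_ch} channels, got {channels}")
        channels = block.out_ch
        size = size * 2 if block.upsample else size // 2
        shapes[block.name] = (channels, size)
    for src, dst in SKIPS:
        if shapes[src] != shapes[dst]:
            raise ValueError(f"skip {src}->{dst} has mismatched shapes")
    return shapes

shapes = trace_shapes(channels=256, size=32)
```

Under these assumptions the decoder output has the same channel count and spatial size as the input, which matches the statement that the decoder restores the dimension of the second fused feature.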

[0061] In the process of building a feature optimization model, nonlinear transformations (also known as nonlinear mappings) are performed. These transformations are achieved using nonlinear activation functions (such as ReLU and Sigmoid) in the network layers of a deep neural network, enhancing the model's ability to capture nonlinear relationships between features. The transformed features are then subjected to dimensionality reduction using methods such as autoencoders or PCA to reduce the feature dimensionality while preserving key information. This helps reduce the model's computational complexity and mitigate the curse of dimensionality.
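As a rough illustration of the two steps in this paragraph, the sketch below applies a ReLU nonlinearity and then reduces the dimensionality with PCA computed via singular value decomposition. The random data, the layer weights, and the choice of four components are placeholders, not values from the disclosure.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def pca_reduce(features, n_components):
    """Project (N, D) features onto their top principal components."""
    mean = features.mean(axis=0)
    centered = features - mean
    # Rows of vt are principal directions, ordered by explained variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    variance = singular_values ** 2
    explained = variance[:n_components].sum() / variance.sum()
    return centered @ components.T, explained

rng = np.random.default_rng(0)
raw = rng.standard_normal((50, 16))           # hypothetical fused features
layer_weights = rng.standard_normal((16, 16))
activated = relu(raw @ layer_weights)         # nonlinear mapping
reduced, ratio = pca_reduce(activated, n_components=4)
```

The returned `ratio` is the fraction of variance kept by the retained components, which gives one way to judge how much key information survives the reduction.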

[0062] A feature learning strategy is set to determine how to train a deep neural network to learn the most effective feature representations. If supervised learning is chosen as the feature learning strategy, feature learning is performed within the framework of supervised learning, using labeled data to guide the model to learn more effective feature representations; if unsupervised or semi-supervised learning is chosen as the feature learning strategy, unlabeled data is used to discover potential feature structures and patterns in unsupervised or semi-supervised learning scenarios.

[0063] Regularization techniques (such as Dropout and L1 / L2 regularization) are applied to reduce overfitting in deep neural networks and enhance their generalization ability. Batch normalization is used to stabilize the learning process of deep neural networks and improve their training efficiency. During training, the parameters of the deep neural networks are optimized using algorithms such as backpropagation and gradient descent. Cross-validation is used to evaluate the performance of deep neural networks, ensuring that the learned feature representations are effective and accurate for road anomaly detection tasks.
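To make the interaction between L2 regularization and gradient descent concrete, the sketch below trains a single-layer logistic model, chosen only because its gradient can be written by hand. The synthetic data, the learning rate of 0.5, and the penalty strength of 0.01 are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grad(w, x, y, l2):
    """Binary cross-entropy plus an L2 penalty, with its gradient."""
    p = sigmoid(x @ w)
    eps = 1e-12                                # avoids log(0)
    bce = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    loss = bce + 0.5 * l2 * np.sum(w ** 2)
    # The penalty adds l2 * w to the gradient, shrinking weights each step.
    grad = x.T @ (p - y) / len(y) + l2 * w
    return loss, grad

rng = np.random.default_rng(0)
x = rng.standard_normal((200, 5))              # synthetic features
true_w = np.array([1.5, -2.0, 0.0, 0.5, 0.0])
y = (x @ true_w > 0).astype(float)             # synthetic binary labels

w = np.zeros(5)
history = []
for _ in range(100):
    loss, grad = loss_and_grad(w, x, y, l2=0.01)
    history.append(loss)
    w -= 0.5 * grad                            # plain gradient descent step
```

In a deep network the gradient would come from backpropagation rather than a closed-form expression, but the update rule and the effect of the penalty term are the same.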

[0064] In practical applications of road anomaly detection, the feature representations learned by deep neural networks are applied to road anomaly detection tasks, and their performance in real-world applications is evaluated. Based on the performance and feedback from these applications, the feature representation learning model is adjusted to further improve detection results. The effectiveness of the second feature extraction model can also be evaluated on different road anomaly detection tasks, such as detecting road cracks and identifying obstacles. Based on the evaluation results, the parameters and structure of the second feature extraction model are adjusted to obtain the optimal feature representation.

[0065] In summary, by constructing and optimizing deep neural networks, feature representations can be effectively learned and improved, providing more discriminative feature representations, thereby enhancing the accuracy and reliability of road anomaly detection. Continuous model training, validation, and tuning ensure high detection performance in constantly changing road environments. Applying ResNeXt allows for the effective extraction and optimization of features for road anomaly detection. ResNeXt's powerful expressive capabilities and modular design make it an ideal choice for optimizing and fusing features. This approach not only improves the expressive power of features but also enhances the model's adaptability and discriminative power in complex road conditions.

[0066] The third fusion feature undergoes preprocessing, including normalization and format conversion. A classification model is then used to analyze the preprocessed third fusion feature to detect road anomalies, including potholes, cracks, water accumulation, and obstacles (such as stones). The choice of classification model is crucial. Common choices include Support Vector Machines (SVM), Decision Trees, Random Forests, Convolutional Neural Networks (CNN), and Recurrent Neural Networks (RNN). By comparing the performance of different models on the road anomaly detection task, the model best suited to the current dataset and task characteristics is selected as the classification model.

[0067] Setting up training, validation, and test sets ensures that the model is trained and evaluated on different datasets. During training, algorithms such as backpropagation and gradient descent are used to optimize the classification model weights. During evaluation, techniques such as cross-validation are used to adjust the hyperparameters of the classification model, including the learning rate, the regularization coefficient, and the number and size of hidden layers. Precision, recall, and F1 score can be used to evaluate the model's performance on the validation and test sets. The model's detection performance on different types of road anomalies is analyzed to identify its strengths and weaknesses. Erroneous predictions are analyzed to identify patterns and causes of errors. Based on the performance evaluation and error analysis results, the classification model is fine-tuned, for example by adjusting the network structure or changing data augmentation strategies.
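The per-class metrics mentioned above can be computed as in the following sketch. The label names and the toy predictions are invented for illustration; only the metric definitions are standard.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 for one anomaly class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical ground truth and predictions for seven road images.
y_true = ["pothole", "crack", "normal", "pothole", "crack", "normal", "pothole"]
y_pred = ["pothole", "pothole", "normal", "pothole", "crack", "crack", "normal"]

pothole_scores = precision_recall_f1(y_true, y_pred, "pothole")
crack_scores = precision_recall_f1(y_true, y_pred, "crack")
```

Computing the scores separately for each anomaly type is what makes it possible to see where the classifier is strong or weak, as the paragraph describes.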

[0068] Alternatively, model fusion techniques (such as ensemble learning methods) can be used to combine the predictions of multiple different classification models, which can compensate for the shortcomings of a single classification model and thus obtain more robust and accurate results.
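One simple form of model fusion is soft voting, in which the class probabilities predicted by several classifiers are averaged. The sketch below assumes each model already outputs a probability per class; the three probability tables and the function name `soft_vote` are hypothetical.

```python
import numpy as np

def soft_vote(prob_list, weights=None):
    """Average class probabilities from several models.

    prob_list: list of (samples, classes) arrays, one per model.
    weights:   optional per-model weights (defaults to a plain mean).
    """
    probs = np.stack(prob_list)                    # (models, samples, classes)
    averaged = np.average(probs, axis=0, weights=weights)
    return averaged.argmax(axis=1), averaged

# Hypothetical outputs of three classifiers for two images and three classes.
model_a = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
model_b = np.array([[0.4, 0.5, 0.1], [0.1, 0.3, 0.6]])
model_c = np.array([[0.5, 0.2, 0.3], [0.2, 0.6, 0.2]])

labels, averaged = soft_vote([model_a, model_b, model_c])
```

In this toy case `model_b` disagrees with the other two on both images, and the averaged prediction follows the majority, which illustrates how an ensemble can compensate for the errors of a single classifier.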

[0069] Referring to Figure 3, this disclosure provides a road condition detection system, including: a first extraction module 101, used to extract the overall features of the road image using an overall analysis model; an image segmentation module 102, used to segment the road image into multiple sub-images; a second extraction module 103, used to extract local features of the multiple sub-images using a local analysis model; a feature acquisition module 104, used to obtain the first fused feature of the road image based on the local features and the overall features; and a road detection module 105, used to detect the road based on a classification model and the first fused feature.
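The data flow between modules 101 to 105 can be sketched end to end as follows. The feature extractors and the classifier here are deliberately trivial stand-ins (simple image statistics and a linear rule), since the disclosure leaves the concrete models open; all function names, the 2 x 2 grid, and the classifier weights are hypothetical.

```python
import numpy as np

def extract_overall(image):
    """Module 101 stand-in: global brightness and contrast."""
    return np.array([image.mean(), image.std()])

def split_into_sub_images(image, rows, cols):
    """Module 102: cut the image into a rows x cols grid of tiles."""
    h, w = image.shape
    th, tw = h // rows, w // cols
    return [image[r * th:(r + 1) * th, c * tw:(c + 1) * tw]
            for r in range(rows) for c in range(cols)]

def extract_local(sub_images):
    """Module 103 stand-in: per-tile mean and intensity range."""
    return np.array([[t.mean(), t.max() - t.min()] for t in sub_images]).ravel()

def fuse(local, overall):
    """Module 104: concatenate local and overall features."""
    return np.concatenate([local, overall])

def classify(fused, weights, bias):
    """Module 105 stand-in: a linear decision rule."""
    return "abnormal" if fused @ weights + bias > 0 else "normal"

rng = np.random.default_rng(0)
image = rng.random((8, 8))                     # placeholder grayscale image
tiles = split_into_sub_images(image, rows=2, cols=2)
fused = fuse(extract_local(tiles), extract_overall(image))
label = classify(fused, weights=np.ones_like(fused), bias=-5.0)
```

Concatenation is only one of the fusion options the disclosure describes; a learned fusion layer could replace `fuse` without changing the rest of the flow.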

[0070] The various variations and specific examples of the road condition detection method provided above are also applicable to the road condition detection system provided in this disclosure. Through the foregoing detailed description of the road condition detection method, those skilled in the art can clearly understand the implementation method of the road condition detection system. For the sake of brevity, they will not be described in detail here.

[0071] A computer device according to embodiments of the present disclosure includes a memory and a processor. The memory is used to store non-transitory computer-readable instructions. Specifically, the memory may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and / or non-volatile memory. The volatile memory may, for example, include random access memory (RAM) and / or cache memory. The non-volatile memory may, for example, include read-only memory (ROM), hard disk, flash memory, etc.

[0072] The processor may be a central processing unit (CPU) or other form of processing unit with data processing capabilities and / or instruction execution capabilities, and may control other components in the computer device to perform desired functions. In one embodiment of this disclosure, the processor is used to execute computer-readable instructions stored in the memory, causing the computer device to perform all or part of the steps of the road condition detection methods of the foregoing embodiments of this disclosure.

[0073] Those skilled in the art will understand that, in order to solve the technical problem of how to achieve a good user experience, this embodiment may also include well-known structures such as communication buses and interfaces, and these well-known structures should also be included within the protection scope of this disclosure.

[0074] Figure 4 is a schematic structural diagram of a computer device suitable for implementing the embodiments of the present disclosure. The computer device shown in Figure 4 is merely an example and should not be construed as limiting the functionality and scope of the embodiments disclosed herein.

[0075] As shown in Figure 4, a computer device may include a processor (such as a central processing unit, graphics processing unit, etc.), which can perform various appropriate actions and processes based on programs stored in read-only memory (ROM) or programs loaded from storage devices into random access memory (RAM). The RAM also stores various programs and data required for the operation of the computer device. The processor, ROM, and RAM are interconnected via a bus. Input / output (I / O) interfaces are also connected to the bus.

[0076] Typically, the following devices can be connected to the I / O interface: input devices, such as sensors or visual information acquisition devices; output devices, such as displays; storage devices, such as magnetic tapes or hard drives; and communication devices. Communication devices allow the computer device to communicate with other devices (such as edge computing devices) over wired or wireless connections to exchange data. Although Figure 4 shows a computer device with various devices, it should be understood that it is not required to implement or have all of the devices shown. More or fewer devices may alternatively be implemented or included.

[0077] In particular, according to embodiments of this disclosure, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, embodiments of this disclosure include a computer program product comprising a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program can be downloaded and installed from a network via a communication device, or installed from a storage device, or installed from a ROM. When the computer program is executed by a processor, all or part of the steps of the road condition detection method of embodiments of this disclosure are performed.

[0078] For a detailed description of this embodiment, please refer to the corresponding descriptions in the foregoing embodiments, which will not be repeated here.

[0079] A computer-readable storage medium according to embodiments of the present disclosure stores non-transitory computer-readable instructions. When these non-transitory computer-readable instructions are executed by a processor, all or part of the steps of the road condition detection methods of the foregoing embodiments of the present disclosure are performed.

[0080] The aforementioned computer-readable storage media include, but are not limited to: optical storage media (e.g., CD-ROM and DVD), magneto-optical storage media (e.g., MO), magnetic storage media (e.g., magnetic tape or portable hard drive), media with built-in rewritable non-volatile memory (e.g., memory card), and media with built-in ROM (e.g., ROM cartridge).

[0081] For a detailed description of this embodiment, please refer to the corresponding descriptions in the foregoing embodiments, which will not be repeated here.

[0082] The basic principles of this disclosure have been described above with reference to specific embodiments. However, it should be noted that the advantages, benefits, and effects mentioned in this disclosure are merely examples and not limitations, and should not be considered as essential features of each embodiment of this disclosure. Furthermore, the specific details disclosed above are for illustrative and facilitative purposes only, and are not limitations. These details do not limit the scope of this disclosure to the necessity of employing the aforementioned specific details for implementation.

[0083] In this disclosure, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. The block diagrams of components, apparatuses, devices, and systems involved in this disclosure are merely illustrative examples and are not intended to require or imply that they must be connected, arranged, or configured in the manner shown in the block diagrams. As those skilled in the art will recognize, these components, apparatuses, devices, and systems can be connected, arranged, and configured in any manner. Words such as "comprising," "including," "having," etc., are open-ended terms meaning "including but not limited to," and are used interchangeably with them. The terms "or" and "and" as used herein refer to the terms "and / or," and are used interchangeably with them unless the context clearly indicates otherwise. The term "such as" as used herein refers to the phrase "such as but not limited to," and is used interchangeably with it.

[0084] Additionally, as used herein, the "or" used in a list of items beginning with "at least one" indicates a separate list, such that a list of, for example, "at least one of A, B, or C" means A or B or C, or AB or AC or BC, or ABC (i.e., A and B and C). Furthermore, the word "exemplary" does not imply that the described example is preferred or better than other examples.

[0085] It should also be noted that in the systems and methods of this disclosure, the components or steps can be decomposed and / or recombined. These decompositions and / or recombinations should be considered as equivalent solutions to this disclosure.

[0086] Various changes, substitutions, and modifications can be made to the technology described herein without departing from the teachings defined by the appended claims. Furthermore, the scope of the claims of this disclosure is not limited to the specific aspects of the processes, machines, manufactures, events, means, methods, and actions described above. Currently existing or later-developed processes, machines, manufactures, events, means, methods, or actions that perform substantially the same function or achieve substantially the same result as the corresponding aspects described herein can be utilized. Therefore, the appended claims include such processes, machines, manufactures, events, means, methods, or actions within their scope.

[0087] The above description of the disclosed aspects is provided to enable any person skilled in the art to make or use this disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other aspects without departing from the scope of this disclosure. Therefore, this disclosure is not intended to be limited to the aspects shown herein, but rather to be carried out within the widest scope consistent with the principles and novel features disclosed herein.

[0088] The above description has been given for purposes of illustration and description. Furthermore, this description is not intended to limit the embodiments of this disclosure to the forms disclosed herein. Although numerous exemplary aspects and embodiments have been discussed above, those skilled in the art will recognize certain variations, modifications, alterations, additions, and sub-combinations therein.

Claims

1. A road condition detection method, characterized in that, the method includes: A holistic analysis model is used to extract the overall features of the road image; The road image is segmented into multiple sub-images; Local features of the multiple sub-images are extracted using a local analysis model; Based on the local features and the overall features, a first fusion feature of the road image is obtained; The road is detected based on a classification model and the first fusion feature.

2. The road condition detection method according to claim 1, characterized in that, The process of obtaining the first fusion feature of the road image based on the local features and the overall features includes: The local features and the overall features are concatenated and fused according to the first preset feature dimension direction to obtain the first fused feature; and / or, The local features and the overall features are fused based on the constructed first feature fusion layer to obtain the first fused feature; and / or, The effective features of the road image are extracted using the constructed first feature extraction model, and the effective features, the local features, and the overall features are fused to obtain the first fused feature.

3. The road condition detection method according to claim 1, characterized in that, The method of detecting roads based on the classification model and the first fused feature includes: Based on the constructed second feature extraction model, deep features are extracted from the first fused features; The dimensions of the deep features are adjusted. The dimensionally adjusted deep features are fused with the first fusion feature to obtain the second fusion feature; Based on the constructed feature optimization model, feature learning and feature representation are performed on the second fused feature to obtain the third fused feature; The classification model is used to analyze the third fusion feature to detect road conditions.

4. The road condition detection method according to claim 3, characterized in that, The step of fusing the dimension-adjusted deep features with the first fusion feature to obtain the second fusion feature includes: The deep-level features, adjusted according to the second preset feature dimension direction, are concatenated and fused with the first fusion feature to obtain the second fusion feature; and / or, The constructed third feature fusion layer fuses the dimension-adjusted deep features with the first fusion feature to obtain the second fusion feature; and / or, The weight coefficients of the dimension-adjusted deep features and the first fusion feature are determined respectively. Based on the weight coefficients, the dimension-adjusted deep features and the first fusion feature are weighted and concatenated to obtain the second fusion feature.

5. The road condition detection method according to claim 4, characterized in that, The step of fusing the dimension-adjusted deep features with the first fusion feature further includes: During the fusion process of the dimension-adjusted deep features and the first fusion feature, a nonlinear activation function is used for nonlinear feature mapping; The nonlinear feature mapping is optimized using a mechanism such as attention.

6. The road condition detection method according to claim 3, characterized in that, The feature optimization model adopts an encoder-decoder structure; The encoder obtains the feature map by performing multi-layer nonlinear mapping on the second fused feature; The decoder restores the dimension of the feature map to the dimension of the second fused feature by progressive upsampling.

7. The road condition detection method according to claim 1, characterized in that, The local features include at least one of potholes, cracks, water stains, oil stains, road material, road surface color, and road surface texture; The overall features include at least one of the following: road structure, traffic signs, environmental elements, scene type, and image style; Wherein, the road structure refers to the geometric and cross-sectional shape of the road; the traffic signs refer to signs and markings used to indicate traffic rules, warn of road conditions, and guide the direction of vehicle travel; the environmental elements refer to factors used to provide semantic information about the area surrounding the road and traffic participants on the road; the scene type refers to the context of the road's location; and the image style refers to the visual attributes of the road image.

8. A computer device, characterized in that, The computer device includes: At least one processor; and, A memory communicatively connected to the at least one processor; wherein, The memory stores instructions that can be executed by the at least one processor to enable the at least one processor to perform the road condition detection method according to any one of claims 1-7.

9. A computer-readable storage medium, characterized in that, The computer-readable storage medium stores computer instructions for causing a computer to perform the road condition detection method according to any one of claims 1-7.

10. A computer program product comprising computer instructions, characterized in that, When executed by a processor, the computer instructions implement the steps of the method according to any one of claims 1 to 7.