A dynamic sample adversarial testing method for a quadruped robot road surface recognition model
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- TIANFU JIANGXI LAB
- Filing Date
- 2026-04-16
- Publication Date
- 2026-06-19
Figure CN122244869A_ABST
Abstract
Description
Technical Field
[0001] This application belongs to the field of environmental perception for quadruped robots, and in particular relates to a dynamic sample adversarial testing method for a quadruped robot road surface recognition model.
Background Technology
[0002] Quadruped robots, with their superior mobility and adaptability, have been widely used in complex unstructured environments, demonstrating great potential, especially in disaster relief, environmental inspection, and military reconnaissance. In these applications, road surface recognition, as a core function of the quadruped robot's motion control system, directly affects the robot's motion accuracy, safety, and stability under different road surface conditions. Road surface recognition not only determines whether the robot can accurately switch between different movement modes such as walking and obstacle crossing, but also plays a crucial role in the robot's adaptability in complex environments. Specifically, road surface type recognition (such as flat roads, stairs, grass, gravel, etc.) helps the robot make effective decisions, thereby maintaining a stable driving state and avoiding tipping over or stalling in difficult terrain.
[0003] With the development of deep learning, road surface recognition methods based on convolutional neural networks (CNNs) have become mainstream, especially object detection technologies represented by YOLO (You Only Look Once), which offer significant advantages in real-time performance and accuracy. Through an end-to-end deep learning framework, the YOLO model can quickly extract image features and classify road surface types from static images. In practical applications, however, its performance is often constrained by the training samples and degrades significantly in dynamic environments. The dynamic disturbances a quadruped robot encounters during movement (such as gait changes, ground bumps, lighting changes, and occlusions) mean that static training samples cannot fully represent real-world scenes, so the model's accuracy drops significantly in real-time use. Static test sets cannot reproduce the dynamic disturbances that occur during movement, which results in two core problems: 1. The conflict between static image testing and motion blur: Current mainstream road surface recognition solutions mostly use YOLO-series models, which identify road surface types by extracting and classifying features from static road surface images. However, a quadruped robot produces significant motion blur while moving, and static images can hardly simulate real-world interference such as motion blur, perspective changes, and sudden changes in lighting. Cause: The robot's rapid movement, body shaking, and impacts from irregular terrain (such as stairs or gravel) cause relative motion between the camera and the scene at the moment of exposure, resulting in image blurring and texture distortion.
[0004] Impact: In practical applications, a quadruped robot is in a dynamic state of motion. The image acquisition device it carries (such as a camera mounted at the robot's viewpoint) produces dynamic image sequences (i.e., dynamic video streams) that are affected by the robot's gait and by bumps. The gap between static training samples and actual dynamic scenes can easily reduce the recognition accuracy of the YOLO model. At the same time, external interference such as lighting changes, occlusion, and noise further destabilizes recognition. Existing technologies lack sample generation and adversarial testing mechanisms for dynamic scenes, making it difficult to meet the accuracy requirements for road surface recognition by quadruped robots in complex dynamic environments.
[0005] Solution shortcomings: Current mainstream solutions lack dedicated designs for motion blur. Preprocessing typically employs general deblurring algorithms (such as non-blind deconvolution), but this may over-amplify noise or produce artifacts, further damaging semantic information. The models themselves often use fixed convolutional kernels, making it difficult to adaptively handle blur differences in different regions of the image (static background and moving targets), leading to a decrease in overall recognition robustness. Although some studies have attempted to enhance training by generating blurred data using GANs, real-world motion blur patterns are complex (such as non-uniform blur and light scattering), and synthetic data cannot perfectly simulate them, resulting in insufficient model generalization ability.
[0006] 2. Loss of temporal continuity and the challenge of cross-frame consistency: As a continuous sensing task, road surface recognition requires consistent performance over time. Existing solutions are mostly based on independent analysis of single-frame images and make little use of temporal correlations. Limitations of single-frame attacks: If adversarial attacks or environmental interference are optimized only for single-frame images, their effect may quickly disappear in a video stream because of inter-frame differences, so they fail to mislead the model persistently.
[0007] Existing technical solutions include, for example, road surface recognition based on sensor fusion. This method combines data from sensors such as images, LiDAR, and inertial measurement units (IMU) for road surface recognition. Sensor fusion methods utilize deep learning models to fuse data from different sensors, thereby enhancing the model's environmental perception capabilities, especially in poor lighting conditions or when visual information is insufficient, enabling reliable road surface recognition. However, this method has high hardware costs, requiring the integration of multiple sensors (such as cameras, LiDAR, IMU, etc.), increasing the complexity and cost of the equipment; it also has high processing complexity, as sensor fusion needs to handle various data formats and sources, requiring significant computational resources and additional algorithms to handle the synchronization and fusion of sensor data; and it is highly dependent on sensor data, with recognition capabilities significantly decreasing in the event of sensor failure or data loss.
[0008] Existing technical solutions include, for example, road surface recognition based on traditional computer vision methods. Traditional road surface recognition methods typically rely on manual feature extraction and classic machine learning algorithms, such as Support Vector Machines (SVM) or Random Forests. These methods classify road surface types by manually extracting features from images, such as edge detection, texture analysis, and color distribution. However, the feature extraction in these methods depends on manual design and cannot automatically learn the features best suited for road surface recognition, resulting in limited generalization ability of the model; low computational efficiency, especially in real-time applications, where traditional methods cannot handle large-scale datasets and complex dynamic scenes; and a lack of robustness, performing poorly in complex environments, especially in dynamically changing scenarios (such as lighting, occlusion, etc.).
[0009] Existing technical solutions include, for example, the YOLO model based on deep learning. YOLO (You Only Look Once) is a widely used object detection technique for road surface recognition. YOLO divides an image into multiple grids through a single forward propagation, with each grid predicting a bounding box and class probability, thus achieving real-time road surface type recognition. This method relies on deep convolutional neural networks (CNNs), which can automatically extract deep features from the image, thereby improving recognition accuracy. However, this method performs poorly in dynamic environments, especially during quadruped robot movement, where the model may be affected by dynamic factors such as gait changes, lighting changes, and occlusion, leading to a significant decrease in recognition accuracy. Furthermore, it lacks adaptability; YOLO models typically rely on static image training and lack the ability to adapt to dynamic scenes (such as bumps generated during robot movement and rapidly changing perspectives).
[0010] The YOLO model exhibits poor adaptability to dynamic environments. Static image test sets cannot reproduce real-world disturbances (such as motion blur, viewpoint changes, and sudden changes in lighting) during robot dog motion. This leads to a sharp drop in recognition accuracy during actual deployment due to dynamic environmental interference, particularly a surge in misclassification rates for complex terrains such as potholes, slopes, and slippery surfaces. The main reasons are: 1. The YOLO model's training relies on static image datasets, lacking adaptability to dynamic environments. 2. Image jolting, viewpoint changes, and varying ambient lighting conditions during robot movement create significant differences between static images and dynamic environments.
[0011] Furthermore, the YOLO model lacks cross-domain adaptability. YOLO models are typically trained on a specific dataset (source domain), and their performance drops significantly when applied to new environments (target domain), especially under conditions of varying lighting and terrain. The main reasons are: 1. The training data for YOLO models usually comes from a fixed environment or scene, resulting in poor generalization ability to new environments. 2. The lack of an effective cross-domain training mechanism prevents the model from sharing features between the source and target domains.
[0012] Furthermore, dynamic samples are scarce in YOLO model training data. Static images cannot simulate environmental changes encountered by the robot during movement, such as ground vibrations, lighting changes, obstructions, and road undulations. Therefore, existing datasets are insufficient to provide comprehensive dynamic training data. The main reasons are: 1. Generating dynamic samples is complex and computationally intensive, and generating samples that conform to dynamic environments from static images is a challenge. 2. The interference factors in dynamic environments are diverse and difficult to predict, making it difficult for training data to cover all possible dynamic changes.
Summary of the Invention
[0013] The purpose of this application is to overcome the problems of the prior art by disclosing a dynamic sample adversarial testing method for a quadruped robot road surface recognition model. By combining DANN with YOLO, the method enables adversarial testing and training of the YOLO model using dynamically generated adversarial samples.
[0014] The objective of this application is achieved through the following technical solution: A method for dynamic sample adversarial testing of a quadruped robot road surface recognition model, comprising the following steps: S1: Road surface image data acquisition and preprocessing, including: image acquisition and annotation, data preprocessing, and image format conversion; S2: Dynamic road surface adversarial sample generation based on DANN, including: constructing a dynamic generation network DANN, generating an adversarial interference database, and training the DANN; S3: Dynamic sample adversarial testing based on YOLO and DANN, including: YOLO model initialization training, DANN dynamic sample generation, YOLO model dynamic adversarial training, and road surface recognition and motion state switching.
[0015] According to a preferred embodiment, step S1 includes: S11: Image acquisition and annotation. Images are acquired through a camera from the perspective of the quadruped robot, and the acquired images are annotated according to the perspective of the quadruped robot. S12: Data preprocessing. First, an image cropping algorithm is used to crop the acquired original image from the robot dog's perspective, ensuring that the image size is 640×640 pixels. Then, bilinear interpolation is used to scale the cropped image to 224×224 pixels to meet the input requirements of the YOLO model. After the size adjustment is completed, the RGB channels of the image are normalized. S13: Image format conversion. The normalized image is saved in PIL format, and then converted to PyTorch tensor format using PyTorch conversion functions to ensure that the data can be input into the YOLO model for training.
[0016] According to a preferred embodiment, in step S11, the acquired images cover four types of road surfaces, including: flat roads, stairs, grass, and gravel, and the acquired images are labeled as the four types of road surfaces.
[0017] According to a preferred embodiment, in step S12, the normalization processing of the three RGB channels of the image is performed using the following expression:
[0018] $$x'_c = \frac{x_c - \mu_c}{\sigma_c}, \quad c \in \{R, G, B\}$$ where $x_c$ is the pixel value of the input image in channel $c$, $\mu_c$ is the mean of that channel of the image, $\sigma_c$ is the standard deviation of that channel of the image, $c$ denotes one of the three RGB channels, and $x'_c$ is the normalized value.
[0019] According to a preferred embodiment, step S2 includes: S21: Construct a dynamic generation network (DANN), which consists of a generator and a discriminator. The generator adopts an improved U-Net architecture; its input is a static road surface image tensor together with dynamic motion parameters, and its output is a sequence of continuous dynamic road surface images, i.e., a dynamic video stream. The discriminator adopts a CNN architecture; its input is a sequence of dynamic road surface images and its output is a discrimination result that distinguishes real dynamic samples from generated ones. S22: Based on kinematic simulation of the quadruped robot, dynamic motion parameters such as gait frequency and jolt amplitude under different motion states are obtained, and factors such as sudden changes in illumination, local occlusion, and random noise are superimposed to form an adversarial interference parameter library. S23: Using the DANN generator, the adversarial interference parameters are injected into the dynamic video stream by feature modulation to simulate real-world interference, yielding an adversarial interference database. S24: The DANN is trained with the LSGAN adversarial loss function so that the generated dynamic adversarial test samples are closer to the real dynamic environment, which completes the DANN training process.
[0020] According to a preferred embodiment, during the DANN training process in step S24, the loss function is calculated as follows:
[0021] where $D(x)$ is the discriminator's judgment of the real sample $x$, $G(z)$ is the sample generated by the generator, $z$ is the input random noise, and $D(G(z))$ is the discriminator's judgment of the generated sample.
[0022] According to a preferred embodiment, step S3 includes: S31: Initialization of YOLO road surface recognition model. YOLOv5 is used as the base model for road surface recognition. The initial training of YOLOv5 model uses preprocessed static images as the training set. The training objective is to recognize four types of road surfaces: flat road, stairs, grass, and gravel. S32: Dynamic adversarial test sample construction. Using the dynamic video stream generated by DANN and the interference parameter library, a dynamic adversarial test sample set including jitter, illumination change and occlusion factors is generated. S33: Dynamic adversarial training of YOLO model. The dynamic adversarial test samples generated in S32 are input into the YOLOv5 model for dynamic adversarial training. During the training process, the network parameters of the YOLO model are adjusted to optimize the model's ability to adapt to adversarial interference. S34: Road surface recognition and motion state switching. The YOLO model, which has been trained in dynamic adversarial mode, is deployed in the quadruped robot to identify the road surface type in the dynamic video stream in real time and control the switching of the robot's motion state based on the recognition results.
[0023] According to a preferred embodiment, the YOLOv5 structure includes an input stage, a backbone network, a neck, and a head. The input stage adjusts the input size through the Focus module; the backbone network extracts features and enhances them using the CSP and SPP modules; the neck fuses information across scales; and the head outputs target predictions at different scales, including category and location.
[0024] According to a preferred embodiment, the input image size of YOLOv5 is 224×224, the batch size is 16, the number of training rounds is set to 100, the optimization algorithm is Adam, and the learning rate is 0.001.
[0025] According to a preferred embodiment, in step S34, the movement state switching rules include: in a flat environment: walking gait; in a grass / gravel environment: switching to obstacle-crossing gait at a distance of 50-100cm from the target; in a stair environment: stair gait.
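The switching rules above can be sketched as a small lookup function. This is a minimal illustration, not the application's controller: the `Gait` enum, the `select_gait` function, and the string labels are hypothetical names, and keeping the current gait when a grass or gravel surface is outside the 50-100 cm window is an assumption, since the text does not say what happens in that case.

```python
from enum import Enum
from typing import Optional


class Gait(Enum):
    WALKING = "walking"
    OBSTACLE_CROSSING = "obstacle_crossing"
    STAIR = "stair"


def select_gait(surface: str, distance_cm: Optional[float], current: Gait) -> Gait:
    """Map a recognized road surface type to a gait, following the stated rules."""
    if surface == "flat":
        return Gait.WALKING
    if surface == "stairs":
        return Gait.STAIR
    if surface in ("grass", "gravel"):
        # Switch only inside the 50-100 cm window given in the text.
        if distance_cm is not None and 50.0 <= distance_cm <= 100.0:
            return Gait.OBSTACLE_CROSSING
        # Assumption: outside the window the robot keeps its current gait.
        return current
    raise ValueError(f"unknown surface type: {surface!r}")


print(select_gait("gravel", 80.0, Gait.WALKING))
```

Returning the current gait outside the distance window keeps the function total over the four labeled surface types without inventing a fifth behavior.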
[0026] The aforementioned main solution and its various further alternative solutions can be freely combined to form multiple solutions, all of which are solutions that can be adopted and are claimed in this application. Those skilled in the art, after understanding the solution of this application, will realize that there are many combinations based on the prior art and common general knowledge, all of which are technical solutions to be protected in this application, and will not be exhaustively listed here.
[0027] The beneficial effects of this application are: This invention addresses the issue of low accuracy in dynamic scenes using traditional YOLO models by combining YOLO and DANN. Furthermore, it enhances the model's robustness through adversarial testing with dynamic samples. Ultimately, the adversarially trained YOLO model achieves high-accuracy road surface recognition in complex dynamic scenarios and precisely switches the robot's motion state based on the recognition results, ensuring the safe and stable movement of the quadruped robot and demonstrating significant engineering application value.
Attached Figure Description
[0028] Figure 1 is a functional flowchart of the road surface recognition dynamic sample testing method of this application; Figure 2 is a schematic diagram of the road surface image data acquisition and preprocessing process of this application; Figure 3 is a schematic diagram of the dynamic road surface adversarial sample generation process based on DANN in this application; Figure 4 is a schematic diagram of the DANN adversarial network structure in this application; Figure 5 is a schematic diagram of the dynamic sample adversarial testing process based on YOLO and DANN in this application; Figure 6 is a schematic diagram of the transmission and reception process of quadruped robot road surface recognition in this application; Figure 7 is a schematic diagram of the sending and receiving process of road surface recognition by a quadruped robot.
Detailed Implementation
[0029] The following specific examples illustrate the implementation of this application. Those skilled in the art can easily understand other advantages and effects of this application from the content disclosed in this specification. This application can also be implemented or applied through other different specific embodiments, and various details in this specification can also be modified or changed based on different viewpoints and applications without departing from the spirit of this application. It should be noted that, unless otherwise specified, the following embodiments and features in the embodiments can be combined with each other.
[0030] It should be noted that similar labels and letters in the following figures indicate similar items. Therefore, once an item is defined in one figure, it does not need to be further defined and explained in subsequent figures.
[0031] In the description of this application, it should be noted that the terms "center," "upper," "lower," "left," "right," "vertical," "horizontal," "inner," and "outer," etc., indicate the orientation or positional relationship based on the orientation or positional relationship shown in the accompanying drawings, or the orientation or positional relationship commonly used when the product of this application is in use. They are only for the convenience of describing this application and simplifying the description, and do not indicate or imply that the device or element referred to must have a specific orientation, or be constructed and operated in a specific orientation, and therefore should not be construed as a limitation on this application. In addition, the terms "first," "second," and "third," etc., are only used to distinguish descriptions and should not be construed as indicating or implying relative importance.
[0032] Furthermore, terms such as "horizontal," "vertical," and "sag" do not imply that components must be absolutely horizontal or suspended, but rather that they can be slightly tilted. For example, "horizontal" simply means that its direction is more horizontal relative to "vertical," and does not mean that the structure must be completely horizontal, but can be slightly tilted.
[0033] In the description of this application, it should also be noted that, unless otherwise expressly specified and limited, the terms "set up," "install," "connect," and "link" should be interpreted broadly. For example, they can refer to a fixed connection, a detachable connection, or an integral connection; they can refer to a mechanical connection or an electrical connection; they can refer to a direct connection or an indirect connection through an intermediate medium; and they can refer to the internal connection of two components. Those skilled in the art can understand the specific meaning of the above terms in this application based on the specific circumstances.
[0034] Furthermore, it should be noted that unless otherwise specified in this application, the specific structures, connections, positions, power sources, etc. involved are all things that a person skilled in the art can know without creative effort based on the prior art.
[0035] As shown in Figures 1 to 6, this application discloses a dynamic sample adversarial testing method for a quadruped robot road surface recognition model. This application combines DANN (Dynamic Adversarial Neural Network) with YOLO (You Only Look Once), which enables adversarial testing and training of the YOLO model using adversarially generated dynamic samples.
[0036] Specifically, the DANN generator can generate diverse dynamic samples by simulating environmental disturbances such as gait, bumps, and lighting changes under different movement states of a quadruped robot. The YOLO model can then be trained on these dynamic samples, thereby enhancing its adaptability to various environmental changes in dynamic scenes. Through adversarial training, the YOLO model can maintain high recognition accuracy and robustness when facing complex dynamic environments (such as sudden changes in lighting, occlusion, noise, and other disturbances). The aim is to improve the robustness of quadruped robots in recognizing different road surface types (flat roads, stairs, grass, gravel) in complex dynamic scenes, ensuring that they accurately switch between walking, stair climbing, or obstacle crossing states in actual operations.
[0037] Preferably, as shown in Figure 1, the dynamic sample adversarial testing method for the quadruped robot road surface recognition model includes the following steps.
[0038] Step S1: Road surface image data acquisition and preprocessing, including: image acquisition and annotation, data preprocessing, and image format conversion; Step S2: Dynamic road surface adversarial example generation based on DANN, including: constructing a dynamic generation network DANN, generating an adversarial interference database, and training the DANN; Step S3: Dynamic sample adversarial testing based on YOLO and DANN, including: YOLO model initialization training, DANN dynamic sample generation, YOLO model dynamic adversarial training, and road surface recognition and motion state switching.
[0039] Specifically, as shown in Figure 2, step S1 includes the following steps.
[0040] Step S11: Image acquisition and annotation. Images are acquired through a camera from the perspective of the quadruped robot, and the acquired images are annotated according to the quadruped robot's perspective.
[0041] Preferably, in step S11, the acquired images cover four types of road surfaces in multiple environments and scenes, including: flat roads, stairs, grass, and gravel, with at least 2000 images for each type of road surface. The acquired images are labeled as the four road surface types: flat roads, stairs, grass, and gravel.
[0042] Step S12: Data preprocessing. First, an image cropping algorithm is used to crop the acquired original image from the perspective of the robot dog, ensuring that the image size is 640×640 pixels. Then, the cropped image is scaled to 224×224 pixels using a bilinear interpolation algorithm to meet the input requirements of the YOLO model. After the size adjustment is completed, the RGB channels of the image are normalized to eliminate the influence caused by differences in pixel values.
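The cropping and scaling in step S12 can be sketched as follows. This is only an illustration under stated assumptions: the text does not say where the 640×640 crop is taken, so a center crop is assumed, and the function names `center_crop` and `bilinear_resize` are hypothetical. The bilinear interpolation is written out in NumPy with the half-pixel sampling convention, which is one common choice among several.

```python
import numpy as np


def center_crop(img: np.ndarray, size: int = 640) -> np.ndarray:
    """Crop a (H, W, C) image to size x size around its center (assumed crop position)."""
    h, w = img.shape[:2]
    if h < size or w < size:
        raise ValueError("image is smaller than the crop size")
    top = (h - size) // 2
    left = (w - size) // 2
    return img[top:top + size, left:left + size]


def bilinear_resize(img: np.ndarray, out_size: int = 224) -> np.ndarray:
    """Resize a (H, W, C) image to out_size x out_size by bilinear interpolation."""
    h, w = img.shape[:2]
    # Map output pixel centers back to input coordinates (half-pixel convention).
    ys = np.clip((np.arange(out_size) + 0.5) * h / out_size - 0.5, 0, h - 1)
    xs = np.clip((np.arange(out_size) + 0.5) * w / out_size - 0.5, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None, None]  # vertical interpolation weights
    wx = (xs - x0)[None, :, None]  # horizontal interpolation weights
    img = img.astype(np.float32)
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


# Hypothetical 720p camera frame standing in for a captured image.
frame = np.full((720, 1280, 3), 100, dtype=np.uint8)
resized = bilinear_resize(center_crop(frame))
print(resized.shape)
```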
[0043] The normalization of the RGB channels of the image is performed using the following expression:
[0044] $$x'_c = \frac{x_c - \mu_c}{\sigma_c}, \quad c \in \{R, G, B\}$$ where $x_c$ is the pixel value of the input image in channel $c$, $\mu_c$ is the mean of that channel of the image, $\sigma_c$ is the standard deviation of that channel of the image, $c$ denotes one of the three RGB channels, and $x'_c$ is the normalized value.
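The per-channel normalization described above amounts to subtracting each channel's mean and dividing by its standard deviation. A minimal NumPy sketch follows; it assumes the statistics are computed from the image itself (the text could equally be read as dataset-wide statistics), and the `eps` guard against a zero standard deviation and the function name `normalize_rgb` are additions for illustration.

```python
import numpy as np


def normalize_rgb(img: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Normalize each RGB channel of a (H, W, 3) image to zero mean and unit variance."""
    img = np.asarray(img, dtype=np.float32)
    mean = img.mean(axis=(0, 1), keepdims=True)  # per-channel mean
    std = img.std(axis=(0, 1), keepdims=True)    # per-channel standard deviation
    # eps is an assumed safeguard for constant channels; it is not in the text.
    return (img - mean) / (std + eps)


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(224, 224, 3)).astype(np.float32)
normalized = normalize_rgb(image)
print(normalized.mean(axis=(0, 1)), normalized.std(axis=(0, 1)))
```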
[0045] Step S13: Image format conversion. The normalized image is saved in PIL format, and then converted to PyTorch tensor format using PyTorch conversion functions to ensure that the data can be input into the YOLO model for training.
[0046] Specifically, as shown in Figure 3, step S2 includes the following steps.
[0047] Step S21: Construct a Dynamic Generation Network (DANN). The DANN consists of a generator and a discriminator. The generator adopts an improved U-Net architecture, with the input being a static road surface image tensor and dynamic motion parameters, and the output being a sequence of continuous dynamic road surface images, i.e., a dynamic video stream. The discriminator adopts a CNN architecture, with the input being the sequence of dynamic road surface images and the output being the discrimination result, in order to distinguish between real dynamic samples and generated dynamic samples.
[0048] The DANN network structure is shown in Figure 4, which presents the DANN adversarial network architecture. The architecture takes two inputs, a source input and a target input, and is mainly composed of the following modules:
1. Backbone: This is the core of the network. Both the source and target inputs pass through a shared Backbone module, which extracts the basic features of the input data.
2. Bottleneck: After the Backbone, the data flow enters the Bottleneck layer, which further compresses and refines the important features to support the subsequent output and domain discrimination.
3. Output: The data flow processed by the Bottleneck layer enters the output layer, which generates the model's final prediction, containing classification or regression information for the target domain data.
4. Gradient Reversal Layer: This layer is used for adversarial training. By reversing the gradient, it prevents the discriminator from distinguishing between source and target domain data during training. Its role is to promote feature sharing, so that the features learned by the network are effective for both the source and target domains.
5. DomainLabel: The final discriminator classifies the features that have passed through the gradient reversal layer to determine whether the input data belongs to the source or the target domain. By computing the losses for both domains, this module trains the network to share features between them.
6. True Label: The true label is used to compute the classification or regression loss, ensuring that the model learns accurate task-relevant features.
Through adversarial training, this network ensures feature sharing between the source and target domains, improving its performance in different environments and its adaptability to cross-domain tasks.
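The gradient reversal layer in item 4 can be illustrated without any deep learning framework: it passes features through unchanged in the forward pass and flips the sign of the gradient in the backward pass. The sketch below writes both passes by hand in NumPy; the class name `GradientReversal` and the scaling factor `lam` are assumptions for illustration, and a real implementation would normally hook into the framework's automatic differentiation instead.

```python
import numpy as np


class GradientReversal:
    """Identity in the forward pass, sign-flipped (and scaled) gradient in the backward pass."""

    def __init__(self, lam: float = 1.0):
        # lam is an assumed scaling factor for the reversed gradient.
        self.lam = lam

    def forward(self, features: np.ndarray) -> np.ndarray:
        # Features reach the domain classifier unchanged.
        return np.asarray(features, dtype=np.float64)

    def backward(self, grad_from_domain_head: np.ndarray) -> np.ndarray:
        # The feature extractor receives the negated domain-loss gradient, so
        # gradient descent on it moves toward features that confuse the domain classifier.
        return -self.lam * np.asarray(grad_from_domain_head, dtype=np.float64)


grl = GradientReversal(lam=0.5)
feats = np.array([0.2, -1.0, 3.0])
grad = np.array([1.0, 2.0, -4.0])
print(grl.forward(feats), grl.backward(grad))
```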
[0049] Step S22: Based on kinematic simulation of the quadruped robot, obtain dynamic motion parameters such as gait frequency (0.5-2 Hz) and jolt amplitude (0-5 cm) under different motion states, and superimpose factors such as sudden changes in illumination, local occlusion, and random noise to form an adversarial interference parameter library.
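A parameter library of the kind described in step S22 could be sketched as a list of sampled records. The ranges for gait frequency and jolt amplitude come from the text; everything else here is an assumption for illustration, including uniform sampling, the boolean flags for the superimposed factors, the 30 fps frame rate, and the sinusoidal model that turns frequency and amplitude into a per-frame vertical camera offset. The application obtains these parameters from kinematic simulation, which this sketch does not reproduce.

```python
import math
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DisturbanceParams:
    gait_frequency_hz: float  # 0.5-2 Hz, from the text
    jolt_amplitude_cm: float  # 0-5 cm, from the text
    illumination_jump: bool   # assumed on/off flags for the superimposed factors
    occlusion: bool
    noise: bool


def sample_params(rng: random.Random) -> DisturbanceParams:
    """Draw one entry of the parameter library (uniform sampling is an assumption)."""
    return DisturbanceParams(
        gait_frequency_hz=rng.uniform(0.5, 2.0),
        jolt_amplitude_cm=rng.uniform(0.0, 5.0),
        illumination_jump=rng.random() < 0.5,
        occlusion=rng.random() < 0.5,
        noise=rng.random() < 0.5,
    )


def vertical_offsets_cm(p: DisturbanceParams, n_frames: int, fps: float = 30.0) -> List[float]:
    """Assumed sinusoidal jolt: camera height offset for each frame of a clip."""
    return [
        p.jolt_amplitude_cm * math.sin(2.0 * math.pi * p.gait_frequency_hz * i / fps)
        for i in range(n_frames)
    ]


rng = random.Random(0)
library = [sample_params(rng) for _ in range(100)]
offsets = vertical_offsets_cm(library[0], n_frames=60)
print(library[0], offsets[:3])
```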
[0050] Step S23: Using the DANN generator, the adversarial interference parameters are injected into the dynamic video stream by feature modulation to simulate real-world interference, yielding an adversarial interference database. The strength of the adversarial interference is controlled by the interference coefficient (0-0.2).
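To make the role of the interference coefficient concrete, the sketch below applies illumination change, random noise, and an optional occlusion patch directly to a frame, with all three scaled by a coefficient restricted to the stated 0-0.2 range. This is a pixel-space stand-in, not the feature modulation inside the generator that the application describes; the function name `apply_interference` and the specific way each factor scales with the coefficient are assumptions.

```python
import numpy as np

MAX_COEFF = 0.2  # upper bound of the interference coefficient given in the text


def apply_interference(frame, coeff, rng, occlude=False):
    """Perturb a float (H, W, 3) frame with values in [0, 1]; strength is set by coeff."""
    if not 0.0 <= coeff <= MAX_COEFF:
        raise ValueError("interference coefficient must lie in [0, 0.2]")
    out = np.asarray(frame, dtype=np.float32).copy()
    # Sudden illumination change: brighten or darken by a factor of at most coeff.
    out *= 1.0 + coeff * float(rng.choice([-1.0, 1.0]))
    # Random noise whose standard deviation equals coeff.
    out += coeff * rng.standard_normal(out.shape).astype(np.float32)
    if occlude:
        # Local occlusion: black square whose side is coeff times the shorter image side.
        h, w = out.shape[:2]
        side = int(round(coeff * min(h, w)))
        if side > 0:
            top = int(rng.integers(0, h - side + 1))
            left = int(rng.integers(0, w - side + 1))
            out[top:top + side, left:left + side] = 0.0
    return np.clip(out, 0.0, 1.0)


rng = np.random.default_rng(0)
clean = np.full((32, 32, 3), 0.5, dtype=np.float32)
disturbed = apply_interference(clean, 0.2, rng, occlude=True)
print(float(np.abs(disturbed - clean).mean()))
```

With a coefficient of 0 the frame is returned unchanged, which gives a convenient clean baseline for comparison.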
[0051] Step S24: Train the DANN with the LSGAN adversarial loss function so that the generated dynamic adversarial test samples are closer to the real dynamic environment, thus completing the DANN training process.
[0052] Preferably, in the DANN training process of step S24, the loss function is calculated as follows:
[0053] where $D(x)$ is the discriminator's judgment of the real sample $x$, $G(z)$ is the sample generated by the generator, $z$ is the input random noise, and $D(G(z))$ is the discriminator's judgment of the generated sample.
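Since the text names LSGAN as the adversarial loss, the commonly used least-squares form is sketched below for reference. It should be read as the standard LSGAN objective with target labels 0 for generated samples and 1 for real samples, not as a confirmed transcription of the expression in this application, and the function names are hypothetical. The inputs are the discriminator outputs on real samples and on generated samples.

```python
import numpy as np


def lsgan_discriminator_loss(d_real, d_fake) -> float:
    """0.5 * E[(D(x) - 1)^2] + 0.5 * E[D(G(z))^2] (standard LSGAN, labels 1 and 0)."""
    d_real = np.asarray(d_real, dtype=np.float64)
    d_fake = np.asarray(d_fake, dtype=np.float64)
    return float(0.5 * np.mean((d_real - 1.0) ** 2) + 0.5 * np.mean(d_fake ** 2))


def lsgan_generator_loss(d_fake) -> float:
    """0.5 * E[(D(G(z)) - 1)^2]: the generator wants its samples scored as real."""
    d_fake = np.asarray(d_fake, dtype=np.float64)
    return float(0.5 * np.mean((d_fake - 1.0) ** 2))


# Discriminator scores for a small batch (made-up numbers for illustration).
scores_real = [0.9, 1.1, 0.8]
scores_fake = [0.2, 0.1, 0.4]
print(lsgan_discriminator_loss(scores_real, scores_fake), lsgan_generator_loss(scores_fake))
```

Compared with the logarithmic GAN loss, the squared error keeps penalizing generated samples that are classified correctly but lie far from the decision boundary, which is the usual argument for its more stable gradients.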
[0054] This application employs an adversarial training method combining the generator and discriminator of a DANN, using a gradient reversal mechanism to ensure training stability, and reducing computational overhead by optimizing the network structure and parameter tuning. Furthermore, the adversarial training of the DANN ensures high-quality generated samples, while the adversarial loss function (such as the LSGAN loss) optimizes the training process, ensuring efficient training without overfitting, thereby improving the model's performance in dynamic environments.
[0055] Specifically, as shown in Figure 5, step S3 includes the following steps.
[0056] Step S31: Initialize the YOLO road surface recognition model. Use YOLOv5 as the base model for road surface recognition. The initial training of the YOLOv5 model uses preprocessed static images as the training set. The training objective is to recognize four types of road surfaces: flat road, stairs, grass, and gravel.
[0057] Preferably, the input image size of YOLOv5 is 224×224, the batch size is 16, the number of training rounds is set to 100, the optimization algorithm is Adam, and the learning rate is 0.001.
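The hyperparameters above can be collected in a small configuration object, which also makes the resulting training length easy to estimate. The sketch assumes the minimum dataset size mentioned earlier (four road surface types with at least 2000 images each) and assumes all of those images are used for training with no validation split; `TrainConfig` and `steps_per_epoch` are hypothetical names and not part of any YOLOv5 API.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    image_size: int = 224         # input resolution stated in the text
    batch_size: int = 16
    epochs: int = 100
    optimizer: str = "Adam"
    learning_rate: float = 1e-3


def steps_per_epoch(num_images: int, batch_size: int) -> int:
    """Number of optimizer steps needed to see every image once."""
    return math.ceil(num_images / batch_size)


cfg = TrainConfig()
num_images = 4 * 2000  # assumed: minimum of 2000 images for each of the 4 surface types
steps = steps_per_epoch(num_images, cfg.batch_size)
total_steps = steps * cfg.epochs
print(steps, total_steps)
```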
[0058] Preferably, as shown in Figure 6, the structure of YOLOv5 includes an input stage, a backbone network, a neck, and a head. The input stage adjusts the input size through a Focus module; the backbone network extracts features and enhances them using CSP and SPP modules; the neck fuses information across scales; and the head outputs target predictions at different scales, including category and location.
[0059] As shown in the figure, the YOLOv5 network takes an image of size (224×224×3) as input. First, the image is processed by the Focus module to initially adjust the feature map size. Then, features are extracted and enhanced through the CBL module (composed of convolution, batch normalization, and an activation function) and CSP (Cross Stage Partial network) modules such as CSP1_3 and CSP2_3. The SPP (Spatial Pyramid Pooling) structure also fuses multi-scale features, improving adaptability to targets of different sizes. Next, a feature pyramid structure combined with upsampling is used, and a Concat operation fuses shallow fine-grained features with deep semantic features. Finally, after processing by multiple CSP modules and convolutional (conv) layers, prediction results at three different scales, (7×7×255), (14×14×255), and (28×28×255), are output and used to detect targets of different sizes. Each scale output includes information such as target category and location.
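The three output sizes follow from the input size and the usual YOLOv5 detection strides of 32, 16, and 8 (224/32 = 7, 224/16 = 14, 224/8 = 28). The sketch below reproduces that arithmetic. It assumes the common YOLOv5 head layout of 3 anchors per scale with 4 box values, 1 objectness value, and one score per class for each anchor. Under that assumption, 255 channels corresponds to an 80-class head, and a head restricted to the four road surface classes would have 27 channels. The function names are hypothetical.

```python
from typing import List, Tuple


def focus_shape(height: int, width: int, channels: int) -> Tuple[int, int, int]:
    """Focus slicing: every 2x2 pixel block is moved into the channel axis."""
    return (height // 2, width // 2, channels * 4)


def head_shapes(
    image_size: int,
    num_classes: int,
    anchors_per_scale: int = 3,              # assumed YOLOv5 default
    strides: Tuple[int, ...] = (32, 16, 8),  # assumed YOLOv5 detection strides
) -> List[Tuple[int, int, int]]:
    """Grid height, grid width, and channel count of each detection output."""
    channels = anchors_per_scale * (5 + num_classes)  # 4 box + 1 objectness + classes
    return [(image_size // s, image_size // s, channels) for s in strides]
```

With these assumptions, `head_shapes(224, 80)` returns `[(7, 7, 255), (14, 14, 255), (28, 28, 255)]`, and `focus_shape(224, 224, 3)` returns `(112, 112, 12)`.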
[0060] Step S32: Dynamic adversarial test sample construction. Using the dynamic video stream generated by DANN and the interference parameter library, a dynamic adversarial test sample set including jitter, illumination changes and occlusion factors is generated.
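To make the three interference factors concrete, the following sketch applies jitter, an illumination change, and a local occlusion to a small grayscale frame stored as nested lists. It is a hand-written stand-in for illustration only: in the method described here these disturbances are produced by the DANN generator from the interference parameter library, and all function names and parameter keys below are hypothetical.

```python
from typing import Dict, List

Frame = List[List[int]]  # grayscale frame, pixel values in 0..255


def jitter(frame: Frame, dx: int, dy: int, fill: int = 0) -> Frame:
    """Shift the frame by (dx, dy) pixels; uncovered pixels take `fill`."""
    h, w = len(frame), len(frame[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = frame[sy][sx]
    return out


def change_illumination(frame: Frame, gain: float) -> Frame:
    """Scale brightness by `gain` and clamp to the valid pixel range."""
    return [[min(255, max(0, round(p * gain))) for p in row] for row in frame]


def occlude(frame: Frame, top: int, left: int, height: int, width: int, value: int = 0) -> Frame:
    """Overwrite a rectangle (non-negative top/left assumed) with `value`."""
    out = [row[:] for row in frame]
    for y in range(top, min(top + height, len(out))):
        for x in range(left, min(left + width, len(out[0]))):
            out[y][x] = value
    return out


def make_adversarial_frame(frame: Frame, params: Dict) -> Frame:
    """Apply jitter, then the illumination change, then the occlusion."""
    out = jitter(frame, params["dx"], params["dy"])
    out = change_illumination(out, params["gain"])
    return occlude(out, *params["occlusion"])
```

Applying one parameter set per frame of a video stream yields a sequence of disturbed frames. Varying the parameters over time would approximate gait-induced shake and sudden lighting changes.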
[0061] Step S33: Dynamic adversarial training of the YOLO model. Input the dynamic adversarial test samples generated in S32 into the YOLOv5 model for dynamic adversarial training. During the training process, the network parameters of the YOLO model are adjusted to optimize the model's ability to adapt to adversarial interference, ensuring that it can still perform road recognition efficiently and accurately in dynamic interference environments.
[0062] By combining adversarial training with YOLO and DANN, a gradient reversal layer is used to achieve feature sharing between the source and target domains. This enables the model to adapt from the source domain to the target domain, reducing feature differences between the two domains. Through adversarial training, the model learns to extract domain-independent features, enhancing its cross-domain recognition ability and improving the accuracy of target domain recognition.
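A gradient reversal layer behaves as the identity in the forward pass and multiplies the incoming gradient by a negative factor in the backward pass. The feature extractor is therefore pushed to make the domains harder to tell apart, while the domain classifier is still trained to separate them. The following framework-free sketch shows only that behavior. The class name and the scaling factor `lam` are assumptions, and in a deep learning framework this would normally be written as a custom autograd operation rather than as a plain class.

```python
from typing import List


class GradientReversal:
    """Identity in the forward pass, sign-flipped gradient in the backward pass."""

    def __init__(self, lam: float = 1.0) -> None:
        self.lam = lam  # strength of the reversed gradient (assumed hyperparameter)

    def forward(self, features: List[float]) -> List[float]:
        # Features pass through unchanged to the domain classifier.
        return list(features)

    def backward(self, grad_output: List[float]) -> List[float]:
        # The gradient flowing back to the feature extractor is negated and scaled.
        return [-self.lam * g for g in grad_output]
```

With `lam = 2.0`, a gradient of `[0.5, -1.0]` arriving from the domain classifier is passed back to the feature extractor as `[-1.0, 2.0]`.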
[0063] Step S34: Road surface recognition and motion state switching. The YOLO model trained in dynamic adversarial mode is deployed into the quadruped robot to identify the road surface type in the dynamic video stream in real time and control the switching of the robot's motion state based on the recognition results.
[0064] In step S34, the motion state switching rules include: on a flat road, use the walking gait; on grass or gravel, switch to the obstacle-crossing gait when 50-100 cm away from the target; on stairs, use the stair gait.
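The switching rules can be written as a small decision function, sketched below. The function name, the label strings, and the gait names are hypothetical. The rules do not say what happens on grass or gravel outside the 50-100 cm window, so this sketch assumes the robot keeps its current gait in that case.

```python
from typing import Optional


def select_gait(surface: str, distance_cm: Optional[float], current_gait: str = "walking") -> str:
    """Map a recognized road surface (and distance to it) to a gait."""
    if surface == "flat":
        return "walking"
    if surface == "stairs":
        return "stair"
    if surface in ("grass", "gravel"):
        # Switch only inside the 50-100 cm window stated in the rules.
        if distance_cm is not None and 50.0 <= distance_cm <= 100.0:
            return "obstacle_crossing"
        return current_gait  # assumption: outside the window, keep the current gait
    raise ValueError(f"unknown road surface type: {surface!r}")
```

For example, `select_gait("gravel", 75.0)` returns `"obstacle_crossing"`, while `select_gait("grass", 150.0)` keeps the default `"walking"` gait because the surface is still more than 100 cm away.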
[0065] By combining YOLO and DANN, this invention addresses the low accuracy of traditional YOLO models in dynamic scenes. Furthermore, it enhances the model's robustness through adversarial testing with dynamic samples. Ultimately, the adversarially trained YOLO model achieves high-accuracy road surface recognition in complex dynamic scenarios and precisely switches the robot's motion state based on the recognition results, ensuring the safe and stable movement of the quadruped robot and demonstrating significant engineering application value.
Example
[0066] This application presents a dynamic sample adversarial testing method for a quadruped robot road surface recognition model. Combining dynamic sample generation and adversarial testing techniques, it improves the YOLO model's ability to recognize different types of road surfaces in dynamic environments encountered by quadruped robots during locomotion. As shown in Figure 7, quadruped robot road surface recognition includes two main stages: a road surface recognition transmission stage and a road surface recognition reception stage.
[0067] (1) Road surface recognition and transmission stage
Input data preprocessing: In this stage, real-time image data from the quadruped robot's camera is first input into the preprocessing module. The data undergoes operations such as image rotation, cropping, and adjustment to ensure that the images meet the requirements for subsequent feature extraction. The image data is then standardized, including resizing and noise reduction, to ensure the consistency and quality of the input data.
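As an illustration of the standardization step, the sketch below applies the per-channel RGB normalization used in preprocessing: subtract the channel mean, then divide by the channel standard deviation. It assumes the statistics are computed from the image itself and that a constant channel (zero standard deviation) is left centered at zero. The function name and the nested-list image layout are hypothetical.

```python
from statistics import fmean, pstdev
from typing import List

Image = List[List[List[float]]]  # rows x columns x 3 (R, G, B)


def normalize_rgb(image: Image) -> Image:
    """Return (x - mean_c) / std_c for every pixel and channel c."""
    pixels = [px for row in image for px in row]
    means = [fmean([px[c] for px in pixels]) for c in range(3)]
    # A constant channel has zero deviation; divide by 1.0 instead (assumption).
    stds = [pstdev([px[c] for px in pixels]) or 1.0 for c in range(3)]
    return [
        [[(px[c] - means[c]) / stds[c] for c in range(3)] for px in row]
        for row in image
    ]
```

In practice the means and standard deviations are often fixed values computed once over the training set, in which case they would be passed in rather than recomputed for each image.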
[0068] YOLO Model Training: The preprocessed image is input into the YOLO model for initial training, yielding preliminary recognition results. The YOLO model is responsible for extracting features from the image and performing preliminary road surface type classification.
[0069] DANN Adversarial Training: After the initial training of the YOLO model, the images are fed into the DANN adversarial network. The DANN network generates dynamic samples (such as different lighting changes, occlusions, and other disturbances) to train the YOLO model adversarially. This stage aims to improve the robustness of the YOLO model in dynamic environments, enabling it to better cope with disturbances from different source domains (such as road surfaces in different scenes) and target domains (such as road surfaces under different lighting conditions).
[0070] (2) Road surface recognition and reception stage
YOLO model adversarial testing: In the reception stage, the YOLO model receives image input from a real-time camera on the quadruped robot. After the first stage of adversarial training, the YOLO model is able to effectively identify road surface types in the received dynamic image data. Adversarial test samples are further optimized through adversarial training, making the model more adaptable to different environmental disturbances.
[0071] Output: The YOLO model outputs the recognition results and controls the quadruped robot's movement state according to different road surface types (flat road, stairs, grass, gravel). If a flat road is detected, the robot maintains a normal walking gait; if a grass or gravel surface is detected, it switches to an obstacle-crossing gait when it is 50-100cm away from the surface; if stairs are detected, the robot switches to a gait mode adapted to stairs.
[0072] Gait switching: Based on the road surface recognition results, the system can flexibly adjust the gait of the quadruped robot to achieve smooth transitions and switching, ensuring stability and safety in complex dynamic environments.
[0073] The above description is merely a preferred embodiment of this application and is not intended to limit this application. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of this application should be included within the protection scope of this application.
Claims
1. A dynamic sample adversarial testing method for a quadruped robot road surface recognition model, characterized in that, The dynamic sample adversarial testing method for the quadruped robot road surface recognition model includes the following steps: S1: Road surface image data acquisition and preprocessing, including: image acquisition and annotation, data preprocessing, and image format conversion; S2: Dynamic road surface adversarial sample generation based on DANN, including: constructing a dynamic generation network DANN, generating an adversarial interference database, and training the DANN; S3: Dynamic sample adversarial testing based on YOLO and DANN, including: YOLO model initialization training, DANN dynamic sample generation, YOLO model dynamic adversarial training, and road surface recognition and motion state switching.
2. The dynamic sample adversarial testing method for a quadruped robot road surface recognition model as described in claim 1, characterized in that, Step S1 includes: S11: Image acquisition and annotation. Images are acquired through a camera from the perspective of the quadruped robot, and the acquired images are annotated according to the perspective of the quadruped robot. S12: Data preprocessing. First, an image cropping algorithm is used to crop the acquired original image from the robot dog's perspective, ensuring that the image size is 640×640 pixels. Then, bilinear interpolation is used to scale the cropped image to 224×224 pixels to meet the input requirements of the YOLO model. After the size adjustment is completed, the RGB channels of the image are normalized. S13: Image format conversion. The normalized image is saved in PIL format, and then converted to PyTorch tensor format using PyTorch conversion functions to ensure that the data can be input into the YOLO model for training.
3. The dynamic sample adversarial testing method for a quadruped robot road surface recognition model as described in claim 2, characterized in that, In step S11, the collected images cover four types of road surfaces: flat roads, stairs, grass, and gravel, and the collected images are labeled as the four types of road surfaces.
4. The dynamic sample adversarial testing method for a quadruped robot road surface recognition model as described in claim 2, characterized in that, In step S12, the normalization of the three RGB channels of the image is performed using the following expression: $$x'_c = \frac{x_c - \mu_c}{\sigma_c}, \quad c \in \{R, G, B\}$$ where $x_c$ is the pixel value of the input image, $\mu_c$ is the mean of each channel of the image, $\sigma_c$ is the standard deviation of each channel of the image, $c$ represents the three RGB channels, and $x'_c$ is the normalized value.
5. The dynamic sample adversarial testing method for a quadruped robot road surface recognition model as described in claim 2, characterized in that, Step S2 includes: S21: Construct a Dynamic Generation Network (DANN), which consists of a generator and a discriminator. The generator adopts an improved U-Net architecture, with the input being a static road surface image tensor and dynamic motion parameters, and the output being a sequence of continuous dynamic road surface images, i.e., a dynamic video stream. The discriminator adopts a CNN architecture, with the input being a sequence of dynamic road surface images and the output being a discrimination result, in order to distinguish between real dynamic samples and generated dynamic samples. S22: Based on the kinematic simulation of quadruped robots, dynamic motion parameters such as gait frequency and sway amplitude under different motion states are obtained, and factors such as sudden changes in illumination, local occlusion, and random noise interference are superimposed to form an adversarial interference parameter library. S23: Using the DANN generator, adversarial interference parameters are injected into the dynamic video stream in the form of feature modulation to simulate real-world interference and obtain an adversarial interference database. S24: The DANN is trained with the LSGAN adversarial loss function so that the generated dynamic adversarial test samples are closer to the real dynamic environment, thus completing the DANN training process.
6. The dynamic sample adversarial testing method for a quadruped robot road surface recognition model as described in claim 5, characterized in that, During the DANN training process in step S24, the loss function is calculated as follows: where $D(x)$ is the discriminator's judgment of the real sample $x$, $G(z)$ is the sample generated by the generator, $z$ is the input random noise, and $D(G(z))$ is the discriminator's judgment of the generated sample.
7. The dynamic sample adversarial testing method for a quadruped robot road surface recognition model as described in claim 6, characterized in that, Step S3 includes: S31: Initialization of YOLO road surface recognition model. YOLOv5 is used as the base model for road surface recognition. The initial training of YOLOv5 model uses preprocessed static images as the training set. The training objective is to recognize four types of road surfaces: flat road, stairs, grass, and gravel. S32: Dynamic adversarial test sample construction. Using the dynamic video stream generated by DANN and the interference parameter library, a dynamic adversarial test sample set including jitter, illumination change and occlusion factors is generated. S33: Dynamic adversarial training of YOLO model. The dynamic adversarial test samples generated in S32 are input into the YOLOv5 model for dynamic adversarial training. During the training process, the network parameters of the YOLO model are adjusted to optimize the model's ability to adapt to adversarial interference. S34: Road surface recognition and motion state switching. The YOLO model, which has been trained in dynamic adversarial mode, is deployed in the quadruped robot to identify the road surface type in the dynamic video stream in real time and control the switching of the robot's motion state based on the recognition results.
8. The dynamic sample adversarial testing method for a quadruped robot road surface recognition model as described in claim 7, characterized in that, The structure of YOLOv5 includes an input terminal, a backbone network, a neck, and a head. The input is sized using the Focus module; the backbone network extracts features and enhances them using the CSP and SPP modules; the neck integrates information from various scales; and the head outputs target predictions at different scales, including category and location.
9. The dynamic sample adversarial testing method for a quadruped robot road surface recognition model as described in claim 7, characterized in that, The YOLOv5 input image size is 224×224, the batch size is 16, the number of training epochs is set to 100, the optimization algorithm is Adam, and the learning rate is 0.001.
10. The dynamic sample adversarial testing method for a quadruped robot road surface recognition model as described in claim 7, characterized in that, In step S34, the motion state switching rules include: In flat terrain: walking gait; in grass / gravel terrain: switch to obstacle-crossing gait when 50-100cm away from the target; in staircase terrain: staircase gait.