A perception reliability test method and device, a storage medium and an electronic device

By generating a test case list after the perception module is released and performing automated backfeeding tests on different architecture platforms and comparing them with real labeled data, the problems of long development cycles and inconsistent results of autonomous driving perception modules are solved, and efficient and comprehensive reliability testing is achieved.

CN122309376A · Pending · Publication Date: 2026-06-30 · BEIJING JINGWEI HIRAIN TECH CO INC

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Application (China)
Current Assignee / Owner
BEIJING JINGWEI HIRAIN TECH CO INC
Filing Date
2026-04-02
Publication Date
2026-06-30

Smart Images

  • Figure CN122309376A_ABST

Abstract

The application provides a perception reliability testing method, device, storage medium and electronic equipment, applied to the technical field of autonomous driving perception testing. After the perception module is released, the application automatically selects perception test cases covering multiple scenarios from a data server according to development testing requirements and historical testing requirements, where the perception module includes a mutually adapted perception model and perception software. The perception test cases are then input into the perception module running on different architecture platforms; the output results of each platform are obtained through automated backfeed testing and compared with real labeled data to form performance test results; and the performance test results of the perception model and the perception software are integrated to obtain a perception reliability test conclusion. In this way, timely and comprehensive reliability verification can be carried out on multiple platforms after the perception module is released, which greatly improves the efficiency and coverage of perception reliability testing.

Description

Technical Field

[0001] This invention relates to the field of autonomous driving perception testing technology, and in particular to a perception reliability testing method, apparatus, storage medium, and electronic device.

Background Technology

[0002] Currently, the development and verification cycle of existing autonomous driving perception modules is relatively long, typically including stages such as real-vehicle multi-scenario data collection, data annotation, model training, model and software release, vehicle-side deployment and testing, problem feedback and correction.

[0003] However, due to architectural differences between the model training environment and the vehicle deployment environment, inference results of the model or software may be inconsistent across different platforms, and problems can only be discovered on the vehicle side, which not only affects development efficiency but also wastes testing resources. Furthermore, the collected multi-scenario data has not been fully utilized for comprehensive verification of the new version, resulting in a delay in the discovery of some issues.

[0004] Therefore, how to promptly use multi-scenario test data to conduct reliability tests on different platforms after the perception module is released, and thereby improve the efficiency and comprehensiveness of perception reliability testing, has become a technical problem that urgently needs to be solved by those skilled in the art.

Summary of the Invention

[0005] In view of the above problems, the present invention provides a perception reliability testing method, apparatus, storage medium, and electronic device that overcome, or at least partially solve, the above problems. The technical solution is as follows:

[0006] A perception reliability testing method, comprising:

[0007] After the perception module to be tested is released, based on the development and testing requirements and historical testing requirements of the perception module, a test case list is generated, and multiple perception test cases related to the test case list are obtained from the data server. The perception module includes mutually compatible perception models and perception software.

[0008] Each of the aforementioned perception test cases is input into the perception module running on different architecture platforms. Through automated backfeeding tests, the output records of the perception module on each architecture platform are obtained. The results are then compared with the truth records generated by the real labeled data corresponding to the perception test cases to generate perception performance test results.

[0009] Using the perception performance test results of the perception model and the perception software, perception reliability test results are generated.

[0010] Optionally, the step of obtaining, after the perception module to be tested is released, multiple perception test cases related to the test case list from the data server, the test case list being generated according to the development and testing requirements and historical testing requirements of the perception module, includes:

[0011] After the perception model to be tested is released, based on the development and testing requirements and historical testing requirements of the perception model, a list of model test cases is generated, and multiple perception test cases related to the list of model test cases are obtained from the data server.

[0012] After the perception software to be tested is released, based on the software test case list generated according to the development and testing requirements and historical testing requirements of the perception software, multiple perception test cases related to the software test case list are obtained from the data server.

[0013] Optionally, the step of obtaining, after the perception model to be tested is released, multiple perception test cases related to the model test case list from the data server, the model test case list being generated according to the development and testing requirements and historical testing requirements of the perception model, includes:

[0014] After the perception model to be tested is released, the self-test data of the scenario development of the perception model during the model development process is associated with the corresponding problems or functional requirements to obtain a first model test case list; a second model test case list that is associated with the first model test case list is retrieved from the data server; a third model test case list of historical model versions is obtained; and multiple perception test cases related to the first model test case list, the second model test case list, and the third model test case list are obtained from the data server.
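As an illustration only, the list assembly in [0014] can be sketched in Python. The sketch assumes that each case list is a sequence of test case IDs and that the data server can be stood in for by an in-memory dict; the function names and IDs are hypothetical and are not taken from the application. The same helper would serve for the software lists in [0016].

```python
from typing import Dict, Iterable, List


def build_model_case_list(first: Iterable[str], second: Iterable[str],
                          third: Iterable[str]) -> List[str]:
    """Merge the first (self-test), second (associated) and third
    (historical-version) case lists, keeping first-seen order and
    dropping duplicate case IDs."""
    merged: List[str] = []
    seen = set()
    for case_list in (first, second, third):
        for case_id in case_list:
            if case_id not in seen:
                seen.add(case_id)
                merged.append(case_id)
    return merged


def fetch_cases(case_ids: List[str], data_server: Dict[str, dict]) -> List[dict]:
    """Look up each case ID on a stand-in data server (a plain dict);
    IDs the server does not hold are skipped."""
    return [data_server[cid] for cid in case_ids if cid in data_server]


# Hypothetical data; the empty third list models a first release with no history.
server = {"rain_night_01": {"scene": "rain"}, "cut_in_07": {"scene": "cut-in"}}
ids = build_model_case_list(["rain_night_01"], ["cut_in_07", "rain_night_01"], [])
cases = fetch_cases(ids + ["missing_99"], server)
```

Keeping first-seen order means cases tied to the current version's own problems are listed ahead of those inherited from history.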

[0015] And / or, after the release of the perception software to be tested, the step of obtaining multiple perception test cases related to the software test case list from the data server based on the development and testing requirements and historical testing requirements of the perception software includes:

[0016] After the perception software to be tested is released, the self-test data of the scenario development during the software development process of the perception software is associated with the corresponding problems or functional requirements to obtain a first software test case list; a second software test case list that is associated with the first software test case list is retrieved from the data server; a third software test case list of historical software versions is obtained; and multiple perception test cases related to the first software test case list, the second software test case list, and the third software test case list are obtained from the data server.

[0017] Optionally, the step of inputting each of the perception test cases into the perception modules running on different architecture platforms, obtaining the output records of the perception modules on each architecture platform through automated backfeed testing, and comparing them with the ground truth records generated from the real labeled data corresponding to the perception test cases to generate perception performance test results includes:

[0018] Multiple first perception test cases related to the model test case list are input into the perception model running on the training architecture platform and into the perception model running on the vehicle architecture platform. Through automated backfeed testing, a first output record of the perception model on the training architecture platform and a second output record of the perception model on the vehicle architecture platform are obtained. A model algorithm comparison is then performed using the first output record, the second output record, and the first ground truth record generated from the first real labeled data corresponding to the first perception test cases, to generate the model perception performance test results.

[0019] Multiple second perception test cases related to the software test case list are input into the perception software running on the training architecture platform and into the perception software running on the vehicle architecture platform. Through automated backfeed testing, a third output record of the perception software on the training architecture platform and a fourth output record of the perception software on the vehicle architecture platform are obtained. A software algorithm comparison is then performed using the third output record, the fourth output record, and the second ground truth record generated from the second real labeled data corresponding to the second perception test cases, to generate the software perception performance test results.

[0020] Optionally, the model algorithm comparison process includes:

[0021] For the first output record against the first ground truth record, and for the second output record against the first ground truth record, the number of output targets and the number of ground truth targets of each first perception test case are counted per perceived target category, and it is determined whether they are consistent. A first test case run report for the training architecture platform and a second test case run report for the vehicle architecture platform are generated, to evaluate the correctness of the perception results of the perception model on the training architecture platform and on the vehicle architecture platform, respectively.
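A minimal Python sketch of the per-category count check in [0021], under assumed data shapes: each output or ground truth record is modelled as a list of detections carrying a `category` key, and counts are aggregated over the whole test case (the application does not say whether counting is per frame or per case). Calling `run_report` once with the training-platform record and once with the vehicle-platform record would yield the first and second run reports; all names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def count_by_category(record: List[dict]) -> Counter:
    """Count targets per perceived category in one record."""
    return Counter(det["category"] for det in record)


def run_report(outputs: Dict[str, List[dict]],
               truths: Dict[str, List[dict]]) -> Dict[str, dict]:
    """Per test case, compare per-category target counts in a platform's
    output record with those in the ground truth record."""
    report = {}
    for case_id, truth in truths.items():
        out_counts = count_by_category(outputs.get(case_id, []))
        gt_counts = count_by_category(truth)
        report[case_id] = {
            "output_counts": dict(out_counts),
            "truth_counts": dict(gt_counts),
            "consistent": out_counts == gt_counts,
        }
    return report


# Hypothetical records for two cases on one platform.
truth = {"case_a": [{"category": "car"}, {"category": "pedestrian"}],
         "case_b": [{"category": "car"}]}
x86_out = {"case_a": [{"category": "pedestrian"}, {"category": "car"}],
           "case_b": [{"category": "car"}, {"category": "car"}]}
first_report = run_report(x86_out, truth)
```

Here `case_b` is flagged as inconsistent because the platform reports two cars where the ground truth has one; such cases would feed the architecture comparison in [0022].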

[0022] For one or more inconsistent first perception test cases, the output results of the first perception test cases at each node in the perception process are compared frame by frame in the first output record and the second output record to generate a first architecture comparison report, so as to evaluate the consistency of the perception results of the perception model under different architectures.
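The frame-by-frame comparison in [0022] could look roughly like the following sketch. It assumes that each platform's record stores, for every pipeline node, one flattened numeric output per frame, and that agreement is judged with an absolute tolerance `atol`; the node names, record layout and tolerance value are hypothetical. Reporting the earliest diverging node is one way to localize whether a discrepancy originates in preprocessing, inference or post-processing.

```python
from typing import Dict, List, Optional

# Hypothetical node order; the application only says "each node in the perception process".
NODES = ["preprocess", "inference", "postprocess"]
Record = Dict[str, List[List[float]]]  # node -> per-frame flattened outputs


def compare_architectures(rec_a: Record, rec_b: Record,
                          atol: float = 1e-3) -> Dict[str, Optional[dict]]:
    """Frame-by-frame, per-node comparison of one test case on two platforms.
    A node maps to None if every frame agrees within atol; otherwise to the
    first diverging frame and its maximum absolute difference."""
    report: Dict[str, Optional[dict]] = {}
    for node in NODES:
        frames_a, frames_b = rec_a[node], rec_b[node]
        report[node] = None
        for idx, (fa, fb) in enumerate(zip(frames_a, frames_b)):
            if len(fa) != len(fb):  # differing output sizes count as divergence
                report[node] = {"frame": idx, "max_abs_diff": float("inf")}
                break
            diff = max((abs(x - y) for x, y in zip(fa, fb)), default=0.0)
            if diff > atol:
                report[node] = {"frame": idx, "max_abs_diff": diff}
                break
        if report[node] is None and len(frames_a) != len(frames_b):
            # One platform produced fewer frames than the other.
            report[node] = {"frame": min(len(frames_a), len(frames_b)),
                            "max_abs_diff": float("inf")}
    return report


def first_divergent_node(report: Dict[str, Optional[dict]]) -> Optional[str]:
    """Earliest node, in pipeline order, whose outputs diverge."""
    return next((n for n in NODES if report[n] is not None), None)


x86 = {"preprocess": [[0.1, 0.2], [0.3, 0.4]], "inference": [[1.0], [2.0]],
       "postprocess": [[5.0], [6.0]]}
arm = {"preprocess": [[0.1, 0.2], [0.3, 0.4]], "inference": [[1.0], [2.01]],
       "postprocess": [[5.0], [6.5]]}
arch_report = compare_architectures(x86, arm)
```

In this toy case preprocessing agrees, so the divergence first appears at the inference node and then propagates into post-processing.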

[0023] Optionally, the software algorithm comparison process includes:

[0024] For the third output record against the second ground truth record, and for the fourth output record against the second ground truth record, the number of output targets and the number of ground truth targets of each second perception test case are counted per perception target category, and it is determined whether they are consistent. A third test case run report for the training architecture platform and a fourth test case run report for the vehicle architecture platform are generated, to evaluate the correctness of the perception results of the perception software on the training architecture platform and on the vehicle architecture platform, respectively.

[0025] For one or more inconsistent second perception test cases, the output results of the second perception test cases at each node in the perception process are compared frame by frame in the third output record and the fourth output record to generate a second architecture comparison report, so as to evaluate the consistency of the software perception results under different architectures.

[0026] Optionally, the method further includes:

[0027] The test logs generated by different architecture platforms running the perception module during the automated backfeed test are collected, the test logs are parsed, multiple performance indicators related to the perception module are statistically analyzed, and a stability operation report is generated.
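As a sketch of the log analysis in [0027], the following Python parses per-frame metrics from test logs and summarizes them. The log line format, the metric names and the 100 ms latency budget are assumptions made for illustration; the application does not specify which performance indicators are collected.

```python
import re
from statistics import mean
from typing import Dict, Iterable

# Hypothetical per-frame log line, e.g. "frame=12 latency_ms=35.2 cpu=41.0".
LINE_RE = re.compile(r"frame=(\d+)\s+latency_ms=([\d.]+)\s+cpu=([\d.]+)")


def stability_report(lines: Iterable[str],
                     latency_budget_ms: float = 100.0) -> Dict[str, float]:
    """Parse per-frame metrics from a platform's test log and summarize them."""
    frames, latencies, cpus = [], [], []
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # not a per-frame metrics line
        frames.append(int(m.group(1)))
        latencies.append(float(m.group(2)))
        cpus.append(float(m.group(3)))
    if not frames:
        return {"frames": 0}
    expected = max(frames) - min(frames) + 1  # assumes consecutive frame IDs
    return {
        "frames": len(frames),
        "dropped_frames": expected - len(set(frames)),
        "mean_latency_ms": mean(latencies),
        "max_latency_ms": max(latencies),
        "over_budget": sum(lat > latency_budget_ms for lat in latencies),
        "max_cpu_pct": max(cpus),
    }


log = ["frame=1 latency_ms=30.0 cpu=40.0", "INFO pipeline started",
       "frame=2 latency_ms=120.0 cpu=55.5", "frame=4 latency_ms=50.0 cpu=42.0"]
arm_stability = stability_report(log)
```

Running the same function over each platform's log gives directly comparable stability figures; here frame 3 is counted as dropped and one frame exceeds the latency budget.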

[0028] A perception reliability testing device includes: a test case data acquisition unit, a perception performance test comparison unit, and a perception reliability test result generation unit.

[0029] The test case data acquisition unit is used to acquire multiple perception test cases related to the test case list generated by the development and testing requirements and historical testing requirements of the perception module after the perception module is released. The perception module includes mutually compatible perception models and perception software.

[0030] The perception performance test comparison unit is used to input each of the perception test cases into the perception module running on different architecture platforms, obtain the output records of the perception module on each architecture platform through automated backfeed testing, and perform algorithm comparison against the truth records generated from the real labeled data corresponding to the perception test cases to generate perception performance test results.

[0031] The perception reliability test result generation unit is used to generate perception reliability test results using the perception performance test results of the perception model and the perception software.

[0032] A computer-readable storage medium having a program stored thereon, which, when executed by a processor, implements the perception reliability testing method.

[0033] An electronic device includes at least one processor, at least one memory connected to the processor, and a bus; the processor and the memory communicate with each other via the bus, and the processor is used to call program instructions in the memory to execute the perception reliability testing method.

[0034] By employing the above technical solution, this invention provides a perception reliability testing method, apparatus, storage medium, and electronic device. After the perception module to be tested is released, multiple perception test cases related to a test case list are obtained from a data server, the test case list being generated according to the development and testing requirements and historical testing requirements of the perception module. The perception module includes mutually compatible perception models and perception software. Each perception test case is input into the perception module running on different architecture platforms. Through automated backfeed testing, the output records of the perception module on each architecture platform are obtained, and these are compared with the truth records generated from the real labeled data corresponding to the perception test cases to generate perception performance test results. Using the perception performance test results of the perception model and perception software, perception reliability test results are generated. This invention generates a test case list by combining development and historical testing requirements, and relies on automated backfeed testing to input perception test cases on different architecture platforms, obtain outputs, and compare them with real labels. This enables timely and comprehensive performance and reliability evaluation of each architecture platform after the perception module is released, effectively improving the efficiency and comprehensiveness of perception reliability testing.

[0035] The above description is merely an overview of the technical solution of the present invention. In order to better understand the technical means of the present invention and to implement it in accordance with the contents of the specification, and in order to make the above and other objects, features and advantages of the present invention more apparent and understandable, specific embodiments of the present invention are described below.

Attached Figure Description

[0036] Various other advantages and benefits will become apparent to those skilled in the art upon reading the following detailed description of preferred embodiments. The accompanying drawings are for illustrative purposes only and are not intended to limit the invention. Furthermore, the same reference numerals denote the same parts throughout the drawings. In the drawings:

[0037] Figure 1 is a flowchart of one embodiment of the perception reliability testing method provided by the present invention;

[0038] Figure 2 is a logic block diagram of the autonomous driving perception reliability testing process provided in an embodiment of the present invention;

[0039] Figure 3 is a logic block diagram of the release verification process of the perception module provided in an embodiment of the present invention;

[0040] Figure 4 is a logic block diagram of the perception performance test result generation process provided in an embodiment of the present invention;

[0041] Figure 5 is a schematic diagram of the perception reliability testing device provided in an embodiment of the present invention;

[0042] Figure 6 is a schematic structural diagram of an electronic device provided in an embodiment of the present invention.

Detailed Implementation

[0043] Exemplary embodiments of the invention will now be described in more detail with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention may be implemented in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this invention will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.

[0044] With the rapid development of autonomous driving technology, the perception module, as a crucial component of the autonomous driving system, directly affects the safety and stability of the entire vehicle. Currently, the development and verification process for autonomous driving perception modules typically includes the following stages: first, real-vehicle data is collected and labeled; then, the perception model is trained on the collected data; after model training, the model and its compatible software are released; next, the model and software are deployed to the vehicle for testing; during vehicle testing, if problems are found, the sensor data captured when the problem occurred is transmitted back to the R&D team for analysis; finally, the problem is resolved by supplementing training data or modifying the algorithm. This process typically repeats on an iteration cycle of one or two weeks.

[0045] Model training is typically performed in a server environment using a general-purpose computing architecture, such as the x86 architecture. The x86 architecture is a widely used instruction set architecture in general-purpose computers and has a mature development ecosystem. In contrast, perception models ultimately need to be deployed on embedded or dedicated automotive hardware platforms, such as ARM (Advanced RISC Machines) platforms; the ARM architecture is well suited to embedded and mobile devices. Because the hardware architectures of the training and deployment environments differ, the inference results of the same model often differ to some extent between the two architectures. Furthermore, the sensor data input to the model requires preprocessing, and the inference results require post-processing; developing the preprocessing and post-processing code separately on different architectures can introduce further discrepancies. Ensuring that a model trained in the training environment achieves equivalent inference results in the deployment environment is therefore a key challenge in perception module development.
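One common, well-documented source of such small cross-platform differences is that floating-point addition is not associative, so a reduction scheduled in a different order can round differently. The short Python illustration below (Python floats are IEEE 754 double precision) shows why architecture comparisons generally need a tolerance rather than exact equality; it is an illustrative example, not a claim about the specific chips discussed here.

```python
import math

# Floating-point addition is not associative: summing the same values in a
# different order, as a different compiler, kernel library or accelerator may
# schedule a reduction, can change the last bits of the result.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c    # 0.6000000000000001
right = a + (b + c)   # 0.6
bitwise_equal = left == right
close_enough = math.isclose(left, right, rel_tol=1e-9)
print(bitwise_equal, close_enough)  # False True
```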

[0046] Currently, chip manufacturers typically provide consistency test reports to demonstrate the consistency between the results of their automotive architecture chips and the training environment. However, inconsistencies are frequently encountered during actual vehicle-side verification. Furthermore, new algorithms are usually first implemented and verified in the training environment before being ported to automotive architecture chips for deployment. This leads to delays in the automotive architecture chips' support and evaluation of certain operators, increasing the risk of inter-architecture discrepancies. Solving these problems largely relies on algorithm adjustments, but this requires the ability to detect such discrepancies as early and promptly as possible.

[0047] Furthermore, the perception module's process, from data collection to vehicle-side verification and problem discovery and resolution, is lengthy and time-consuming. If problems can only be discovered during the vehicle-side testing phase, it not only wastes a significant amount of perception development time but also occupies scarce test vehicle resources, affecting the normal execution of other testing tasks. On the other hand, data corresponding to certain problem scenarios is often already obtained in the early collection phase. If this collected multi-scenario data can be fully utilized for verification before version release, problems can be discovered and corrected earlier, reducing the pressure on vehicle-side verification and improving overall development efficiency.

[0048] Based on this, the present invention provides a perception reliability testing method. After the perception module is released, perception test cases covering multiple scenarios can be automatically selected from the data server by combining development testing and historical testing requirements. The perception module includes mutually compatible perception models and perception software. The perception test cases are then input into the perception module running on different architecture platforms. Automated backfeed testing is used to obtain the output results of each platform, which are compared with real labeled data to form performance test results. The performance test results of the perception model and software are then integrated to draw perception reliability test conclusions. This allows timely and comprehensive reliability verification across multiple platforms after the perception module is released, significantly improving the efficiency and coverage of perception reliability testing. The present invention can therefore make full use of already collected data, cover diverse scenarios, and effectively compare and verify the integrated test results of perception models and software algorithms across different platform architectures.

[0049] Figure 1 shows a flowchart of one embodiment of the perception reliability testing method provided by this invention. The method may include:

[0050] S100. After the perception module to be tested is released, based on the test case list generated by the development and testing requirements and historical testing requirements of the perception module, obtain multiple perception test cases related to the test case list from the data server. The perception module includes mutually compatible perception models and perception software.

[0051] The perception module refers to the overall hardware and software system that realizes the environmental perception function. It includes mutually compatible perception models and perception software, which are used to identify and understand the environmental information around the vehicle in applications such as autonomous driving.

[0052] Here, "release" refers to the official release of a new version of the perception module.

[0053] Development and testing requirements refer to the testing requirements expected during development of the perception module, derived from the problems to be solved or the functions to be implemented in its new version.

[0054] Historical test requirements refer to test requirements accumulated from testing experience with previous versions of the perception module. These requirements cover identified issues, key scenarios, and blind spots, ensuring that no new defects are introduced during version upgrades. It can be understood that, for the initial version, the corresponding historical test requirements may be empty.

[0055] The test case list refers to a list of perception test cases to be executed, generated based on development testing requirements and historical testing requirements.

[0056] The data server refers to a server used to store and manage perception test cases collected under multiple scenarios and conditions. The data server can receive raw sensor data (images, videos, or point clouds) containing multiple scenarios and corresponding tag information (automatic or manual annotation) collected from actual vehicles or transmitted back from tests. The tag information can include scene descriptions and the time of occurrence.
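A hedged sketch of how test cases might be selected from such a data server by scene tag and time of occurrence. The `StoredCase` record, its fields and the file paths are hypothetical stand-ins; a real data server would expose its own query interface.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Set


@dataclass
class StoredCase:
    """Hypothetical data-server entry: raw sensor data location plus labels."""
    case_id: str
    sensor_uri: str                                  # image/video/point-cloud location
    scenes: Set[str] = field(default_factory=set)    # scene-description tags
    recorded_at: Optional[datetime] = None           # time of occurrence


def query_cases(store: List[StoredCase], required_scenes: Set[str],
                since: Optional[datetime] = None) -> List[StoredCase]:
    """Return cases tagged with at least one required scene, optionally
    restricted to recordings made at or after `since`."""
    hits = []
    for case in store:
        if not case.scenes & required_scenes:
            continue
        if since is not None and (case.recorded_at is None or case.recorded_at < since):
            continue
        hits.append(case)
    return hits


store = [
    StoredCase("c1", "/data/c1.bag", {"rain", "night"}, datetime(2026, 1, 5)),
    StoredCase("c2", "/data/c2.bag", {"highway"}, datetime(2025, 12, 1)),
    StoredCase("c3", "/data/c3.bag", {"rain"}, datetime(2025, 11, 1)),
]
rain_cases = query_cases(store, {"rain"})
recent_rain = query_cases(store, {"rain"}, since=datetime(2026, 1, 1))
```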

[0057] A perception test case refers to a set of specific test data instances used to evaluate the performance of the perception module, including raw sensor data and the corresponding real labeling information.

[0058] The perception model refers to the machine learning or deep learning model used for environmental perception, which is responsible for feature extraction and environmental understanding of the input sensor data.

[0059] Perception software refers to the application software that calls the perception model, processes the model outputs, and realizes the environmental perception function. It works in conjunction with the perception model to complete the overall functions of the perception module.

[0060] Specifically, in this embodiment of the invention, after the perception module completes a new version release, it can automatically analyze the functional points and key test scenarios to be covered based on the development and testing requirements of this version and historical testing requirements, and generate a list of test cases required for this test in conjunction with the testing strategy. Perception test data matching each perception test case in the test case list is retrieved from the data server via an interface. This data covers a variety of environments and scenarios, ensuring the comprehensiveness of the test.

[0061] It's important to note that in real-world projects, the development and release cycles of perception models and perception software often differ, potentially involving separate feature updates and version iterations. Therefore, after the perception module is released, the term "release" can refer to the release of the perception model, the perception software, or a combined release of both. When either the perception model or the perception software is released independently, a corresponding list of test cases needs to be dynamically generated based on the development and testing requirements of its latest version and historical testing requirements. Perception test cases matching these testing requirements should be retrieved from the data server to ensure that regardless of whether the model or software is updated separately, it undergoes thorough testing and verification based on mutual compatibility before being deployed in practical applications.
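The dynamic list generation described in [0061] can be sketched as follows, assuming requirements are represented as lists of requirement IDs keyed by component. `Release`, `plan_case_lists` and the requirement names are hypothetical illustrations, not part of the application.

```python
from enum import Enum
from typing import Dict, List


class Release(Enum):
    MODEL = "model"
    SOFTWARE = "software"
    BOTH = "both"


def components_for(release: Release) -> List[str]:
    """Components whose case lists must be regenerated for this release."""
    if release is Release.MODEL:
        return ["model"]
    if release is Release.SOFTWARE:
        return ["software"]
    return ["model", "software"]


def plan_case_lists(release: Release, dev_reqs: Dict[str, List[str]],
                    hist_reqs: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Combine this version's development requirements with historical ones
    per released component; history may be empty for a first release."""
    plan = {}
    for comp in components_for(release):
        combined = dev_reqs.get(comp, []) + hist_reqs.get(comp, [])
        plan[comp] = list(dict.fromkeys(combined))  # de-duplicate, keep order
    return plan


dev = {"model": ["lane_merge"], "software": ["tracker_fix", "night_ped"]}
hist = {"software": ["night_ped", "tunnel_exit"]}
sw_plan = plan_case_lists(Release.SOFTWARE, dev, hist)
both_plan = plan_case_lists(Release.BOTH, dev, hist)
```

A software-only release regenerates only the software list, while a combined release regenerates both; the model list here has no history, matching the empty-history case in [0054].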

[0062] S110. Input each perception test case into the perception module running on different architecture platforms, obtain the output records of the perception module on each architecture platform through automated backfeed testing, and compare the output records, by algorithm comparison, with the truth records generated from the real labeled data corresponding to the perception test cases to generate perception performance test results.

[0063] An architecture platform refers to the hardware platform used to run the perception module. Different architecture platforms may involve different operating systems, processors, memory, and so on, which affect the runtime performance and compatibility of the perception module.

[0064] Optionally, the architecture platform provided in this embodiment of the invention may include a training architecture platform and an in-vehicle architecture platform. The training architecture platform refers to a computing environment used for training the perception module and verifying the algorithm, such as an x86 platform. The in-vehicle architecture platform refers to an in-vehicle hardware platform for deploying and running the perception module, such as an ARM platform.

[0065] Automated backfeed testing refers to testing the perception module with automated tools that automatically input perception test cases and record the outputs.

[0066] The output record refers to the result data generated by the perception module after performing inference on the input perception test case.

[0067] In this context, "real-world labeled data" refers to accurate data that has been manually labeled or verified, used as a reference and benchmark. During testing, real-world labeled data can be compared with the output of the perception module to determine the accuracy and performance of the perception module.

[0068] The truth record refers to a reference record of standard values generated from the real labeled data.

[0069] Algorithm comparison refers to the process of comparing the output records of the perception module with the true value records of the real labeled data using a specified similarity algorithm.
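The application does not name the similarity algorithm. One common choice for detection outputs is intersection over union (IoU) with greedy one-to-one matching, sketched below under the assumption of 2D axis-aligned boxes and a 0.5 IoU threshold; the field names are hypothetical, and a 3D or tracking-based metric could equally be substituted.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # axis-aligned (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def compare_to_truth(preds: List[dict], truths: List[dict],
                     thr: float = 0.5) -> Dict[str, float]:
    """Greedy one-to-one matching of predictions to same-category ground truth;
    higher-scoring predictions claim their best truth box first."""
    used = set()
    tp = 0
    for p in sorted(preds, key=lambda d: d.get("score", 1.0), reverse=True):
        best_j, best_iou = -1, thr
        for j, t in enumerate(truths):
            if j in used or t["category"] != p["category"]:
                continue
            v = iou(p["box"], t["box"])
            if v >= best_iou:
                best_j, best_iou = j, v
        if best_j >= 0:
            used.add(best_j)
            tp += 1
    fp, fn = len(preds) - tp, len(truths) - tp
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": tp / len(preds) if preds else 0.0,
            "recall": tp / len(truths) if truths else 0.0}


truths = [{"category": "car", "box": (0, 0, 10, 10)},
          {"category": "pedestrian", "box": (20, 20, 22, 26)}]
preds = [{"category": "car", "box": (1, 1, 10, 10), "score": 0.9},
         {"category": "car", "box": (50, 50, 60, 60), "score": 0.8}]
metrics = compare_to_truth(preds, truths)
```

Greedy matching is simpler than optimal (Hungarian) assignment and is usually adequate when objects are well separated; the choice is a design assumption of this sketch.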

[0070] The perception performance test results refer to the statistical and analytical results describing the output performance of the perception module under specific inputs after the perception module has been tested.

[0071] Optionally, the perception performance test results may include at least one of the following test reports: a test case run report for evaluating the correctness of perception results, a stability run report for evaluating performance metrics, and an architecture comparison report for evaluating consistency across different architecture platforms.

[0072] Specifically, in this embodiment of the invention, various perception test cases obtained from a data server can be imported and run in perception modules deployed on different hardware architecture platforms. Through an automated testing process, the perception modules are automatically started to execute perception inference, collecting and recording the output results of each perception test case on different architectures. Corresponding truth records are generated using pre-prepared real-label data, and the perception module outputs are automatically compared with the truth values. Quantitative indicators are used to evaluate the performance of the perception modules, thus generating the perception performance test results.
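A minimal sketch of the automated backfeed loop in [0072], assuming each platform is wrapped in a callable "runner" that takes a test case and returns an output record. The runners below are fakes for illustration; real ones would launch the perception module on an x86 server or an in-vehicle ARM target, which is outside the scope of the sketch.

```python
from typing import Callable, Dict, List

# A runner stands in for executing the perception module on one platform,
# e.g. a local x86 process or a remote in-vehicle ARM target (hypothetical).
Runner = Callable[[dict], List[dict]]


def backfeed(cases: List[dict],
             runners: Dict[str, Runner]) -> Dict[str, Dict[str, List[dict]]]:
    """Feed every case to every platform and collect output records as
    records[platform][case_id]. A failed run is stored as an error entry
    so one bad case does not abort the batch."""
    records: Dict[str, Dict[str, List[dict]]] = {name: {} for name in runners}
    for case in cases:
        for name, run in runners.items():
            try:
                records[name][case["id"]] = run(case)
            except Exception as exc:
                records[name][case["id"]] = [{"error": repr(exc)}]
    return records


def fake_x86(case: dict) -> List[dict]:
    return [{"category": "car"}] * case["n"]


def fake_arm(case: dict) -> List[dict]:
    if case["id"] == "bad":
        raise RuntimeError("inference timeout")
    return [{"category": "car"}] * case["n"]


records = backfeed([{"id": "ok", "n": 2}, {"id": "bad", "n": 1}],
                   {"x86": fake_x86, "arm": fake_arm})
```

The resulting `records[platform][case_id]` layout is what the per-platform run reports and the cross-architecture comparison would consume.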

[0073] S120. Using the perception performance test results of the perception model and perception software, generate perception reliability test results.

[0074] The perception reliability test results integrate the performance test results of the perception model and perception software on different architecture platforms, evaluate the stability, accuracy and applicability of the entire perception module, and are used to guide decisions on deploying the perception model and perception software in real vehicles.

[0075] Specifically, the embodiments of the present invention can summarize and comprehensively analyze the performance test results of perception models and perception software on various architecture platforms, evaluate the stability, accuracy and consistency of the overall perception module in different platform architectures, generate the final perception reliability test report, provide a scientific basis for subsequent deployment of real vehicle versions, and ensure the reliable operation of the perception module in practical applications.
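As an illustration of how model and software results might be combined into a single conclusion, the sketch below applies a recall threshold per platform and requires cross-architecture consistency. The threshold, field names and pass criteria are assumptions; the application leaves the integration rule open.

```python
from typing import Dict, List


def reliability_conclusion(results: Dict[str, dict],
                           min_recall: float = 0.9) -> dict:
    """Combine per-component results into one verdict.

    results maps a component ("model" / "software") to
    {"recall": {platform: float}, "arch_consistent": bool}.
    The recall threshold and field names are illustrative assumptions."""
    issues: List[str] = []
    for component, res in results.items():
        for platform, recall in res["recall"].items():
            if recall < min_recall:
                issues.append(f"{component}/{platform}: recall {recall:.2f} below {min_recall}")
        if not res["arch_consistent"]:
            issues.append(f"{component}: cross-architecture mismatch")
    return {"passed": not issues, "issues": issues}


verdict = reliability_conclusion({
    "model": {"recall": {"x86": 0.95, "arm": 0.93}, "arch_consistent": True},
    "software": {"recall": {"x86": 0.94, "arm": 0.88}, "arch_consistent": False},
})
```

Returning the list of issues alongside the pass/fail flag keeps the conclusion traceable back to the specific component and platform that failed.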

[0076] This invention provides a perception reliability testing method, which includes: after the perception module to be tested is released, obtaining from a data server multiple perception test cases related to a test case list generated based on the development and testing requirements and the historical testing requirements of the perception module, wherein the perception module includes a mutually compatible perception model and perception software; inputting each perception test case into the perception module running on different architecture platforms; obtaining the output records of the perception module on each architecture platform through automated backfeeding testing; performing algorithmic comparison between the generated output records and the real annotation data corresponding to the perception test cases to generate perception performance test results; and generating perception reliability test results using the perception performance test results of the perception model and perception software. By generating the test case list from both development and historical testing requirements, and by relying on automated backfeeding testing to run the perception test cases on different architecture platforms, obtain the outputs, and compare them with real annotations, this invention evaluates the performance and reliability of each architecture platform promptly and comprehensively after the perception module is released, effectively improving the efficiency and coverage of perception reliability testing.

[0077] Optionally, in another optional embodiment provided by the present invention, based on one or more of the embodiments corresponding to Figure 1 above, step S100 may specifically include:

[0078] After the perception model to be tested is released, based on the development and testing requirements of the perception model and the historical testing requirements, a list of model test cases is generated, and multiple perception test cases related to the list of model test cases are retrieved from the data server.

[0079] The model test case list refers to a set of test cases based on the development and testing requirements and historical testing requirements of the perception model, which verifies the model's functionality, performance, and stability in different scenarios.

[0080] Optionally, in embodiments of the present invention, after the perception model to be tested is released, the scenario self-test data collected during development of the perception model is associated with the corresponding problems or functional requirements to obtain a first model test case list; the data server is queried for a second model test case list that is associated with the first model test case list; a third model test case list of historical model versions is obtained; and multiple perception test cases related to the first, second, and third model test case lists are obtained from the data server.

[0081] During the self-testing phase of perception model development, relevant scenario data is used to verify specific problems or new functional requirements. Binding these problems or requirements to the data scenarios used in self-testing forms a first model test case list for the current model version. This list contains the core test scenarios and related perception test cases verified during development, ensuring that testing covers key development priorities. Beyond the perception test cases used in self-testing, this embodiment can also use the tag information and scenario classification metadata stored in the data server to automatically query for the scenario features in the first model test case list. Through similarity matching, tag filtering, or semantic retrieval, additional perception test cases related or similar to the self-testing scenarios are mined from the data server, forming a second model test case list that expands test coverage and enriches the test samples. In addition, this embodiment can extract the test case lists of previous model versions, which contain the perception test cases for the test scenarios and data verified in those versions, forming a third model test case list.
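Of the retrieval options named above, tag filtering by similarity is the simplest to illustrate. The sketch below assumes that each stored test case carries a set of scenario tags and uses Jaccard similarity as the matching score; the tag vocabulary, the 0.5 threshold and the function names are hypothetical, and semantic retrieval would need a different scoring function.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two tag sets (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mine_similar_cases(first_list: Dict[str, Set[str]],
                       server_index: Dict[str, Set[str]],
                       threshold: float = 0.5) -> List[str]:
    """Return IDs of server cases whose tags resemble any case in the first list."""
    mined = []
    for case_id, tags in server_index.items():
        if case_id in first_list:
            continue  # already covered by the self-test data
        if any(jaccard(tags, ref) >= threshold for ref in first_list.values()):
            mined.append(case_id)
    return sorted(mined)


# Usage: c2 shares two of three tags with the self-tested case c1, c3 shares none.
first = {"c1": {"rain", "night", "pedestrian"}}
server = {
    "c1": {"rain", "night", "pedestrian"},
    "c2": {"rain", "night"},
    "c3": {"highway", "day"},
}
second_list = mine_similar_cases(first, server)
```

The IDs returned here would form the second model test case list in this sketch.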

[0082] Specifically, in this embodiment of the invention, the first model test case list, the second model test case list, and the third model test case list can be integrated to uniformly initiate a data request to the data server, and download the corresponding test data and annotation results in batches according to the scenarios and data identifiers defined in the perception test cases.
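The integration and batched download step can be sketched as a de-duplicating merge followed by grouping by scenario. This assumes that a test case is referenced by a (scenario, data identifier) pair; that representation and both function names are hypothetical, and the actual request to the data server is left out.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical test case reference: (scenario, data_id).
CaseRef = Tuple[str, str]


def merge_case_lists(*lists: Iterable[CaseRef]) -> List[CaseRef]:
    """Union of the first, second and third lists, order-preserving, without duplicates."""
    seen = set()
    merged: List[CaseRef] = []
    for case_list in lists:
        for ref in case_list:
            if ref not in seen:
                seen.add(ref)
                merged.append(ref)
    return merged


def batch_by_scenario(refs: List[CaseRef]) -> Dict[str, List[str]]:
    """Group data identifiers by scenario so each scenario can be requested as one batch."""
    batches: Dict[str, List[str]] = defaultdict(list)
    for scenario, data_id in refs:
        batches[scenario].append(data_id)
    return dict(batches)


first = [("rain", "d1"), ("night", "d2")]
second = [("rain", "d3"), ("rain", "d1")]
third = [("night", "d2"), ("tunnel", "d4")]
merged = merge_case_lists(first, second, third)
batches = batch_by_scenario(merged)
```

De-duplicating before the request matters because the mined second list and the historical third list will often overlap with the self-test cases.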

[0083] In this embodiment, after the perception model is released, a first model test case list is obtained by combining scenario-based R&D self-test data with the specific problems or functional requirements addressed during model development. A second, scenario-related model test case list is then mined from the data server, and a third model test case list from historical versions is integrated to obtain a comprehensive set of perception test cases. This improves the richness and representativeness of the test cases, so that test coverage includes both R&D priorities and historical stability, reduces the risk of test omissions, makes perception model testing more rigorous and accurate, and improves the accuracy of the overall reliability assessment of the perception module.

[0084] After the perception software to be tested is released, based on the software test case list generated according to the development and testing requirements and historical testing requirements of the perception software, multiple perception test cases related to the software test case list are retrieved from the data server.

[0085] The software test case list refers to a set of test cases that verify the software functionality, performance, and stability of the perception software in different scenarios, based on the development and testing requirements and historical testing requirements of the perception software.

[0086] Optionally, in embodiments of the present invention, after the perception software to be tested is released, the scenario self-test data collected during development of the perception software is associated with the corresponding problems or functional requirements to obtain a first software test case list; the data server is queried for a second software test case list that is associated with the first software test case list; a third software test case list of historical software versions is obtained; and multiple perception test cases related to the first, second, and third software test case lists are obtained from the data server.

[0087] During the self-testing phase of software development, specific problems or new feature requirements are verified using relevant scenario data. Binding these problems or requirements to the data scenarios used in self-testing forms a first software test case list for the current software version. This list contains the core test scenarios and related perception test cases verified during development, ensuring that testing covers key development priorities. Beyond the perception test cases used in self-testing, this embodiment can also use the tag information and scenario classification metadata stored in the data server to automatically query for the scenario features in the first software test case list. Through similarity matching, tag filtering, or semantic retrieval, additional perception test cases related or similar to the self-testing scenarios are mined from the data server, forming a second software test case list that expands test coverage and enriches the test samples. In addition, this embodiment can extract the test case lists of previous software versions, which contain the perception test cases for the test scenarios and data verified in those versions, forming a third software test case list.

[0088] Specifically, in this embodiment of the invention, the first software test case list, the second software test case list, and the third software test case list can be integrated to uniformly initiate a data request to the data server, and download the corresponding test data and annotation results in batches according to the scenarios and data identifiers defined in the perception test cases.

[0089] In this embodiment, after the perception software is released, a first software test case list is obtained by combining scenario-based R&D self-test data with the specific problems or functional requirements addressed during software development. A second, scenario-related software test case list is then mined from the data server, and a third software test case list from historical versions is integrated to obtain a comprehensive set of perception test cases. This improves the richness and representativeness of the test cases, so that test coverage includes both R&D priorities and historical stability, reduces the risk of test omissions, makes perception software testing more rigorous and accurate, and improves the accuracy of the overall reliability assessment of the perception module.

[0090] Because the perception model and perception software differ in their functional focus, update frequency, and testing objectives, the model test case list and the software test case list need to be maintained separately to ensure that the testing coverage of each part is comprehensive and highly targeted.

[0091] Based on the development and testing requirements and historical testing requirements of the perception model and perception software, this invention generates a targeted list of test cases and retrieves the corresponding perception test cases from the data server. This achieves accurate matching and efficient retrieval of test data, ensuring comprehensive and targeted test coverage.

[0092] Optionally, step S110 may include: inputting multiple first perception test cases related to the model test case list into the perception models running on the training architecture platform and the vehicle architecture platform, respectively; obtaining, through automated backfeeding testing, a first output record of the perception model running on the training architecture platform and a second output record of the perception model running on the vehicle architecture platform; performing model algorithm comparison using the first output record, the second output record, and a first ground truth record generated from the first real labeled data corresponding to the first perception test cases; and generating model perception performance test results.

[0093] Specifically, in this embodiment, the output records of the perception models on the training architecture platform and the vehicle architecture platform obtained during automated backfeeding testing are first compared algorithmically with the ground truth records generated from the corresponding first real labeled data. The consistency of the target counts for each perception target category in each test case is statistically analyzed, and a test case run report is generated for each of the two platforms. For perception test cases where differences are detected, the outputs are further compared frame by frame and node by node to evaluate in depth whether the perception model performs and behaves consistently across architecture platforms, ensuring accurate and stable cross-platform perception.

[0094] The model algorithm comparison process provided in this embodiment includes: for the first output record against the first ground truth record, and for the second output record against the first ground truth record, statistically determining by perception target category whether the number of targets in each first perception test case is consistent with the number of ground truth targets; generating a first test case run report for the training architecture platform and a second test case run report for the vehicle architecture platform to evaluate the correctness of the perception results of the perception model on each platform; and, for one or more inconsistent first perception test cases, comparing frame by frame the outputs of those test cases at each node of the perception process in the first output record and the second output record to generate a first architecture comparison report that evaluates the consistency of the model perception results across architectures.

[0095] Specifically, in this embodiment of the invention, target information for each category can be extracted from the perception results of the first output record and the first true value record, and the number of targets in each category can be counted. Similarly, the same statistics are performed on the second output record and the first true value record. It is then determined whether the number of targets output by the training architecture platform in each first perception test case matches the number of true values, and whether the number of targets output by the vehicle architecture platform matches the number of true values. Based on the statistical results, a first test case run report for the training architecture platform and a second test case run report for the vehicle architecture platform are generated. The reports detail the matching status of the number of targets in each category and the overall correctness assessment.
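The per-category count check can be sketched as below. It assumes that an output record and a ground truth record for one test case can each be reduced to a list of frames, where each frame is a list of category labels; the function names and report layout are hypothetical. The same function would be run once for the training platform and once for the vehicle platform against the same ground truth.

```python
from collections import Counter
from typing import Dict, List


def category_counts(frames: List[List[str]]) -> Counter:
    """Count targets per category over all frames of one test case."""
    counts: Counter = Counter()
    for labels in frames:
        counts.update(labels)
    return counts


def case_run_report(outputs: Dict[str, List[List[str]]],
                    truths: Dict[str, List[List[str]]]) -> Dict[str, dict]:
    """Per test case: per-category (output, truth) counts and a consistency flag."""
    report = {}
    for case_id, truth_frames in truths.items():
        out = category_counts(outputs.get(case_id, []))
        gt = category_counts(truth_frames)
        categories = sorted(set(out) | set(gt))
        per_category = {c: (out[c], gt[c]) for c in categories}
        report[case_id] = {
            "per_category": per_category,
            "consistent": all(o == g for o, g in per_category.values()),
        }
    return report


truth = {"c1": [["car", "ped"], ["car"]], "c2": [["car"]]}
training_out = {"c1": [["car", "ped"], ["car"]], "c2": [["car", "car"]]}
training_report = case_run_report(training_out, truth)
```

A missing output record for a test case counts as zero detections in every category here, so it is reported as inconsistent rather than silently skipped.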

[0096] Furthermore, in this embodiment, test cases for detailed analysis can be selected from the first perception test cases with inconsistent target counts according to a preset proportion (e.g., the 1% of cases with the largest difference in target count).
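The proportional selection can be sketched as follows, assuming that each test case has already been reduced to one absolute target-count difference versus ground truth (for example, summed over categories). The rounding rule (round up, with at least one case) and the function name are assumptions for illustration.

```python
import math
from typing import Dict, List


def select_for_deep_analysis(count_diff: Dict[str, int], ratio: float = 0.01) -> List[str]:
    """Pick the inconsistent cases with the largest target-count difference.

    count_diff maps case ID -> absolute target-count difference versus ground truth;
    cases with a difference of 0 are consistent and are skipped.
    """
    inconsistent = [(cid, d) for cid, d in count_diff.items() if d > 0]
    if not inconsistent:
        return []
    k = max(1, math.ceil(ratio * len(inconsistent)))
    # Largest difference first; ties broken by case ID for a deterministic order.
    inconsistent.sort(key=lambda item: (-item[1], item[0]))
    return [cid for cid, _ in inconsistent[:k]]


diffs = {"a": 0, "b": 3, "c": 7, "d": 1}
selected = select_for_deep_analysis(diffs, ratio=0.5)
```

Rounding up ensures that even a small batch of inconsistent cases yields at least one case for the frame-by-frame comparison.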

[0097] For the first and second output records of the selected test cases, a detailed frame-by-frame comparison is performed on the perception output data of the training architecture platform and the vehicle architecture platform, including the intermediate output of each node in the perception process (e.g., feature extraction, object detection, and tracking). For each node output in each frame, the cosine similarity between the outputs of the training architecture platform and the vehicle architecture platform is calculated using the following formula:

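[0098] $\cos\theta = \dfrac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}}\,\sqrt{\sum_{i=1}^{n} B_i^{2}}}$,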

[0099] The average distance, which measures the difference between two vectors, is calculated using the following formula:

[0100] $d = \dfrac{1}{n}\sum_{i=1}^{n} \lvert A_i - B_i \rvert$,

[0101] where $A$ and $B$ are the output vectors of the node on the two architecture platforms, and $n$ is the vector dimension.

[0102] The cosine similarity and average distance over all frames and nodes are statistically analyzed, and their maximum, minimum, and average values are calculated. Based on these metrics, a first architecture comparison report is generated that details the output consistency between the training architecture platform and the vehicle architecture platform at each perception process node, helping to evaluate how the model's performance differs across hardware architectures.
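The per-node metrics and their max/min/mean summary can be sketched in plain Python. This assumes that each node's output in each frame has been flattened into a vector of floats, and it assumes the "average distance" is the mean absolute element-wise difference; the source does not fix that definition, so a Euclidean variant would be equally plausible. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two node output vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        # Degenerate case: treat two identical zero vectors as fully similar.
        return 1.0 if list(a) == list(b) else 0.0
    return dot / norm


def average_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute element-wise difference (assumed definition)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def summarize(values: List[float]) -> Dict[str, float]:
    return {"max": max(values), "min": min(values), "mean": sum(values) / len(values)}


def compare_architectures(training: Dict[str, List[List[float]]],
                          vehicle: Dict[str, List[List[float]]]) -> Dict[str, dict]:
    """Per node: compare the two platforms frame by frame and summarize the metrics."""
    report = {}
    for node, train_frames in training.items():
        cos_vals, dist_vals = [], []
        for a, b in zip(train_frames, vehicle[node]):
            cos_vals.append(cosine_similarity(a, b))
            dist_vals.append(average_distance(a, b))
        report[node] = {"cosine": summarize(cos_vals), "distance": summarize(dist_vals)}
    return report


# Usage: one node, two frames; the second frame diverges between platforms.
training = {"detect": [[1.0, 0.0], [1.0, 1.0]]}
vehicle = {"detect": [[1.0, 0.0], [1.0, 0.0]]}
arch_report = compare_architectures(training, vehicle)
```

Cosine similarity is insensitive to a uniform scaling of the output, which is why pairing it with a distance metric helps catch magnitude drift between architectures.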

[0103] This invention systematically achieves a comprehensive evaluation of the cross-architecture performance and stability of the perception model by comparing model algorithms, from judging the consistency of the overall number of targets to fine-grained frame-by-frame multi-node comparison of key inconsistent cases.

[0104] Optionally, step S110 may include: inputting multiple second perception test cases related to the software test case list into the perception software running on the training architecture platform and the vehicle architecture platform, respectively; obtaining, through automated backfeeding testing, a third output record of the perception software running on the training architecture platform and a fourth output record of the perception software running on the vehicle architecture platform; performing software algorithm comparison using the third output record, the fourth output record, and a second ground truth record generated from the second real annotation data corresponding to the second perception test cases; and generating software perception performance test results.

[0105] Specifically, in this embodiment, the output records of the perception software on the training architecture platform and the vehicle architecture platform obtained during automated backfeeding testing are first compared algorithmically with the ground truth records generated from the corresponding second real annotation data. The consistency of the target counts for each perception target category in each test case is statistically analyzed, and a test case run report is generated for each of the two platforms. For perception test cases where differences are detected, the outputs are further compared frame by frame and node by node to evaluate in depth whether the perception software performs and behaves consistently across architecture platforms, ensuring accurate and stable cross-platform perception.

[0106] The software algorithm comparison process provided in this embodiment includes: for the third output record against the second ground truth record, and for the fourth output record against the second ground truth record, statistically determining by perception target category whether the number of targets in each second perception test case is consistent with the number of ground truth targets; generating a third test case run report for the training architecture platform and a fourth test case run report for the vehicle architecture platform to evaluate the correctness of the perception results of the perception software on each platform; and, for one or more inconsistent second perception test cases, comparing frame by frame the outputs of those test cases at each node of the perception process in the third output record and the fourth output record to generate a second architecture comparison report that evaluates the consistency of the software perception results across architectures.

[0107] For details on the software algorithm comparison process, please refer to the description of the model algorithm comparison process section, which will not be repeated here.

[0108] This invention, through software algorithm comparison, systematically achieves a comprehensive evaluation of the cross-architecture performance and stability of the perception software, from judging the consistency of the overall number of targets to fine-grained frame-by-frame multi-node comparison of key inconsistent cases.

[0109] Optionally, in another optional embodiment provided by the present invention, based on one or more of the embodiments corresponding to Figure 1 above, the method may further include:

[0110] Test logs generated by the different architecture platforms running the perception module during automated backfeeding testing are collected and parsed, multiple performance indicators related to the perception module are statistically analyzed, and a stability operation report is generated.

[0111] Specifically, during automated backfeeding testing, the present invention can collect the system performance logs generated by the perception module (perception model or perception software) while it runs on both the training architecture platform and the vehicle architecture platform. These logs, collected with low-level Linux tools (such as the `top` command), include metrics such as CPU (Central Processing Unit) utilization, GPU (Graphics Processing Unit) utilization, memory (MEM) usage, and I/O (Input/Output) latency, as well as system health logs. Parsing and statistically analyzing these test log files extracts the trends over time of the key performance indicators related to the perception module. The analysis focuses on checking whether resource usage stays within the expected range and on identifying potential memory leaks or abnormal resource usage, and a stability operation report is finally compiled. This provides a comprehensive quantitative assessment of the long-term operational stability and resource usage of the perception module across multiple platforms.

[0112] Furthermore, in this embodiment, test logs are continuously collected during automated backfeeding testing on the different architecture platforms where the perception module runs, recording in detail the operating status and performance data of the perception module. Stability tests are conducted by continuously backfeeding the same or different data over a certain time range (such as 24 hours, 48 hours, 72 hours, one week, or one month) to detect whether the perception software can run stably for an extended period. The collected system logs include key performance indicators such as GPU utilization, CPU usage, memory usage, and the percentage of CPU time spent waiting for I/O. These log data are then parsed and statistically analyzed to extract multiple performance indicators related to the perception module, analyze the stability of resource consumption, and determine whether preset performance requirements are met. Finally, a stability operation report is generated from the statistical results.
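A stability analysis of this kind can be sketched as follows. The sketch assumes a hypothetical, already-sampled log format with one `key=value` line per sample (for example `t=60 cpu=35.0 gpu=58.2 mem=2048 iowait=0.4`, with `t` in seconds and `mem` in MB) in which every line carries every key; real `top` output would need its own parser. The per-metric limits and the leak threshold (a least-squares memory growth slope in MB per second) are illustrative assumptions, not values from the source.

```python
import re
from statistics import mean
from typing import Dict, List

# Hypothetical log line format: "t=60 cpu=35.0 gpu=58.2 mem=2048 iowait=0.4"
LINE_RE = re.compile(r"(\w+)=([-+]?\d+(?:\.\d+)?)")


def parse_log(lines: List[str]) -> Dict[str, List[float]]:
    """Turn key=value samples into one time series per metric."""
    series: Dict[str, List[float]] = {}
    for line in lines:
        for key, value in LINE_RE.findall(line):
            series.setdefault(key, []).append(float(value))
    return series


def slope(xs: List[float], ys: List[float]) -> float:
    """Least-squares slope of ys against xs (0.0 if xs is constant)."""
    mx, my = mean(xs), mean(ys)
    den = sum((x - mx) ** 2 for x in xs)
    if den == 0:
        return 0.0
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / den


def stability_report(lines: List[str], limits: Dict[str, float],
                     leak_slope: float = 1.0) -> Dict[str, object]:
    """Max/mean per metric, limit checks, and a simple memory-growth leak heuristic."""
    series = parse_log(lines)
    report: Dict[str, object] = {}
    for metric, values in series.items():
        if metric == "t":
            continue  # timestamps are the x-axis, not a resource metric
        peak = max(values)
        report[metric] = {
            "max": peak,
            "mean": mean(values),
            "within_limit": peak <= limits.get(metric, float("inf")),
        }
    report["memory_leak_suspected"] = slope(series["t"], series["mem"]) > leak_slope
    return report


# Usage: memory grows by 200 MB every 60 s, so a leak is flagged.
logs = [f"t={i * 60} cpu=30.0 gpu=50.0 mem={1000 + i * 200} iowait=0.5" for i in range(5)]
rep = stability_report(logs, {"cpu": 80.0, "mem": 4000.0})
```

A fitted slope is used instead of comparing the first and last samples so that short spikes in memory usage do not by themselves trigger a leak warning.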

[0113] By statistically analyzing the performance metrics of the perception module running on each architecture platform, this invention can comprehensively monitor resource usage and operating status and promptly identify potential memory leaks or performance bottlenecks. This strengthens control over the cross-architecture stability and resource utilization of the perception module, helps ensure its long-term reliability and efficiency in actual operating environments, and enhances the validity and practical value of the perception reliability test results.

[0114] To facilitate understanding of the overall autonomous driving perception reliability testing process, Figure 2 is taken as an example. Figure 2 shows the logical block diagram of the autonomous driving perception reliability testing process provided in this embodiment of the invention. First, a real vehicle collects sensor data for various scenarios according to project requirements and uploads the data to the data server over the network or via an external hard drive. During collection, data acquisition personnel record tag information, including the scenario content and time of occurrence, when specific scenarios occur, so that perception developers can locate the corresponding data by filtering on time. The data uploaded to the data server is annotated and returned, forming labeled data that serves as input for model training. Next, the model training and release verification chain comprises model training, model release, and automated testing after model release; for the automated testing, refer to steps S100 to S120. The perception software release chain is independent of model release: although the two together constitute the complete output version of the perception module, their release cycles are often inconsistent. Software developers download from the data server the sensor data that revealed problems during real-vehicle testing, locate and resolve the problems through data feedback (which may involve a model release), submit code, compile and release the version, and then run further scenario tests on the released software; again, refer to steps S100 to S120. Real-vehicle version deployment involves installing the perception software and its matching model in the vehicle. Finally, the real-vehicle testing and data feedback phase confirms that the fully validated version has resolved the existing issues and enables vehicle-side retesting, providing a more stable and reliable model and software version for real-vehicle testing.

[0115] Figure 3 shows the logical flow of the perception module release verification process provided in this embodiment of the invention. Although the testing processes triggered by model releases and software releases are essentially the same, their test case lists must be maintained separately. When a new version is released, the related test issues or requirements must be closed, and perception engineers will already have completed self-testing on the corresponding scenario data. Associating the development test data with the test issues or requirements determines the test case list required for the new version. Beyond the data used during development, additional similar scenario data and their annotation results can be mined using the tag information on the data server to generate ground truth values that serve as the benchmark for comparing test results. During new version verification, the test cases related to new features or resolved problems are merged with the test cases from previous versions to form a complete test case list. During version testing, the corresponding data is downloaded according to this list, and data backfeeding verification is performed on the devices running the new model or software version to obtain test results. The test can also be extended to stability testing: data is continuously backfed over a certain time range to check that the perception software runs stably, and system logs are collected to gather statistics such as GPU and CPU utilization, memory utilization, and the percentage of time the CPU waits for I/O, so as to quantitatively evaluate the stability and compliance of system resource consumption.

[0116] Figure 4 shows the logical block diagram of the perception performance test result generation process provided in this embodiment of the invention. The perception performance test results may include a test case run report, a stability run report, and an x86 vs. ARM architecture comparison report. The test case run report takes as input the record output by each test case (i.e., the record file of the perception results) and the truth record in the test case, and generates a similarity report through algorithmic comparison. The stability run report uses the `top` command or the system health logs printed by the underlying software to statistically analyze how the GPU, CPU, memory, and I/O wait resources related to the perception module change over time, mainly assessing whether resource utilization is within the expected range and whether problems such as memory leaks appear after long-term testing. The x86 vs. ARM architecture comparison report compares the run-result record files from both platforms for consistency and generates a cross-architecture consistency analysis report.

[0117] Although the operations are described in a specific order, this should not be construed as requiring these operations to be performed in the specific order shown or in a sequential order. In certain environments, multitasking and parallel processing may be advantageous.

[0118] It should be understood that the various steps described in the method embodiments of the present invention may be performed in different orders and / or in parallel. Furthermore, the method embodiments may include additional steps and / or omit the steps shown. The scope of the present invention is not limited in this respect.

[0119] Corresponding to the above method embodiments, this invention also provides a perception reliability testing device. As shown in Figure 5, the device may include a test case data acquisition unit 10, a perception performance test comparison unit 20, and a perception reliability test result generation unit 30.

[0120] The test case data acquisition unit 10 is used to obtain from a data server, after the perception module to be tested is released, multiple perception test cases related to a test case list generated based on the development and testing requirements and historical testing requirements of the perception module, wherein the perception module includes a mutually compatible perception model and perception software.

[0121] The perception performance test comparison unit 20 is used to input each perception test case into the perception module running on different architecture platforms, obtain the output records of the perception module on each architecture platform through automated backfeeding testing, and perform algorithmic comparison between these records and the ground truth records generated from the real labeled data corresponding to the perception test cases, generating perception performance test results.

[0122] The perception reliability test result generation unit 30 is used to generate perception reliability test results using the perception performance test results of the perception model and perception software.

[0123] Optionally, the test case data acquisition unit 10 may include: a model test case data acquisition subunit and a software test case data acquisition subunit.

[0124] The model test case data acquisition subunit is used to retrieve multiple perception test cases related to the model test case list generated based on the development and testing requirements and historical testing requirements of the perception model after the perception model is released.

[0125] The software test case data acquisition subunit is used to retrieve multiple perception test cases related to the software test case list generated based on the development and testing requirements and historical testing requirements of the perception software after the release of the perception software.

[0126] Optionally, the model test case data acquisition subunit can be specifically used to obtain a first model test case list by associating the scene development self-test data of the perception model during the model development process with corresponding problems or functional requirements after the perception model to be tested is released; querying the data server to obtain a second model test case list that is related to the first model test case list in terms of scenarios; obtaining a third model test case list of historical model versions; and obtaining multiple perception test cases related to the first model test case list, the second model test case list, and the third model test case list from the data server.

[0127] Optionally, the software test case data acquisition subunit can be specifically used to, after the release of the perception software to be tested, associate the scenario development self-test data of the perception software during the software development process with the corresponding problems or functional requirements to obtain a first software test case list; query the data server to find a second software test case list that is associated with the first software test case list in terms of scenarios; obtain a third software test case list of historical software versions; and obtain multiple perception test cases related to the first software test case list, the second software test case list, and the third software test case list from the data server.

[0128] Optionally, the perception performance test comparison unit 20 includes: a model perception performance test result generation subunit and a software perception performance test result generation subunit.

[0129] The model perception performance test result generation subunit is used to input multiple first perception test cases related to the model test case list into the perception models running on the training architecture platform and the vehicle architecture platform, respectively; obtain, through automated backfeeding testing, the first output record of the perception model running on the training architecture platform and the second output record of the perception model running on the vehicle architecture platform; and perform model algorithm comparison using the first output record, the second output record, and the first ground truth record generated from the first real labeled data corresponding to the first perception test cases, generating the model perception performance test results.

[0130] The software perception performance test result generation subunit is configured to input multiple second perception test cases related to the software test case list into the perception software running on the training architecture platform and into the perception software running on the vehicle architecture platform, respectively. Through automated backfeed testing, it obtains a third output record from the perception software running on the training architecture platform and a fourth output record from the perception software running on the vehicle architecture platform. It then performs a software algorithm comparison using the third output record, the fourth output record, and a second ground-truth record generated from the second real labeled data corresponding to the second perception test cases, to generate the software perception performance test results.
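Automated backfeed testing, as described in [0129] and [0130], amounts to replaying the same recorded test cases on each architecture platform and keeping each platform's outputs as its output record. The sketch below is illustrative only and assumes, hypothetically, that a test case is a list of recorded frames and that each platform is exposed as a callable returning the output of every perception node for one frame. In practice the training-platform and vehicle-platform runs would be separate processes or devices rather than Python functions.

```python
from typing import Callable, Dict, List

# Hypothetical data shapes (assumptions, not from the source):
#   a frame is a dict of recorded sensor data,
#   a platform runner maps one frame to {perception_node_name: node_output}.
Frame = Dict[str, object]
NodeOutputs = Dict[str, object]
Runner = Callable[[Frame], NodeOutputs]


def backfeed(cases: Dict[str, List[Frame]],
             platforms: Dict[str, Runner]) -> Dict[str, Dict[str, List[NodeOutputs]]]:
    """Replay every frame of every test case on every platform.

    Returns records[platform][case_id] -> list of per-frame node outputs,
    i.e. the output record of that platform for that test case.
    """
    records: Dict[str, Dict[str, List[NodeOutputs]]] = {name: {} for name in platforms}
    for case_id, frames in cases.items():
        for name, run in platforms.items():
            # Replaying in recorded order keeps frame i aligned across platforms,
            # which a later frame-by-frame comparison relies on.
            records[name][case_id] = [run(frame) for frame in frames]
    return records


# Toy runners standing in for the training and vehicle platforms.
platforms = {
    "training": lambda f: {"detector": f["t"]},
    "vehicle": lambda f: {"detector": f["t"] * 10},
}
records = backfeed({"c1": [{"t": 0}, {"t": 1}]}, platforms)
print(records["vehicle"]["c1"])  # [{'detector': 0}, {'detector': 10}]
```

Keeping every node's output per frame, not just the final result, is what later makes a node-level, frame-by-frame architecture comparison possible.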

[0131] Optionally, the model perception performance test result generation subunit may be specifically configured to perform the following model algorithm comparison process. First, for the first output record against the first ground-truth record, and for the second output record against the first ground-truth record, it counts, per perception target category, the number of targets and the number of ground-truth targets in each first perception test case and determines whether they are consistent, generating a first test case run report for the training architecture platform and a second test case run report for the vehicle architecture platform; these reports evaluate the correctness of the perception model's results on the training architecture platform and the vehicle architecture platform, respectively. Second, for each first perception test case found to be inconsistent, it compares that case's outputs at each node in the perception process frame by frame between the first output record and the second output record, generating a first architecture comparison report that evaluates the consistency of the model's perception results under different architectures.
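The two-stage comparison in [0131] (a per-category count check against ground truth on each platform, then a frame-by-frame, node-by-node diff between platforms for the failing cases) could be sketched as follows. This is a simplified illustration under assumed data shapes: each frame of an output record maps a node name to the list of category labels it emitted, the final node is assumed to be named `"output"`, and outputs are compared as multisets of labels rather than with the geometric matching and tolerances a real comparison would likely need. The same routine would apply unchanged to the software comparison in [0132].

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical shapes: an output record is a list of frames, each frame maps a
# perception node name to the category labels it emitted; ground truth is a list
# of per-frame label lists.
Record = List[Dict[str, List[str]]]
Truth = List[List[str]]
FINAL_NODE = "output"  # assumed name of the last node in the perception process


def category_counts(frames: List[List[str]]) -> Counter:
    """Total number of targets per category across all frames of one test case."""
    counts: Counter = Counter()
    for labels in frames:
        counts.update(labels)
    return counts


def run_report(outputs: Dict[str, Record], truth: Dict[str, Truth]) -> Dict[str, bool]:
    """Per test case: do per-category target counts match the ground-truth counts?"""
    return {
        cid: category_counts([f.get(FINAL_NODE, []) for f in outputs[cid]])
        == category_counts(gt)
        for cid, gt in truth.items()
    }


def architecture_diff(train: Record, vehicle: Record) -> List[Tuple[int, str]]:
    """Frame-by-frame, node-by-node comparison; returns (frame index, node) mismatches."""
    diffs = []
    for i, (ft, fv) in enumerate(zip(train, vehicle)):
        for node in sorted(set(ft) | set(fv)):
            if Counter(ft.get(node, [])) != Counter(fv.get(node, [])):
                diffs.append((i, node))
    if len(train) != len(vehicle):
        # A missing or extra frame is itself an architecture inconsistency.
        diffs.append((min(len(train), len(vehicle)), "<frame count>"))
    return diffs


def compare_model(first: Dict[str, Record], second: Dict[str, Record], truth: Dict[str, Truth]):
    """Return (training-platform report, vehicle-platform report, architecture report)."""
    train_report = run_report(first, truth)
    vehicle_report = run_report(second, truth)
    inconsistent = [c for c in truth if not (train_report[c] and vehicle_report[c])]
    arch_report = {c: architecture_diff(first[c], second[c]) for c in inconsistent}
    return train_report, vehicle_report, arch_report


truth = {"c1": [["car"], ["car", "ped"]], "c2": [["car"]]}
first = {  # training platform
    "c1": [{"detector": ["car"], "output": ["car"]},
           {"detector": ["car", "ped"], "output": ["car", "ped"]}],
    "c2": [{"detector": ["car"], "output": ["car"]}],
}
second = {  # vehicle platform: loses the car in c2 at the final node
    "c1": first["c1"],
    "c2": [{"detector": ["car"], "output": []}],
}
train_report, vehicle_report, arch_report = compare_model(first, second, truth)
print(arch_report)  # {'c2': [(0, 'output')]}
```

Only the inconsistent case `c2` reaches the architecture diff, and the diff localizes the divergence to the final node of frame 0, since the upstream `detector` node agrees across platforms.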

[0132] Optionally, the software perception performance test result generation subunit may be specifically configured to perform the following software algorithm comparison process. First, for the third output record against the second ground-truth record, and for the fourth output record against the second ground-truth record, it counts, per perception target category, the number of targets and the number of ground-truth targets in each second perception test case and determines whether they are consistent, generating a third test case run report for the training architecture platform and a fourth test case run report for the vehicle architecture platform; these reports evaluate the correctness of the perception software's results on the training architecture platform and the vehicle architecture platform, respectively. Second, for each second perception test case found to be inconsistent, it compares that case's outputs at each node in the perception process frame by frame between the third output record and the fourth output record, generating a second architecture comparison report that evaluates the consistency of the software's perception results under different architectures.

[0133] Optionally, the perception reliability testing device may further include a stability operation report generation unit.

[0134] The stability operation report generation unit is configured to collect the test logs generated by the different architecture platforms running the perception module during automated backfeed testing, parse the test logs, compute statistics for multiple performance indicators of the perception module, and generate a stability operation report.
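[0134] does not specify the log format or which performance indicators are computed. As a hedged sketch only, the code below assumes a hypothetical `key=value` log line format with a leading level (`INFO` or `ERROR`) and derives a few illustrative per-platform indicators: the number of frames with a recorded latency, mean and maximum per-frame latency, and the error count.

```python
import re
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable

# Assumed log line format (hypothetical, not from the source):
#   "<LEVEL> platform=<name> case=<id> frame=<n> latency_ms=<float> ..."
KEY_VALUE = re.compile(r"(\w+)=(\S+)")


def stability_report(lines: Iterable[str]) -> Dict[str, Dict[str, object]]:
    """Per-platform latency statistics and error count parsed from backfeed logs."""
    latencies = defaultdict(list)
    errors = defaultdict(int)
    for line in lines:
        fields = dict(KEY_VALUE.findall(line))
        platform = fields.get("platform")
        if platform is None:
            continue  # not a per-frame backfeed record
        if line.startswith("ERROR"):
            errors[platform] += 1
        if "latency_ms" in fields:
            latencies[platform].append(float(fields["latency_ms"]))
    report = {}
    for platform in sorted(set(latencies) | set(errors)):
        lat = latencies.get(platform, [])
        report[platform] = {
            "frames": len(lat),  # frames that logged a latency value
            "mean_latency_ms": round(mean(lat), 2) if lat else None,
            "max_latency_ms": max(lat) if lat else None,
            "errors": errors.get(platform, 0),
        }
    return report


logs = [
    "INFO platform=training case=c1 frame=0 latency_ms=20.0",
    "INFO platform=training case=c1 frame=1 latency_ms=30.0",
    "INFO platform=vehicle case=c1 frame=0 latency_ms=45.5",
    "ERROR platform=vehicle case=c1 frame=1 msg=timeout",
    "INFO backfeed run finished",
]
print(stability_report(logs)["vehicle"])
# {'frames': 1, 'mean_latency_ms': 45.5, 'max_latency_ms': 45.5, 'errors': 1}
```

Other indicators (for example CPU or memory usage) would be added the same way if the platform logs record them.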

[0135] This invention provides a perception reliability testing device configured to: after the perception module to be tested is released, obtain from a data server multiple perception test cases related to a test case list generated from the development and testing requirements and historical testing requirements of the perception module, where the perception module includes a mutually compatible perception model and perception software; input each perception test case into the perception module running on each of several architecture platforms; obtain, through automated backfeed testing, the output records of the perception module on each architecture platform and compare them with the ground-truth records generated from the real labeled data corresponding to the perception test cases, to generate perception performance test results; and generate perception reliability test results from the perception performance test results of the perception model and the perception software. Because the test case list combines development and historical testing requirements, and automated backfeed testing runs the perception test cases on different architecture platforms and compares their outputs with real labels, each architecture platform can be evaluated for performance and reliability promptly and comprehensively once the perception module is released, which improves both the efficiency and the coverage of perception reliability testing.
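The source does not state how the performance results of the perception model and the perception software are combined into a reliability conclusion. One simple rule, given purely as an assumption for illustration, is to compute a pass rate for each component on each platform and declare the module reliable only if every pass rate reaches a threshold (by default, every case must pass). The `results` layout and `min_pass_rate` parameter are hypothetical.

```python
from typing import Dict

# results[component][platform][case_id] -> did the case pass? (hypothetical shape)
Results = Dict[str, Dict[str, Dict[str, bool]]]


def reliability_conclusion(results: Results, min_pass_rate: float = 1.0) -> Dict[str, object]:
    """Combine model and software performance results into one conclusion.

    Assumed rule (illustrative only): every component/platform pass rate must be
    at least `min_pass_rate`; a component/platform with no cases counts as 0.0.
    """
    pass_rates: Dict[str, float] = {}
    for component, per_platform in results.items():
        for platform, cases in per_platform.items():
            passed = sum(cases.values())  # True counts as 1
            pass_rates[f"{component}/{platform}"] = passed / len(cases) if cases else 0.0
    reliable = bool(pass_rates) and all(r >= min_pass_rate for r in pass_rates.values())
    return {"pass_rates": pass_rates, "reliable": reliable}


results = {
    "model": {"training": {"c1": True, "c2": True}, "vehicle": {"c1": True, "c2": False}},
    "software": {"training": {"s1": True}, "vehicle": {"s1": True}},
}
print(reliability_conclusion(results)["reliable"])  # False (model/vehicle pass rate is 0.5)
```

Reporting the per-platform pass rates alongside the overall verdict keeps a training-versus-vehicle discrepancy visible even when the threshold is relaxed.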

[0136] Regarding the apparatus in the above embodiments, the specific manner in which each unit performs its operation has been described in detail in the embodiments related to the method, and will not be elaborated upon here.

[0137] The perception reliability testing device includes a processor and a memory. The test case data acquisition unit 10, perception performance test comparison unit 20, and perception reliability test result generation unit 30 are all stored as program units in the memory. The processor executes the program units stored in the memory to achieve the corresponding functions.

[0138] The processor contains a kernel, which retrieves the corresponding program units from the memory. One or more kernels may be configured. By adjusting kernel parameters, the device can, after the perception module is released, automatically select perception test cases covering multiple scenarios from the data server based on development testing and historical testing requirements, where the perception module includes a mutually compatible perception model and perception software. The perception test cases are then input into the perception module running on different architecture platforms, automated backfeed testing is used to obtain each platform's output results, and these are compared with real labeled data to form performance test results. The performance test results of the perception model and the perception software are then combined to reach a perception reliability test conclusion. This allows timely and comprehensive reliability verification across multiple platforms after the perception module is released, significantly improving the efficiency and coverage of perception reliability testing.

[0139] This invention provides a computer-readable storage medium storing a program which, when executed by a processor, implements the perception reliability testing method.

[0140] This invention provides a processor for running a program, wherein the program, when running, executes the perception reliability testing method.

[0141] As shown in Figure 6, an embodiment of the invention provides an electronic device 1000, which includes at least one processor 1001, at least one memory 1002 connected to the processor 1001, and a bus 1003. The processor 1001 and the memory 1002 communicate with each other via the bus 1003. The processor 1001 is configured to call program instructions in the memory 1002 to execute the perception reliability testing method described above. The electronic device may be a server, PC, PAD, mobile phone, etc.

[0142] The present invention also provides a computer program product which, when executed on an electronic device, is adapted to execute a program initialized with the steps of the perception reliability testing method.

[0143] This invention is described with reference to flowcharts and/or block diagrams of methods, apparatuses, electronic devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

[0144] In a typical configuration, an electronic device includes one or more processors (CPUs), memory, and a bus. The electronic device may also include input/output interfaces, network interfaces, etc.

[0145] Memory may take the form of computer-readable media such as non-persistent memory, random access memory (RAM), and/or non-volatile memory, for example read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip. Memory is an example of a computer-readable medium.

[0146] Computer-readable media include both permanent and non-permanent, removable and non-removable media that can store information using any method or technology. The information can be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.

[0147] In the description of this invention, it should be understood that if the terms "upper", "lower", "front", "rear", "left" and "right" are used to indicate the orientation or positional relationship based on the orientation or positional relationship shown in the drawings, they are only for the convenience of describing this invention and simplifying the description, and do not indicate or imply that the position or element referred to must have a specific orientation, or be constructed and operated in a specific orientation, and therefore should not be construed as a limitation of this invention.

[0148] It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. It should also be noted that the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Unless otherwise specified, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes the element.

[0149] Those skilled in the art will understand that embodiments of the present invention can be provided as methods, systems, or computer program products. Therefore, the present invention can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention can take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0150] The above are merely embodiments of the present invention and are not intended to limit the invention. Various modifications and variations can be made to the present invention by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principle of the present invention should be included within the scope of the present invention.

Claims

1. A perception reliability testing method, characterized by comprising: after a perception module to be tested is released, generating a test case list based on the development and testing requirements and historical testing requirements of the perception module, and obtaining multiple perception test cases related to the test case list from a data server, wherein the perception module includes a mutually compatible perception model and perception software; inputting each of the perception test cases into the perception module running on different architecture platforms, obtaining the output records of the perception module on each architecture platform through automated backfeed testing, and comparing the output records with the ground-truth records generated from the real labeled data corresponding to the perception test cases to generate perception performance test results; and generating perception reliability test results using the perception performance test results of the perception model and the perception software.

2. The method according to claim 1, characterized in that generating, after the perception module to be tested is released, a test case list based on the development and testing requirements and historical testing requirements of the perception module, and obtaining multiple perception test cases related to the test case list from the data server, comprises: after the perception model to be tested is released, generating a model test case list based on the development and testing requirements and historical testing requirements of the perception model, and obtaining multiple perception test cases related to the model test case list from the data server; and after the perception software to be tested is released, generating a software test case list based on the development and testing requirements and historical testing requirements of the perception software, and obtaining multiple perception test cases related to the software test case list from the data server.

3. The method according to claim 2, characterized in that generating, after the perception model to be tested is released, a model test case list based on the development and testing requirements and historical testing requirements of the perception model, and obtaining multiple perception test cases related to the model test case list from the data server, comprises: after the perception model to be tested is released, associating the scenario development self-test data of the perception model from the model development process with the corresponding problems or functional requirements to obtain a first model test case list; retrieving from the data server a second model test case list associated with the first model test case list; obtaining a third model test case list of historical model versions; and obtaining, from the data server, multiple perception test cases related to the first model test case list, the second model test case list, and the third model test case list;
and/or, generating, after the perception software to be tested is released, a software test case list based on the development and testing requirements and historical testing requirements of the perception software, and obtaining multiple perception test cases related to the software test case list from the data server, comprises: after the perception software to be tested is released, associating the scenario development self-test data of the perception software from the software development process with the corresponding problems or functional requirements to obtain a first software test case list; retrieving from the data server a second software test case list associated with the first software test case list; obtaining a third software test case list of historical software versions; and obtaining, from the data server, multiple perception test cases related to the first software test case list, the second software test case list, and the third software test case list.

4. The method according to claim 2, characterized in that inputting each of the perception test cases into the perception module running on different architecture platforms, obtaining the output records of the perception module on each architecture platform through automated backfeed testing, and comparing the output records with the ground-truth records generated from the real labeled data corresponding to the perception test cases to generate perception performance test results, comprises: inputting multiple first perception test cases related to the model test case list into the perception model running on a training architecture platform and the perception model running on a vehicle architecture platform, respectively; obtaining, through automated backfeed testing, a first output record of the perception model running on the training architecture platform and a second output record of the perception model running on the vehicle architecture platform; performing a model algorithm comparison using the first output record, the second output record, and a first ground-truth record generated from the first real labeled data corresponding to the first perception test cases, to generate model perception performance test results; inputting multiple second perception test cases related to the software test case list into the perception software running on the training architecture platform and the perception software running on the vehicle architecture platform, respectively; obtaining, through automated backfeed testing, a third output record of the perception software running on the training architecture platform and a fourth output record of the perception software running on the vehicle architecture platform; and performing a software algorithm comparison using the third output record, the fourth output record, and a second ground-truth record generated from the second real labeled data corresponding to the second perception test cases, to generate software perception performance test results.

5. The method according to claim 4, characterized in that the model algorithm comparison process comprises: for the first output record against the first ground-truth record, and for the second output record against the first ground-truth record, respectively, counting, per perception target category, the number of targets and the number of ground-truth targets for each first perception test case and determining whether they are consistent, and generating a first test case run report for the training architecture platform and a second test case run report for the vehicle architecture platform, to evaluate the correctness of the perception results of the perception model on the training architecture platform and the vehicle architecture platform, respectively; and for one or more inconsistent first perception test cases, comparing frame by frame, in the first output record and the second output record, the output results of those first perception test cases at each node in the perception process, to generate a first architecture comparison report for evaluating the consistency of the perception results of the perception model under different architectures.

6. The method according to claim 4, characterized in that the software algorithm comparison process comprises: for the third output record against the second ground-truth record, and for the fourth output record against the second ground-truth record, respectively, counting, per perception target category, the number of targets and the number of ground-truth targets for each second perception test case and determining whether they are consistent, and generating a third test case run report for the training architecture platform and a fourth test case run report for the vehicle architecture platform, to evaluate the correctness of the perception results of the perception software on the training architecture platform and the vehicle architecture platform, respectively; and for one or more inconsistent second perception test cases, comparing frame by frame, in the third output record and the fourth output record, the output results of those second perception test cases at each node in the perception process, to generate a second architecture comparison report for evaluating the consistency of the software perception results under different architectures.

7. The method according to claim 1, characterized by further comprising: collecting the test logs generated by the different architecture platforms running the perception module during the automated backfeed testing, parsing the test logs, computing statistics for multiple performance indicators of the perception module, and generating a stability operation report.

8. A perception reliability testing device, characterized by comprising a test case data acquisition unit, a perception performance test comparison unit, and a perception reliability test result generation unit, wherein: the test case data acquisition unit is configured to obtain, after a perception module to be tested is released, multiple perception test cases related to a test case list generated from the development and testing requirements and historical testing requirements of the perception module, the perception module including a mutually compatible perception model and perception software; the perception performance test comparison unit is configured to input each of the perception test cases into the perception module running on different architecture platforms, obtain the output records of the perception module on each architecture platform through automated backfeed testing, and perform an algorithm comparison against the ground-truth records generated from the real labeled data corresponding to the perception test cases to generate perception performance test results; and the perception reliability test result generation unit is configured to generate perception reliability test results using the perception performance test results of the perception model and the perception software.

9. A computer-readable storage medium having a program stored thereon, characterized in that, when the program is executed by a processor, it implements the perception reliability testing method according to any one of claims 1 to 7.

10. An electronic device, characterized in that the electronic device comprises at least one processor, at least one memory connected to the processor, and a bus, wherein the processor and the memory communicate with each other through the bus, and the processor is configured to call program instructions in the memory to execute the perception reliability testing method according to any one of claims 1 to 7.