A performance verification platform of an artificial intelligence processor hardware architecture

By building a dynamic evaluation platform that combines hardware description languages and high-level programming languages, the invention addresses complex scenario configuration and evaluation lag in the performance evaluation of AI processor hardware architectures, providing efficient performance evaluation and a stable verification environment and streamlining the chip development process.

CN122240412A · Pending · Publication Date: 2026-06-19 · SHANGHAI SUIYUAN TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
SHANGHAI SUIYUAN TECH CO LTD
Filing Date
2026-05-22
Publication Date
2026-06-19

Abstract

This invention discloses a performance verification platform for AI processor hardware architecture, comprising a test stimulus pool, a performance data pool, an evaluation platform, and a stimulus import layer. The test stimulus pool stores test cases for multiple scenarios. The stimulus import layer loads target test cases from the test stimulus pool and imports the target data items required for execution of the target test cases into the evaluation platform. After translating the target test cases into standard data structures, the target operation units within them are deployed to the evaluation platform. The evaluation platform, through the combined efforts of the hardware description language world and the high-level programming language world, tests the hardware under test based on the target test cases and outputs the performance data from each output port of the hardware under test to the performance data pool for storage. The technical solution of this invention can be generalized to meet the performance evaluation needs of AI processor hardware architectures for different test scenarios under different types of platforms.
Description

Technical Field

[0001] This invention relates to the field of hardware testing technology, and in particular to a performance verification platform for an artificial intelligence (AI) processor hardware architecture.

Background Technology

[0002] With the continuous development of integrated circuits and semiconductor technology, the integration and complexity of AI processors are constantly increasing, and the difficulty and requirements for evaluating their hardware architecture performance are also increasing.

[0003] Hardware architecture performance evaluation is often hampered by difficult configuration of complex scenarios, long simulation times, inconvenient performance data collection, costly iterative development and upgrades, and tight project timelines. Traditional performance evaluation is typically carried out after the AI processor hardware has been taped out (that is, post-silicon), which delays evaluation and raises risk.

[0004] Therefore, how to efficiently complete the performance evaluation of an AI processor hardware architecture before the AI processor hardware is taped out is an important problem that needs to be solved.

Summary of the Invention

[0005] This invention provides a performance verification platform for artificial intelligence processor hardware architecture, which can generalize to meet the performance evaluation needs of AI processor hardware architecture for different test scenarios under different types of platforms through a novel hardware architecture performance evaluation framework.

[0006] According to one aspect of the present invention, a performance verification platform for an artificial intelligence processor hardware architecture is provided, including a test stimulus pool, a performance data pool, an evaluation platform, and a stimulus import layer. The evaluation platform includes a hardware description language world and a high-level programming language world; the hardware description language world includes the hardware under test.

[0007] The stimulus import layer is connected to both the test stimulus pool and the evaluation platform. The hardware description language world and the high-level programming language world are connected through a communication pipeline built based on a direct programming interface. The evaluation platform is connected to the performance data pool.

[0008] The test stimulus pool is used to store test cases for multiple scenarios organized according to a unified framework structure.

[0009] The stimulus import layer is used to load the target test cases to be deployed from the test stimulus pool and to import the target data items required for executing the target test cases into the evaluation platform according to the data framework structure predefined in the evaluation platform, so that the hardware description language world and the high-level programming language world can share the target data items through the communication pipeline; and, after translating the target test cases into a standard data structure adapted to the evaluation platform, to deploy the target operation units contained in the target test cases to the evaluation platform based on the standard data structure.

[0010] The evaluation platform is used to test the hardware under test based on each target data item and each target operation unit included in the target test cases through the joint efforts of the hardware description language world and the high-level programming language world, and outputs the performance data output by each output port of the hardware under test to the performance data pool.

[0011] The performance data pool is used to store various performance data output by the hardware under test during the performance testing process.

[0012] Based on the above embodiments, the hardware description language world includes multiple stimulus queues, a hardware description language performance verification environment, a chip status bulletin board, and a first data center. The hardware under test is configured in the hardware description language performance verification environment. The high-level programming language world includes a high-level programming language test main process and a second data center.

[0013] Each stimulus queue is connected to both the stimulus import layer and the hardware description language performance verification environment. The hardware description language performance verification environment is connected to the chip status bulletin board, which is connected to the first data center. The first and second data centers share data through the communication pipeline.

[0014] The stimulus import layer is further used to distribute the target operation units among the stimulus queues, and to store each target data item in the first data center and synchronize it to the second data center via the communication pipeline.

[0015] The chip status bulletin board is used to acquire and summarize various status information of the hardware under test, and update the status information in real time.

[0016] The hardware description language performance verification environment is used to combine the real-time updated status information in the chip status bulletin board, concurrently obtain each target operation unit from each stimulus queue, and execute each target operation unit on the hardware under test based on each target data item stored in the first data center through collaborative processing with the high-level programming language test main process, and update the chip status bulletin board with the status information generated during the execution process.

[0017] The high-level programming language test main process is used to respond to trigger calls from the hardware description language performance verification environment. It combines the real-time updated status information in the chip status bulletin board with the target data items stored in the second data center to assist the hardware description language performance verification environment in executing each target operation unit. Based on the execution results, it updates the second data center and synchronizes the updated content to the first data center through the communication pipeline.
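
The mirroring of the two data centers described in [0014] to [0017] can be pictured with a small Python model. This is only a sketch under stated assumptions: the class names, the `sync` method, and the sample data values are hypothetical and are not taken from the patent, and the real pipeline would be built on DPI rather than on Python dictionaries.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class DataCenter:
    """Key/value store holding target data items for one language world."""
    name: str
    items: Dict[str, Any] = field(default_factory=dict)


class Pipeline:
    """Stand-in for the DPI-based communication pipeline (hypothetical API)."""

    def __init__(self) -> None:
        self.transfers = 0  # number of bulk crossings between the two worlds

    def sync(self, src: DataCenter, dst: DataCenter) -> None:
        # Copy everything in one bulk transfer rather than item by item.
        dst.items.update(src.items)
        self.transfers += 1


@dataclass
class StatusBoard:
    """Chip status bulletin board: keeps the latest status value per key."""
    status: Dict[str, Any] = field(default_factory=dict)

    def update(self, key: str, value: Any) -> None:
        self.status[key] = value


first_dc = DataCenter("first (HDL world)")
second_dc = DataCenter("second (high-level language world)")
pipe = Pipeline()
board = StatusBoard()

# Stimulus import layer: store data items in the first data center, then mirror.
first_dc.items.update({"chipCfg": {"cores": 4}, "schedule": ["op0", "op1"]})
pipe.sync(first_dc, second_dc)

# Test main process: assist execution, record a result, and sync it back.
board.update("op0", "running")
second_dc.items["op0_result"] = {"cycles": 128}
pipe.sync(second_dc, first_dc)
board.update("op0", "done")
```

After both synchronizations the two data centers hold the same items, so either world can read them locally without crossing the language boundary again.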

[0018] Based on the above embodiments, the hardware description language world also includes a first application programming interface library, and the high-level programming language world also includes a second application programming interface library;

[0019] The first application programming interface (API) library is connected to the first data center, and the second API library is connected to both the high-level programming language test main process and the second data center. The first API library and the second API library are connected via a direct programming interface for data sharing.

[0020] The first application programming interface library is used to store various first application programming interfaces developed based on the hardware description language, store the first status information obtained by the hardware description language performance verification environment after calling the first application programming interfaces in real time, and update the first status information to the chip status bulletin board and the first data center; and store the status information updated in real time in the chip status bulletin board, and use the second application programming interface library as an intermediary to synchronize the status information updated in real time in the chip status bulletin board to the high-level programming language test main process;

[0021] The second application programming interface library is used to store various second application programming interfaces developed based on high-level programming languages. It stores the second status information obtained by the high-level programming language test main process calling the second application programming interfaces in real time, and updates the second status information to the second application programming interface library and the second data center. The second status information is synchronized to the chip status bulletin board using the first application programming interface library as an intermediary.

[0022] Based on the above embodiments, the high-level programming language test main process is further used for:

[0023] In response to the call of the hardware description language performance verification environment, the system combines the target data items stored in the second data center with the real-time updated status information in the chip status bulletin board indirectly obtained through the second application programming interface library to assist the hardware description language performance verification environment in executing each target operation unit, and updates the second application programming interface library and the second data center according to the execution results.

[0024] Based on the above embodiments, the hardware description language performance verification environment includes multiple master agents. Each master agent is connected to one stimulus queue and runs one process, and each process executes an integrated entity consisting of callback functions, sequencers, monitoring collectors, and drivers.

[0025] The sequencer is connected to the stimulus queue, the monitoring collector is connected to a port of the hardware under test, the driver is connected to an input port of the hardware under test and/or a designated circuit element inside the hardware under test, and the callback function is bound to the monitoring collector.

[0026] The sequencer is used to store target operation units distributed via the stimulus queue and to distribute the target operation units to the driver;

[0027] The driver is used to combine the target data items stored in the first data center and the status information updated in real time in the chip status bulletin board, and generate multiple target transactions that match the target operation unit by cooperating with the high-level programming language test main process, and inject each target transaction into the hardware under test to perform the test.

[0028] The monitoring collector is used to trigger the collection of performance data on the configured port whenever a target transaction is detected to have finished executing.

[0029] The callback function is used to collect third status information on the configured port through the bound monitoring collector when a preset event trigger condition is detected, and to update the third status information to the chip status bulletin board and the first data center.
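
To make the division of labor inside one master agent easier to follow, the sketch below models the sequencer, driver, monitoring collector, and callback in plain Python. It is a loose illustration rather than the patented design: in practice these would be UVM components written in SystemVerilog, and the class names, the fixed cycle cost per transaction, and the "slow transaction" trigger condition are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List, Optional


@dataclass
class OperationUnit:
    name: str
    transactions: List[str]


class Sequencer:
    """Holds the operation units received from one stimulus queue."""

    def __init__(self, stimulus_queue: List[OperationUnit]) -> None:
        self.pending: Deque[OperationUnit] = deque(stimulus_queue)

    def next_unit(self) -> Optional[OperationUnit]:
        return self.pending.popleft() if self.pending else None


class Monitor:
    """Monitoring collector: records one sample per finished transaction."""

    def __init__(self) -> None:
        self.samples: List[Dict[str, object]] = []
        self.callbacks: List[Callable[[str, int], None]] = []

    def bind(self, callback: Callable[[str, int], None]) -> None:
        self.callbacks.append(callback)

    def transaction_done(self, txn: str, cycles: int) -> None:
        self.samples.append({"txn": txn, "cycles": cycles})
        for callback in self.callbacks:
            callback(txn, cycles)


class Driver:
    """Expands an operation unit into transactions and 'injects' them."""

    def __init__(self, monitor: Monitor) -> None:
        self.monitor = monitor

    def run(self, unit: OperationUnit) -> None:
        for i, txn in enumerate(unit.transactions):
            # A real driver would drive DUT ports; here each transaction
            # simply costs a made-up number of cycles.
            self.monitor.transaction_done(f"{unit.name}.{txn}", 10 * (i + 1))


board: Dict[str, str] = {}  # stand-in for the chip status bulletin board


def slow_txn_callback(txn: str, cycles: int) -> None:
    # Assumed trigger condition: flag any transaction of 20 cycles or more.
    if cycles >= 20:
        board[txn] = "slow"


monitor = Monitor()
monitor.bind(slow_txn_callback)
sequencer = Sequencer([OperationUnit("op0", ["rd", "wr"])])
driver = Driver(monitor)

unit = sequencer.next_unit()
while unit is not None:
    driver.run(unit)
    unit = sequencer.next_unit()
```

The monitor samples every finished transaction, while the callback only writes to the bulletin board when its trigger condition holds, which mirrors the split between routine collection in [0028] and event-driven status updates in [0029].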

[0030] Based on the above embodiments, the high-level programming language test main process includes multiple parallel threads;

[0031] Each thread is used to respond to the triggering of the process running in the master agent, and combines the target data items stored in the second data center with the real-time updated status information in the chip status bulletin board to assist the driver in the master agent in generating the target transaction.

[0032] Based on the above embodiments, the process state of the process running in the master agent includes any one of the following: startup state, verification and refresh state, trigger state, and release state.
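
The four process states can be written down as an enumeration. The patent text lists the states but does not spell out the transitions between them, so the cycle used below (startup, then repeated verify-and-refresh, trigger, release) is purely an assumption for illustration, as are all identifier names.

```python
from enum import Enum, auto
from typing import List


class ProcState(Enum):
    STARTUP = auto()
    VERIFY_AND_REFRESH = auto()
    TRIGGER = auto()
    RELEASE = auto()


# Assumed transition table; the source does not define the actual order.
NEXT_STATE = {
    ProcState.STARTUP: ProcState.VERIFY_AND_REFRESH,
    ProcState.VERIFY_AND_REFRESH: ProcState.TRIGGER,
    ProcState.TRIGGER: ProcState.RELEASE,
    ProcState.RELEASE: ProcState.VERIFY_AND_REFRESH,
}


def trace(start: ProcState, steps: int) -> List[ProcState]:
    """Return the sequence of states visited in the given number of steps."""
    states = [start]
    for _ in range(steps):
        states.append(NEXT_STATE[states[-1]])
    return states


visited = trace(ProcState.STARTUP, 4)
```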

[0033] Based on the above embodiments, the stimulus import layer is further used for:

[0034] Pre-sorting the target operation units stored in each stimulus queue based on the target data items stored in the first data center; and

[0035] The sequencer is further configured to dynamically schedule each target operation unit that it stores to the driver of the master agent, based on the master agent's execution status for the target operation units.
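
One way to picture the combination of static pre-sorting and dynamic scheduling is sketched below. The patent does not state the sorting key or the scheduling policy, so this example assumes that the data items contain an execution order and that an operation unit becomes eligible once the units it depends on have finished; the `Op` class and both function names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional, Set


@dataclass
class Op:
    name: str
    depends_on: List[str]


def presort(ops: List[Op], order: List[str]) -> List[Op]:
    """Pre-sort ops by their position in the stored execution order."""
    rank = {name: i for i, name in enumerate(order)}
    return sorted(ops, key=lambda op: rank[op.name])


def next_ready(pending: Deque[Op], done: Set[str]) -> Optional[Op]:
    """Pick the first pending op whose dependencies have all completed."""
    for op in list(pending):
        if all(dep in done for dep in op.depends_on):
            pending.remove(op)
            return op
    return None


ops = [Op("b", ["a"]), Op("c", ["b"]), Op("a", [])]
pending: Deque[Op] = deque(presort(ops, order=["a", "c", "b"]))

done: Set[str] = set()
executed: List[str] = []
while pending:
    op = next_ready(pending, done)
    if op is None:
        break  # nothing is runnable; avoid spinning forever
    executed.append(op.name)
    done.add(op.name)
```

Here the pre-sort places `c` ahead of `b`, but the dynamic step skips `c` until `b` has completed, so the static order is treated as a hint and the execution status has the final say.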

[0036] Based on the above embodiments, the evaluation platform further includes a performance analysis script, which is connected to the monitoring collector in each master agent and to the performance data pool.

[0037] The monitoring collector is further used to collect status data on the configured port and then output the collection results to the log.

[0038] The performance analysis script is used to process the logs generated by each monitoring collector to obtain a multi-dimensional hardware architecture performance evaluation report, which is then output to the performance data pool for storage.
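
A performance analysis script of this kind might look roughly like the following. The log line format (`port=... start=... end=... bytes=...`), the port names, and the report fields are invented for the example; the patent does not describe the actual log layout or the contents of the report.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed log format: one line per finished transaction.
LINE = re.compile(r"port=(\w+) start=(\d+) end=(\d+) bytes=(\d+)")


def analyse(log_lines: List[str]) -> Dict[str, Dict[str, float]]:
    """Aggregate per-port latency and bandwidth from monitor logs."""
    per_port: Dict[str, List[Tuple[int, int, int]]] = defaultdict(list)
    for line in log_lines:
        match = LINE.search(line)
        if match:
            port = match.group(1)
            start, end, nbytes = (int(match.group(i)) for i in (2, 3, 4))
            per_port[port].append((start, end, nbytes))

    report: Dict[str, Dict[str, float]] = {}
    for port, txns in per_port.items():
        first = min(start for start, _, _ in txns)
        last = max(end for _, end, _ in txns)
        total_bytes = sum(nbytes for _, _, nbytes in txns)
        report[port] = {
            "transactions": len(txns),
            "avg_latency_cycles": sum(e - s for s, e, _ in txns) / len(txns),
            # Guard against a zero-length window.
            "bytes_per_cycle": total_bytes / max(last - first, 1),
        }
    return report


log = [
    "port=axi0 start=0 end=10 bytes=64",
    "port=axi0 start=5 end=25 bytes=64",
    "port=axi1 start=0 end=8 bytes=32",
    "unrelated simulator message",
]
report = analyse(log)
```

Lines that do not match the assumed format are ignored, so the script can be pointed at a mixed simulator log without pre-filtering.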

[0039] Based on the above embodiments, the performance verification platform also includes a performance data center, which is connected to the performance data pool;

[0040] The performance data center is used to obtain performance data of the hardware under test from the performance data pool, classify it according to the scenario, and then display it in a set data format.

[0041] Based on the above embodiments, the unified framework structure of the test cases stored in the test stimulus pool includes chip configuration information, operation unit configuration information corresponding to each operation unit, execution scheduling information between operation units, and complete test case configuration information, wherein:

[0042] The operation unit configuration information includes: operation unit execution mode configuration information, operation unit definition information, operation unit data path list, and transaction list matching the operation unit.

[0043] Based on the above embodiments, the communication pipeline includes multiple pre-encapsulated transport functions for high-speed transport of data of various data structures between the hardware description language world and the high-level programming language world.

[0044] Based on the above embodiments, the evaluation platform is a register-transfer level simulation evaluation platform, the hardware description language world is a register-transfer level hardware description language world, and the high-level programming language world is a C language or C++ language world.

[0045] The technical solution of this invention constructs a dynamic evaluation platform based on hybrid programming of hardware description languages and high-level programming languages. It establishes information interaction between the hardware description language world and the high-level programming language world through a communication pipeline built on a direct programming interface, maximizing the reuse of application programming interfaces (APIs) and test stimuli across the different programming language worlds. It can be generalized to evaluate the hardware architecture performance of AI processors on different types of evaluation platforms and in different application scenarios, reducing the difficulty of chip development and improving the stability of the verification environment, and thereby improving both the efficiency of project development and the accuracy of performance evaluation.

[0046] It should be understood that the description in this section is not intended to identify key or essential features of the embodiments of the present invention, nor is it intended to limit the scope of the invention. Other features of the invention will become readily apparent from the following description.

Brief Description of the Drawings

[0047] To more clearly illustrate the technical solutions in the embodiments of the present invention, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0048] Figure 1 is a schematic diagram of the structure of a performance verification platform for an AI processor hardware architecture provided in an embodiment of the present invention;

[0049] Figure 2 is a schematic diagram of the structure of an evaluation platform provided according to an embodiment of the present invention;

[0050] Figure 3 is a schematic diagram of the structure of another evaluation platform provided according to an embodiment of the present invention;

[0051] Figure 4 is a schematic diagram of the structure of a hardware description language performance verification environment provided in an embodiment of the present invention;

[0052] Figure 5 is a schematic diagram of the structure of another evaluation platform provided according to an embodiment of the present invention;

[0053] Figure 6 is a schematic diagram of the structure of another performance verification platform for an AI processor hardware architecture provided in an embodiment of the present invention;

[0054] Figure 7 is a flowchart of a performance verification method for an AI processor hardware architecture based on a register-transfer level simulation evaluation platform, applicable to embodiments of the present invention.

Detailed Description

[0055] To enable those skilled in the art to better understand the present invention, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of the present invention.

[0056] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this invention are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of the invention described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.

[0057] Figure 1 is a schematic diagram of the structure of a performance verification platform for an AI processor hardware architecture provided in an embodiment of the present invention. This embodiment is applicable to running a performance test on specified hardware, in a specified scenario, on an evaluation platform of a specified type.

[0058] As shown in Figure 1, the performance verification platform includes: a test stimulus pool 110, a performance data pool 120, an evaluation platform 130, and a stimulus import layer 140. The evaluation platform 130 includes a hardware description language world 1301 and a high-level programming language world 1302; the hardware description language world 1301 includes the hardware under test 13011.

[0059] In this embodiment, the evaluation platform 130 may include: an architecture model (Amodel) evaluation platform, a hardware simulation (Emulation) evaluation platform, an RTL (Register Transfer Level) simulation (Simulation) evaluation platform, and a post-silicon verification (Postsilicon) evaluation platform. As mentioned above, when applied to pre-silicon performance testing scenarios, the evaluation platform 130 may specifically be an Emulation evaluation platform or a Simulation evaluation platform.

[0060] Accordingly, when the evaluation platform 130 is a register-transfer level simulation evaluation platform, the hardware description language world 1301 can be a register-transfer level hardware description language world (SystemVerilog world, or simply SV world), and the high-level programming language world 1302 can be a C language or C++ language world (or simply C world or C++ world). When the evaluation platform 130 is a hardware simulation evaluation platform, the hardware description language world 1301 can be a register-transfer level hardware description language world, and the high-level programming language world 1302 can be a Python language world (or simply Python world).

[0061] The hardware under test 13011, also known as the device under test (DUT), refers to the actual device, circuit, or system that is subjected to functional, performance, and other tests during testing and verification. In other words, it is the AI processor hardware architecture that needs to be evaluated and verified in this embodiment of the invention.

[0062] The stimulus import layer 140 is connected to both the test stimulus pool 110 and the evaluation platform 130. The hardware description language world 1301 and the high-level programming language world 1302 are connected through a communication pipeline 1303 built on the Direct Programming Interface (DPI). The evaluation platform 130 is connected to the performance data pool 120, wherein:

[0063] Test stimulus pool 110 is used to store test cases for multiple scenarios organized according to a unified framework structure.

[0064] The test stimulus pool 110 can be understood as a storage container for test stimuli. It defines a unified stimulus-carrying framework internally, centrally stores all test stimuli from different platforms, projects, and versions, and is the input source of the evaluation platform 130.

[0065] Specifically, this unified framework abstracts complex test scenarios into multiple operations (ops) and classifies and stores them according to preset classification rules, which gives standardized, structured management of stimuli and provides a foundation for invoking them throughout the subsequent flow.

[0066] In an optional implementation of this embodiment, the unified framework structure of the test cases (also referred to as jobSpec) stored in the test stimulus pool includes chip configuration information, such as hardware resource configuration information (chipCfg) and register configuration information (chipRegCfg), as well as operation unit configuration information corresponding to each operation unit, execution scheduling information between operation units, and complete test case configuration information.

[0067] The operation unit configuration information includes: operation unit execution mode configuration information (opX_cfg), operation unit definition information (opX_def), operation unit data path list (opX_dps), and transaction list matching the operation unit (opX_trs).
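
As a rough illustration of how such a jobSpec could be represented in code, the sketch below maps the fields named in [0066] and [0067] onto Python dataclasses. It is not a reproduction of Table 1. The names `chipCfg`, `chipRegCfg`, and the `cfg`/`def`/`dps`/`trs` split come from the text; the `schedule` and `case_cfg` field names, the `validate` helper, and all sample values are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class OpSpec:
    cfg: Dict[str, Any]         # opX_cfg: execution mode configuration
    definition: Dict[str, Any]  # opX_def: operation unit definition
    dps: List[str]              # opX_dps: data path list
    trs: List[Dict[str, Any]]   # opX_trs: matching transaction list


@dataclass
class JobSpec:
    chipCfg: Dict[str, Any]     # hardware resource configuration
    chipRegCfg: Dict[str, Any]  # register configuration
    ops: Dict[str, OpSpec]      # one entry per operation unit
    schedule: List[str]         # execution scheduling (field name assumed)
    case_cfg: Dict[str, Any]    # complete test case config (name assumed)

    def validate(self) -> List[str]:
        """Return a list of consistency problems; empty means none found."""
        problems = [
            f"schedule references unknown op {name!r}"
            for name in self.schedule
            if name not in self.ops
        ]
        problems += [
            f"op {name!r} has no transactions"
            for name, op in self.ops.items()
            if not op.trs
        ]
        return problems


job = JobSpec(
    chipCfg={"cores": 2},
    chipRegCfg={"ctrl": 1},
    ops={
        "op0": OpSpec(
            cfg={"mode": "serial"},
            definition={"kind": "copy"},
            dps=["dp0"],
            trs=[{"type": "read", "bytes": 64}],
        )
    },
    schedule=["op0", "op1"],
    case_cfg={"scenario": "demo"},
)
problems = job.validate()
```

Keeping every test case in one typed structure like this is what allows the same loader to serve cases from different platforms, projects, and versions.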

[0068] Accordingly, Table 1 shows a unified framework structure for an optional test case.

[0069] Table 1

[0070]

[0071] The stimulus import layer 140 is used to load the target test cases to be deployed from the test stimulus pool 110 and to import the target data items required for executing the target test cases into the evaluation platform 130 according to the data framework structure predefined in the evaluation platform 130, so that the hardware description language world 1301 and the high-level programming language world 1302 can share the target data items through the communication pipeline 1303; and, after translating the target test cases into a standard data structure adapted to the evaluation platform 130, to deploy the target operation units contained in the target test cases to the evaluation platform 130 based on the standard data structure.

[0072] In this embodiment, the stimulus import layer 140 first loads the target test cases to be deployed from the test stimulus pool 110 during the stimulus loading phase (also known as the Load phase), and imports the target test cases into the predefined data framework structure in the evaluation platform 130. That is, the target data items required for the execution of the target test cases are imported into the preset storage space in the evaluation platform 130 according to the preset data structure to ensure the recognizability of each target data item by each component (or node) in the evaluation platform 130.

[0073] Furthermore, a transmission mechanism between any two nodes needs to be established in the evaluation platform 130 to ensure full coverage and stability of each target data item of the target test case across the evaluation platform 130. That is, the hardware description language world 1301 and the high-level programming language world 1302 must be able to share each target data item through the communication pipeline 1303.

[0074] Furthermore, the stimulus import layer 140 is also used, in the stimulus translation phase (also known as the Parser phase), to translate the target test cases into a standard data structure adapted to the evaluation platform 130. This translation is needed because the performance verification platform in this embodiment can be adapted to various types of evaluation platform 130, and different types of evaluation platform 130 recognize different data structures.

[0075] In a specific example, the stimulus import layer 140 can import the hardware resource configuration information, register configuration information, execution scheduling information between operation units, and complete test case configuration information from the target test case as target data items into the evaluation platform 130. The operation unit configuration information in the target test case is then translated into a standard data structure.

[0076] Furthermore, the stimulus import layer 140 is also used, in the stimulus deployment phase (also known as the Deploy phase), to reshape the standard data structure obtained in the stimulus translation phase according to business type and deployment process, in combination with the simulation parameter deployment rules already present in the evaluation platform 130, to obtain multiple ops (i.e., target operation units) that the evaluation platform 130 can execute, and to deploy these ops in the evaluation platform 130. One target operation unit can correspond to one task to be executed and can include multiple target transactions.
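
The Load, Parser, and Deploy phases described above can be sketched as three small functions. This is a simplified model under several assumptions: the test case is a plain dictionary, the "standard data structure" is a list of dictionaries, and operation units are spread over the queues round-robin. None of these choices, nor the function and key names other than `chipCfg` and `chipRegCfg`, are stated in the patent.

```python
from typing import Any, Dict, List, Tuple

# Keys imported as target data items (names other than chipCfg and
# chipRegCfg are assumed for this example).
DATA_ITEM_KEYS = ("chipCfg", "chipRegCfg", "schedule", "caseCfg")


def load(pool: Dict[str, Dict[str, Any]], name: str) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Load phase: fetch a case and split data items from raw op configs."""
    case = pool[name]
    data_items = {key: case[key] for key in DATA_ITEM_KEYS}
    return data_items, case["ops"]


def parse(raw_ops: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Parser phase: translate op configs into one standard structure."""
    return [
        {
            "name": name,
            "mode": spec["cfg"].get("mode", "default"),
            "transactions": list(spec["trs"]),
        }
        for name, spec in raw_ops.items()
    ]


def deploy(std_ops: List[Dict[str, Any]], num_queues: int) -> List[List[Dict[str, Any]]]:
    """Deploy phase: distribute ops over the stimulus queues (round-robin)."""
    queues: List[List[Dict[str, Any]]] = [[] for _ in range(num_queues)]
    for i, op in enumerate(std_ops):
        queues[i % num_queues].append(op)
    return queues


pool = {
    "conv_case": {
        "chipCfg": {"cores": 2},
        "chipRegCfg": {"ctrl": 1},
        "schedule": ["op0", "op1", "op2"],
        "caseCfg": {"scenario": "conv"},
        "ops": {
            "op0": {"cfg": {"mode": "serial"}, "trs": ["rd", "wr"]},
            "op1": {"cfg": {}, "trs": ["rd"]},
            "op2": {"cfg": {"mode": "parallel"}, "trs": ["wr"]},
        },
    }
}

data_items, raw_ops = load(pool, "conv_case")
queues = deploy(parse(raw_ops), num_queues=2)
```

Separating translation from deployment means that only `parse` would need a platform-specific variant when the same test case is moved to a different type of evaluation platform.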

[0077] The evaluation platform 130 is used to test the hardware under test based on each target data item and each target operation unit included in the target test cases through the joint cooperation of the hardware description language world 1301 and the high-level programming language world 1302, and output the performance data output by each output port of the hardware under test 13011 to the performance data pool 120.

[0078] The hardware description language world 1301 can also be referred to as the hardware description language development environment. Taking the SystemVerilog world as an example, the hardware description language world 1301 is built on the SystemVerilog language and includes various components connected together through UVM (Universal Verification Methodology). It is one of the core environments for initialization, scheduling, and monitoring in the evaluation platform 130.

[0079] Correspondingly, the High-Level Programming Language World 1302 can also be called a high-level programming language development environment. Taking the C world as an example, the High-Level Programming Language World 1302 is built on the C language and includes various integrated application programming interface libraries and C testing processes. It is one of the core environments for short-term and fast testing and data processing in the evaluation platform 130.

[0080] In the evaluation platform 130, the hardware description language world 1301 and the high-level programming language world 1302 communicate through the communication pipeline 1303, which is built on a direct programming interface. Specifically, when the hardware description language world 1301 is the SV world and the high-level programming language world 1302 is the C world, the direct programming interface is DPI-C.

[0081] DPI is the standard official interface for direct function calls between the hardware description language world 1301 and the high-level programming language world 1302. In this embodiment, DPI is not directly used to connect the hardware description language world 1301 and the high-level programming language world 1302 because this interface has limited bandwidth and is mainly used for transmitting control signals, configuration, or small batches of data. This embodiment aims to enable high-speed exchange or sharing of target data items for the target test cases between the hardware description language world 1301 and the high-level programming language world 1302, effectively reducing the time required for AI processor architecture performance evaluation. Therefore, a communication pipeline (also known as a perf tunnel or p tunnel) based on DPI is creatively designed.

[0082] In this embodiment, the communication pipeline includes multiple pre-encapsulated transport functions for high-speed transport of data with various data structures between the hardware description language world and the high-level programming language world.

[0083] Taking the Hardware Description Language World 1301 as the SV world and the High-Level Programming Language World 1302 as the C world as an example, the communication channel connecting the SV world and the C world is implemented through the low-level functions of DPI-C. Specifically, multiple new functions can be defined through the low-level logic of DPI-C (the functions themselves have optimizations for handling data structures, handling complex arrays and structures at once) to achieve bidirectional fast transfer of various data structures between the SV world and the C world. These functions, used to transfer the specified data structures from the C world to the SV world, and from the SV world to the C world, are equivalent to establishing a bidirectional data transmission communication channel.

[0084] Through this high-speed data transmission channel, each target data item of the target test case imported by the stimulus import layer 140 can be stored in both the SV world and the C world. Threads executing in the C world therefore do not need to call DPI-C frequently to obtain data from the SV world, which avoids a large amount of simulation overhead and accelerates the simulation in the C world.
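The benefit of keeping a mirrored copy on each side can be illustrated with a minimal Python sketch. It is not DPI-C code: `SvWorld`, `fetch_item`, and `bulk_export` are hypothetical stand-ins, and the `crossings` counter only models how often the language boundary is crossed, under the assumption that each crossing carries a fixed cost.

```python
class SvWorld:
    """Hypothetical stand-in for the SV-side store; counts boundary crossings."""

    def __init__(self, items):
        self._items = dict(items)
        self.crossings = 0  # every call made from the C side costs one crossing

    def fetch_item(self, key):
        # Models a per-item call across the language boundary.
        self.crossings += 1
        return self._items[key]

    def bulk_export(self):
        # Models one pre-encapsulated transport function moving a whole structure.
        self.crossings += 1
        return dict(self._items)


def run_per_item(sv, keys):
    """C-side thread that crosses the boundary for every lookup."""
    return [sv.fetch_item(k) for k in keys]


def run_mirrored(sv, keys):
    """C-side thread that copies the data once, then reads its local mirror."""
    mirror = sv.bulk_export()
    return [mirror[k] for k in keys]


items = {f"item{i}": i * 4 for i in range(8)}
keys = list(items) * 100  # 800 lookups during one simulated run

sv_a = SvWorld(items)
per_item_values = run_per_item(sv_a, keys)

sv_b = SvWorld(items)
mirrored_values = run_mirrored(sv_b, keys)
```

Both runs read the same values, but the mirrored run crosses the boundary once instead of once per lookup, which is the effect the paragraph above attributes to storing the data items on both sides.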

[0085] Performance data pool 120 is used to store various performance data output by the hardware under test during the performance test.

[0086] The performance data pool 120 can be understood as a storage container for performance data. It contains the multi-dimensional information required for performance analysis, including bandwidth, latency, number of incomplete transactions (also known as the outstanding number), balance, task start time, task end time, and number of cycles. It is the carrier of the output results of the evaluation platform 130.
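As an illustration of what one entry in such a pool might hold, the following Python sketch defines a record with the dimensions listed above. The field names, the units (bytes per cycle, cycles), and the reading of "balance" as a single ratio are assumptions made for the example; the source does not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerfRecord:
    """One performance sample for a port (field names and units are assumed)."""
    port: str
    bandwidth_bytes_per_cycle: float
    latency_cycles: int
    outstanding: int          # number of incomplete transactions
    balance: float            # assumed here to be a 0..1 load-balance ratio
    task_start_cycle: int
    task_end_cycle: int

    @property
    def cycles(self) -> int:
        # Number of cycles spanned by the task.
        return self.task_end_cycle - self.task_start_cycle


class PerfDataPool:
    """Minimal in-memory container for performance records."""

    def __init__(self):
        self._records = []

    def store(self, record: PerfRecord) -> None:
        self._records.append(record)

    def by_port(self, port: str):
        return [r for r in self._records if r.port == port]
```

A monitor would call `store` once per collected sample, and later analysis could query the pool by port.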

[0087] The technical solution of this invention constructs a dynamic simulation platform based on hybrid programming in a hardware description language and a high-level programming language. It establishes information interaction between the hardware description language world and the high-level programming language world through a communication channel built on a direct programming interface, which maximizes the reuse of application programming interfaces and test stimuli between the two language worlds. It can be generalized to evaluate the hardware architecture performance of AI processors on different types of evaluation platforms and in different application scenarios, reducing the difficulty of chip development, improving the stability of the verification environment, and thereby improving the efficiency of project development and the accuracy of performance evaluation.

[0088] Furthermore, Figure 2 is a schematic diagram of the structure of an evaluation platform provided in an embodiment of the present invention. As shown in Figure 2, in this evaluation platform the hardware description language world specifically includes multiple stimulus queues 210, a hardware description language performance verification environment 220, a chip status bulletin board 230, and a first data center 240. The hardware under test is configured in the hardware description language performance verification environment 220, and the high-level programming language world includes a high-level programming language test main process 250 and a second data center 260.

[0089] Each stimulus queue 210 is connected to the stimulus import layer and to the hardware description language performance verification environment 220. The hardware description language performance verification environment 220 is connected to the chip status bulletin board 230, which is also connected to the first data center 240. The first data center 240 and the second data center 260 share data through the communication channel.

[0090] The stimulus import layer is further used to distribute the target operation units to the stimulus queues 210, and to store each target data item in the first data center 240 and synchronize it to the second data center 260 via the communication channel.

[0091] The chip status bulletin board 230 is used to acquire and summarize various status information of the hardware under test and update the status information in real time.

[0092] In this embodiment, the chip status bulletin board 230 is used to summarize the status information of all hardware physical resources occupied, task execution status, pending task status, and information collection status in the hardware under test, and to update the status information in real time. The chip status bulletin board 230 can serve as data support for finding available resources during task scheduling.
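A bulletin board of this kind can be pictured as a shared table keyed by hardware resource. The Python sketch below is illustrative only: `ChipStatusBoard`, `ResourceStatus`, and the resource names are hypothetical, and the status fields are a simplified subset of the categories named above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ResourceStatus:
    """Status tracked per hardware resource (a simplified, assumed subset)."""
    occupied: bool = False
    running_task: Optional[str] = None
    pending_tasks: list = field(default_factory=list)


class ChipStatusBoard:
    """Hypothetical bulletin board: one status entry per hardware resource."""

    def __init__(self, resource_names):
        self._status = {name: ResourceStatus() for name in resource_names}

    def update(self, name, **changes):
        # Apply a status update as soon as it is reported.
        status = self._status[name]
        for key, value in changes.items():
            if not hasattr(status, key):
                raise AttributeError(f"unknown status field: {key}")
            setattr(status, key, value)

    def find_available(self):
        # Data support for scheduling: list resources that are not occupied.
        return [n for n, s in self._status.items() if not s.occupied]
```

For example, after `board.update("dma0", occupied=True, running_task="op1")`, a scheduler calling `board.find_available()` would no longer be offered `dma0`.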

[0093] Specifically, the various status information required in the chip status bulletin board 230 can be obtained from the hardware description language performance verification environment 220, or indirectly from the high-level programming language test main process 250 via data sharing between the first data center 240 and the second data center 260.

[0094] Furthermore, when the stimulus import layer distributes the target operation units to the stimulus queues 210, it can also refer to the real-time updated status information in the chip status bulletin board 230 (this connection is not shown in the figure).

[0095] The hardware description language performance verification environment 220 is used to combine the real-time updated status information in the chip status bulletin board 230, concurrently obtain each target operation unit from each stimulus queue 210, and execute each target operation unit on the hardware under test based on each target data item stored in the first data center 240 through collaborative processing with the high-level programming language test main process 250, and update the chip status bulletin board 230 with the status information generated during the execution process.

[0096] The high-level programming language test main process 250 is used to respond to the trigger call of the hardware description language performance verification environment 220, combine the real-time updated status information in the chip status bulletin board 230 and the target data items stored in the second data center 260, assist the hardware description language performance verification environment 220 in executing each target operation unit, update the second data center 260 according to the execution result, and synchronize the updated content to the first data center 240 through the communication channel.

[0097] The high-level programming language test main process 250 is the test process of the high-level programming language world. It can call the APIs of the high-level programming language to perform tests and update the chip status bulletin board 230 in real time. It can be understood as a sub-process of the hardware description language world: as the simulation progresses, it polls all hardware physical resources of the hardware under test.

[0098] It can be understood that this communication channel keeps the data stored in the first data center 240 and the second data center 260 identical; that is, the two data centers are periodically synchronized. The first data center 240 and the second data center 260 can further be understood as storage centers for the data framework structure of each test case. This structure is globally accessible, recognizable by the currently adapted evaluation platform, and directly deployable.

[0099] Furthermore, Figure 3 is a schematic diagram of the structure of another evaluation platform provided according to an embodiment of the present invention. As shown in Figure 3, the hardware description language world also includes a first application programming interface library 310, and the high-level programming language world also includes a second application programming interface library 320.

[0100] The first application programming interface (API) library 310 is connected to the first data center, and the second API library 320 is connected to both the high-level programming language test main process and the second data center. The first API library 310 and the second API library 320 are connected via a direct programming interface for data sharing.

[0101] The first application programming interface library 310 is used to store various first application programming interfaces developed based on the hardware description language, store the first status information obtained by the hardware description language performance verification environment after calling the first application programming interface in real time, and update the first status information to the chip status bulletin board and the first data center; and store the status information updated in real time in the chip status bulletin board, and use the second application programming interface library 320 as an intermediary to synchronize the status information updated in real time in the chip status bulletin board to the high-level programming language test main process.

[0102] In this optional embodiment, the first application programming interface library 310 is used to store all developed functions and tasks available for use in the hardware description language world, and is associated with the second application programming interface library 320 through a direct programming interface (typically DPI-C).

[0103] The second application programming interface library 320 is used to store various second application programming interfaces developed based on high-level programming languages. It stores the second status information obtained by the high-level programming language test main process calling the second application programming interfaces in real time, and updates the second status information to the second application programming interface library 320 and the second data center. The second status information is synchronized to the chip status bulletin board using the first application programming interface library 310 as an intermediary.

[0104] In this embodiment, the second application programming interface library 320 is used to integrate and store various APIs developed based on high-level programming languages (typically C or C++), and is connected to the first application programming interface library 310 through a direct programming interface.

[0105] In this embodiment, considering that the amount of data generated by each interface call result (i.e., status information) in response to the calls in the first application programming interface library 310 and the second application programming interface library 320 is relatively small, information sharing can be directly achieved through the standard direct programming interface. This setup effectively reduces the information transmission path between the hardware description language world and the high-level programming language world during status information sharing, further improving simulation efficiency.

[0106] Correspondingly, the high-level programming language test main process can be further used for:

[0107] In response to the call of the hardware description language performance verification environment, the system combines the target data items stored in the second data center with the real-time updated status information in the chip status bulletin board indirectly obtained through the second application programming interface library 320, assists the hardware description language performance verification environment in executing each target operation unit, and updates the second application programming interface library 320 and the second data center according to the execution results.

[0108] As mentioned above, after the high-level programming language test main process completes the update of the second application programming interface library 320, the second application programming interface library 320 can update the second status information generated by the second application programming interface call to the chip status bulletin board via the first application programming interface library 310.

[0109] Furthermore, Figure 4 is a schematic diagram of the structure of a hardware description language performance verification environment provided by an embodiment of the present invention. As shown in Figure 4, the hardware description language performance verification environment includes multiple master agents 410. Each master agent 410 is connected to a stimulus queue and runs a process (not shown in the figure). The process executes an integrated entity consisting of a callback function 4101 (callback), a sequencer 4102, a monitoring collector 4103 (monitor), and a driver 4104.

[0110] The core functions of the master agent 410 include task distribution, information collection, and resource management. The master agent 410 is connected to the stimulus queue and to the hardware under test.

[0111] The sequencer 4102 is connected to the stimulus queue, the monitoring collector 4103 is connected to a port of the hardware under test, the driver 4104 is connected to an input port of the hardware under test and / or to designated circuit elements inside the hardware under test, and the callback function 4101 is bound to the monitoring collector 4103.

[0112] The sequencer 4102 is used to store the target operation units distributed via the stimulus queue and to distribute the target operation units to the driver 4104.

[0113] In this embodiment, the sequencer 4102 can combine the real-time updated status information in the chip status bulletin board to distribute the target operation unit to the driver 4104 at an appropriate time.

[0114] The driver 4104 is used to combine the target data items stored in the first data center and the status information updated in real time in the chip status bulletin board, and generate multiple target transactions that match the target operation unit by cooperating with the high-level programming language test master process, and inject each target transaction into the hardware under test to perform the test.

[0115] The target operation unit includes multiple target transactions. Specifically, an operation unit can be understood as a complete top-level operation, such as an AI matrix calculation, a complete read / write operation, or a data transfer task. A transaction can be understood as the smallest task request that the interface can recognize, which is separated from the operation unit, such as a read transaction, a write transaction, or a parameter configuration transaction.

[0116] In this optional embodiment, when the driver 4104 is directly connected to an input port of the hardware under test, the target transaction extracted from the target operation unit can be sent directly to the designated input port of the hardware under test through a front-door test. Alternatively, when the driver 4104 is connected to designated circuit elements (e.g., various registers) inside the hardware under test, the target transaction extracted from the target operation unit can be applied directly to the matching circuit element through a back-door test.
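The relationship between an operation unit and its transactions can be sketched in Python as follows. The splitting rule used here (cut a read or write into bursts of at most `max_burst_bytes`) is an assumption chosen for illustration; the source only states that transactions are the smallest requests the interface recognizes. `Transaction` and `split_operation` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    """Smallest request the interface recognizes (fields are assumed)."""
    kind: str      # "read" or "write"
    address: int
    length: int    # bytes carried by this transaction


def split_operation(kind, base_address, total_bytes, max_burst_bytes):
    """Split one top-level read/write operation into burst-sized transactions."""
    if max_burst_bytes <= 0:
        raise ValueError("max_burst_bytes must be positive")
    transactions = []
    offset = 0
    while offset < total_bytes:
        # The last transaction may be shorter than a full burst.
        length = min(max_burst_bytes, total_bytes - offset)
        transactions.append(Transaction(kind, base_address + offset, length))
        offset += length
    return transactions
```

Calling `split_operation("read", 0x1000, 300, 128)` yields three transactions of 128, 128, and 44 bytes, which a driver could then inject one by one through the front door or the back door.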

[0117] The monitoring collector 4103 is used to trigger the collection of performance data on the configured port whenever a target transaction is detected to have finished executing.

[0118] In this embodiment, a monitoring collector 4103 is mounted on each port (e.g., input port, output port, or bidirectional port) of the hardware under test. The monitoring collector 4103 is used to trigger the collection of performance data on the configured port after the target transaction sent by the driver 4104 in its respective process has been executed. The performance data collected by the monitoring collector 4103 is ultimately output to a performance data pool for storage.

[0119] The callback function 4101 is used to collect third status information on the configured port through the bound monitoring collector 4103 when a preset event trigger condition is detected, and to update the third status information to the chip status bulletin board and the first data center.
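The binding between a callback and a monitor can be sketched in Python under the assumption of a count-based trigger condition. `Monitor`, `every_n`, and the plain list used as the bulletin board are hypothetical simplifications, not the platform's actual components.

```python
class Monitor:
    """Hypothetical monitoring collector attached to one port."""

    def __init__(self, port):
        self.port = port
        self.completed = 0
        self._callbacks = []

    def bind(self, callback):
        # Bind a callback so it is evaluated after every finished transaction.
        self._callbacks.append(callback)

    def on_transaction_done(self, txn_id):
        self.completed += 1
        for callback in self._callbacks:
            callback(self, txn_id)


def every_n(n, board):
    """Build a callback that reports status after every n completed transactions."""

    def callback(monitor, txn_id):
        if monitor.completed % n == 0:
            # Push a status snapshot to the (simplified) bulletin board.
            board.append(
                {"port": monitor.port, "completed": monitor.completed, "last_txn": txn_id}
            )

    return callback
```

With `n` set to 20, feeding 45 completed transactions into the monitor produces two snapshots, one after transaction 20 and one after transaction 40.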

[0120] In this embodiment, the callback function 4101 is bound to the monitoring collector 4103 in its process, and a preset event that triggers its execution is predefined in the callback function 4101, such as the completion of 20 consecutive target transactions or the completion of a specific target transaction. When the event trigger condition defined in the callback function 4101 is met, the callback function 4101 triggers the monitoring collector 4103 bound to it to collect third status information on the configured port. The third status information is updated to the chip status bulletin board and the first data center, and is finally propagated to the second application programming interface library and the second data center via the first application programming interface library, so that the high-level programming language test main process is aware of it.

Based on the above embodiments, the stimulus import layer (not shown in the figure) can be further used for:

[0121] Based on the target data items stored in the first data center, the target operation units stored in each stimulus queue are pre-sorted; and

[0122] The sequencer is further configured to dynamically schedule each target operation unit stored in its own memory to the driver of the master agent, based on the master agent's execution status for the target operation units.

[0123] As mentioned earlier, the target data items store the execution flow and the scheduling dependency information between operation units. Based on this information, the stimulus import layer can first pre-sort the target operation units stored in each stimulus queue. This pre-sorting completes the allocation of the static deployment queue, laying the foundation for subsequent dynamic scheduling and improving the orderliness of scheduling.

[0124] Then, the sequencer can dynamically schedule each target operation unit stored in its own memory to the driver of the master agent based on the execution status of the target operation unit by the master agent.

[0125] Furthermore, the process states of the processes running in the master agent can be predefined to include the following: launch, check & flush, kick off, and release. Accordingly, whenever the sequencer checks the process state of a master agent, the process running in that master agent must be in one of these four states. The sequencer can then dynamically schedule the target operation units distributed via the stimulus queue to the driver, based on the pre-sorted static deployment queue and the current process state of each master agent. These settings achieve dynamic stimulus scheduling throughout the entire process and match stimulus scheduling to hardware operating states in real time, improving scheduling efficiency and flexibility.
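The combination of static pre-sorting and state-driven dispatch can be sketched in Python. Two assumptions are made that the source does not state: the dependency information is modeled as a map from each operation unit to the units it depends on, and a master agent is treated as ready for a new operation unit only when its process is in the release state. `pre_sort` and `dispatch` are hypothetical names.

```python
from collections import deque
from enum import Enum
from graphlib import TopologicalSorter


class ProcState(Enum):
    LAUNCH = "launch"
    CHECK_AND_FLUSH = "check & flush"
    KICK_OFF = "kick off"
    RELEASE = "release"


def pre_sort(dependencies):
    """Static pre-sort: order ops so each one follows the ops it depends on.

    `dependencies` maps an op name to the set of op names it depends on.
    """
    return list(TopologicalSorter(dependencies).static_order())


def dispatch(queue, agent_states):
    """Dynamic step: give the next queued op to each agent in the RELEASE state.

    Treating RELEASE as 'ready for new work' is an assumption of this sketch.
    """
    assignments = {}
    for agent, state in agent_states.items():
        if state is ProcState.RELEASE and queue:
            assignments[agent] = queue.popleft()
    return assignments
```

For a chain in which `op2` depends on `op1` and `op3` depends on `op2`, `pre_sort` returns the three operation units in dependency order; `dispatch` then hands the head of that queue only to agents whose process has reached the release state.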

[0126] Furthermore, Figure 5 is a schematic diagram of the structure of another evaluation platform provided by an embodiment of the present invention. As shown in Figure 5, the high-level programming language test main process includes multiple parallel threads 510.

[0127] Each thread 510 is used to respond to a trigger from the process running in a master agent, and to assist the driver in that master agent in generating the target transactions by combining the target data items stored in the second data center with the real-time updated status information in the chip status bulletin board.

[0128] In this embodiment, multiple threads 510 can be pre-created in the high-level programming language test main process. When a process A is started in a master agent, process A can send a trigger instruction to the high-level programming language test main process through the first application programming interface library and the second application programming interface library as intermediaries. This causes the high-level programming language test main process to allocate a thread 510 to process A, so that one or more second application programming interfaces can be triggered in the thread 510 to assist the driver in the master agent in generating the target transaction.
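The allocation of a worker thread to a triggering process can be sketched with Python's standard thread pool. This is only an analogy for the C-world behavior: `MainProcess`, `assist_driver`, and the layout of `data_center` are hypothetical, and `ThreadPoolExecutor` starts its worker threads on demand rather than creating them all up front.

```python
from concurrent.futures import ThreadPoolExecutor


def assist_driver(process_name, op_name, data_center):
    """Hypothetical 'second API': expand an op into transaction labels."""
    count = data_center[op_name]["transactions"]
    return [f"{process_name}:{op_name}:txn{i}" for i in range(count)]


class MainProcess:
    """Stand-in for the test main process that owns a pool of worker threads."""

    def __init__(self, data_center, max_threads=4):
        self._data_center = data_center
        # Worker threads are started lazily by the pool, up to max_threads.
        self._pool = ThreadPoolExecutor(max_workers=max_threads)

    def trigger(self, process_name, op_name):
        # A trigger from an SV-side process is served by one pool thread.
        return self._pool.submit(assist_driver, process_name, op_name, self._data_center)

    def shutdown(self):
        self._pool.shutdown(wait=True)
```

Each call to `trigger` returns a future; the caller obtains the generated transactions with `future.result()` once the assigned thread has finished.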

[0129] Furthermore, Figure 6 is a schematic diagram of the structure of another performance verification platform for an AI processor hardware architecture provided in an embodiment of the present invention. As shown in Figure 6, the evaluation platform also includes a performance analysis script 610. The performance analysis script 610 is connected to the monitoring collector in each master agent and to the performance data pool.

[0130] The monitoring collector is further used to collect status data on the configured port and then output the collection results to the log.

[0131] The performance analysis script is used to process the logs generated by each monitoring collector to obtain a multi-dimensional hardware architecture performance evaluation report, which is then output to the performance data pool for storage.
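A log-processing step of this kind can be sketched in Python. The log line format (`port=... start=... end=... bytes=...`), the metric definitions, and the function name `analyze` are all assumptions made for the example; the source does not describe the log format or how each metric is computed.

```python
import re
from collections import defaultdict

# Assumed log format, one finished transaction per line.
LINE = re.compile(r"port=(\w+) start=(\d+) end=(\d+) bytes=(\d+)")


def analyze(log_lines):
    """Aggregate per-port transaction count, average latency, and bandwidth."""
    per_port = defaultdict(list)
    for line in log_lines:
        match = LINE.search(line)
        if match is None:
            continue  # skip lines that are not transaction records
        port = match.group(1)
        start, end, size = (int(match.group(i)) for i in (2, 3, 4))
        per_port[port].append((start, end, size))

    report = {}
    for port, txns in per_port.items():
        first = min(s for s, _, _ in txns)
        last = max(e for _, e, _ in txns)
        total = sum(b for _, _, b in txns)
        report[port] = {
            "transactions": len(txns),
            "avg_latency_cycles": sum(e - s for s, e, _ in txns) / len(txns),
            # Bytes moved per cycle over the observed window (0.0 if empty window).
            "bandwidth_bytes_per_cycle": total / (last - first) if last > first else 0.0,
        }
    return report
```

The returned dictionary plays the role of a very small multi-dimensional report: one entry per port, with one value per metric.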

[0132] Based on the above embodiments, the performance verification platform further includes a performance data center 620, which is connected to the performance data pool.

[0133] The performance data center 620 is used to obtain performance data of the hardware under test from the performance data pool, classify it by scenario, and then display it in a set format.

[0134] In this embodiment, the performance data center 620 can first classify the performance data stored in the performance data pool according to scenarios, and then summarize and organize it before storing it. Furthermore, it can also display the data to designated AI processor developers in a set data format (e.g., webpage or PDF).

[0135] The preceding content primarily describes the performance verification platform for AI processor hardware architecture from the perspective of the deployed physical layer infrastructure. This performance verification platform can be understood as a hybrid simulation platform built upon a hardware description language (typically SystemVerilog) and a high-level programming language (typically C / C++). It takes test stimuli (or test cases) with a unified framework from a test stimulus pool (typically a Stimulus pool). Within the evaluation platform (typically a Simulation evaluation platform, or simply Simulation), these stimuli are translated into directly deployable data structures adapted to the current evaluation platform. These data structures are then stored in two data centers at once: one in the hardware description language world and the other in the high-level programming language world. At the same time, based on the real-time updated status information in the chip status bulletin board, the deployable data structures are distributed to all deployable hardware resources. Through the collaborative processing of the parallel processes in the SV world and the C world, the performance data results related to the hardware architecture are finally obtained.

[0136] Between the SV world and the C world, this embodiment of the invention establishes a bidirectional communication channel using DPI-C technology. This bidirectional channel enables mutual calls between the developed and integrated application programming interfaces (APIs) of the two worlds. Simultaneously, this bidirectional communication channel integrates data transmission channels corresponding to different data structures, achieving data synchronization between the SV world and the C world and ensuring that both worlds share the same data structure in the same data center. This embodiment of the invention treats the C world as a process within the SV world and incorporates it into the SV world for unified scheduling. As each simulation time node progresses in the SV world, the main process of the C world polls all hardware components through multiple parallel threads. Through the aforementioned bidirectional communication channel, the running and post-running states of the C world process are synchronously updated in real time to the chip status bulletin board in the SV world. From a performance optimization perspective, this embodiment of the invention optimizes the scheduling of transactions (or traffic) on any port of the hardware architecture from both time and space dimensions, ensuring efficient and rational traffic transmission.

[0137] Meanwhile, in each embodiment of the present invention, a monitoring collector is mounted on any port of the hardware architecture. This monitoring collector can collect the performance data of the corresponding port in real time and output the collected performance data to the log. Subsequently, the performance data in the log is processed by a self-developed performance analysis script to generate a multi-dimensional hardware architecture performance evaluation report (Perf Result). After the performance evaluation report is classified according to the scenario, it is finally uploaded to the performance data center for storage, so as to facilitate subsequent query, call and secondary analysis.

[0138] The core of the aforementioned physical layer infrastructure is the evaluation platform. In its execution, this evaluation platform can be divided into four stages: task import, task scheduling, task execution, and task analysis. Each stage is further subdivided into different execution steps based on functional requirements. In this embodiment, taking a register-transfer level simulation evaluation platform as an example, the complete execution flow of the performance verification method for AI processor hardware architecture is described in detail.

[0139] Correspondingly, Figure 7 is a detailed flowchart of a performance verification method for an AI processor hardware architecture based on a register-transfer level simulation evaluation platform (Simulate), applicable to an embodiment of the present invention. As shown in Figure 7, the entire performance verification method includes four stages: task import, task scheduling, task execution, and task analysis.

[0140] (I) Task Import Phase

[0141] The core innovation of this phase lies in building a unified stimulus framework and a decoupling mechanism to achieve standardized import and flexible adaptation of stimuli for different scenarios. The specific steps are as follows:

[0142] 1. Stimulus Storage: The Stimulus Pool employs a unified stimulus framework to centrally store all test stimuli (also known as test cases or Case_JobSpecs) from different projects, platforms, and versions. Each test case corresponds to a specific complex test scenario.

[0143] Meanwhile, test cases for complex test scenarios are abstracted into multiple ops (i.e., op1, ..., opn) and are classified and stored according to the classification rules of the Case_JobSpec, which gives standardized, structured management of test stimuli and provides a foundation for subsequent full-process calls.

[0144] 2. Stimulus Loading (i.e., Load): Load one or more Case_JobSpecs, import all data items contained in the Case_JobSpec into the predefined data framework of the SV world, and establish a transmission mechanism that allows stimuli to be distributed to any node in the evaluation environment, ensuring the full coverage and stability of stimulus transmission.

[0145] 3. Stimulus Translation (i.e., Parser): Translate each op in the Case_JobSpec and parse it into a data structure that the current evaluation platform (i.e., Simulation) can directly recognize. The key innovation of this step is that the translation decouples the test stimuli from hardware architecture changes, which avoids the impact of hardware architecture iteration on stimulus development and adaptation work and reduces the cost of repeated development.

[0146] 4. Stimulus Deployment: Based on the standard data structure translated in the previous step, and combined with the existing simulation parameter deployment rules in the evaluation platform, the test stimuli are shaped according to business type and deployment process to build stimulus queues that can be directly deployed. The stimuli are pushed into the queues, which provide a standardized container interface for subsequent stimulus scheduling and execution and ensure the standardization and scalability of stimulus deployment.
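The translation and deployment steps above can be sketched in Python. The dictionary layout of a Case_JobSpec, the `Op` fields, and the rule of one queue per business type are assumptions made for illustration; the source does not define the Case_JobSpec format.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """Platform-side form of one op after translation (fields are assumed)."""
    name: str
    business_type: str
    depends_on: tuple


def parse_case(case_jobspec):
    """Translation step: turn raw op entries into platform data structures."""
    return [
        Op(raw["name"], raw["type"], tuple(raw.get("depends_on", ())))
        for raw in case_jobspec["ops"]
    ]


def deploy(ops):
    """Deployment step: push each op into the queue for its business type."""
    queues = {}
    for op in ops:
        queues.setdefault(op.business_type, deque()).append(op)
    return queues


# A hypothetical Case_JobSpec with three ops.
case = {
    "ops": [
        {"name": "op1", "type": "dma"},
        {"name": "op2", "type": "compute", "depends_on": ["op1"]},
        {"name": "op3", "type": "dma"},
    ]
}
queues = deploy(parse_case(case))
```

Because only `parse_case` knows the raw format, a change in the hardware-facing `Op` structure would be absorbed there, which mirrors the decoupling described in step 3.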

[0147] (II) Task Scheduling Phase

[0148] The core innovation of this phase lies in adopting a collaborative strategy of "static pre-scheduling + dynamic scheduling," combined with a multi-process parallel design, to improve stimulus scheduling efficiency and adapt to the dynamic operating state of the hardware architecture. The specific steps are as follows:

[0149] 1. Queue pre-sorting (i.e., Pre-Load): Based on the deployable stimulus queue built in the task import phase, and combined with the static dependencies between stimuli (which are explicitly described in the ops beforehand), the stimulus queue is pre-sorted to complete the allocation of the static deployment queue, laying the foundation for subsequent dynamic scheduling and improving the orderliness of scheduling.

[0150] 2. Dynamic scheduling (also known as Arbitration): Based on the static deployment queue, a preset scheduling strategy divides the stimulus scheduling task into four parallel states: launch, check & flush, kick off, and release. At the same time, the execution status of stimuli on the current hardware architecture is collected in real time, and the scheduling strategy is dynamically adjusted according to this status feedback, completing dynamic stimulus scheduling across the entire process. This matches stimulus scheduling to the hardware operating status in real time and improves scheduling efficiency and flexibility.

[0151] (III) Task Execution Phase

[0152] The core innovation of this phase lies in achieving multi-path stimulus delivery, policy-free (first-come-first-served) scheduling, and full-dimensional state collection, ensuring the accuracy of stimulus execution and the comprehensiveness of state monitoring. The specific steps are as follows:

[0153] 1. Stimulus Push and Status Statistics: Business instructions or system parameter configurations are applied through front-door or back-door methods, enabling accurate push of performance configurations and test stimuli on the evaluation platform. At the same time, the execution status of stimuli during platform operation is statistically analyzed and observed in real time, and the status data is reported to the chip status bulletin board in real time, providing data support for dynamic scheduling.

[0154] 2. Stimulus Execution Control: Reuse the existing execution components of the evaluation platform to complete the push control of transaction-level stimuli. For transactions from the SV world or the C world, apply a first-come-first-served principle with no additional scheduling policy to drive the RTL (Register Transfer Level) circuit in Simulate, ensuring the real-time performance and accuracy of stimulus execution and avoiding execution delays caused by policy intervention.

[0155] 3. Execution Status Collection: Monitors collect execution data for each transaction and, based on preset event definitions, statistically analyze the completeness of information across various dimensions of the stimulus execution to ensure comprehensive collection of stimulus execution status and provide a complete data foundation for subsequent performance analysis.
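The first-come-first-served rule in step 2 can be sketched in Python as a merge of two arrival-ordered streams. The tuple layout `(arrival_cycle, source, name)` and the function name are assumptions; ties are resolved in favor of the stream listed first, which is an arbitrary choice of this sketch and not something the source specifies.

```python
import heapq


def first_come_first_served(sv_txns, c_txns):
    """Merge two arrival-ordered transaction streams by arrival cycle only.

    Each transaction is a tuple (arrival_cycle, source, name). No priority
    policy is applied: the earlier arrival is always driven first.
    """
    return list(heapq.merge(sv_txns, c_txns, key=lambda txn: txn[0]))


sv_stream = [(1, "SV", "a"), (5, "SV", "b")]
c_stream = [(2, "C", "x"), (5, "C", "y")]
drive_order = [name for _, _, name in first_come_first_served(sv_stream, c_stream)]
```

Here the transactions are driven in the order `a`, `x`, `b`, `y`: arrival time alone decides, regardless of which world issued the transaction.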

[0156] During the task execution phase, multiple threads (i.e., C process 1, ..., C process n) are started in the C world. The lifecycle of each thread (creation, scheduling, and destruction) is managed by the Software Station. The SV world assists in executing test stimuli by calling various second application programming interfaces (APIs) stored in the API library (i.e., the second application programming interface library). In the SV world, multiple processes (i.e., SV process 1, ..., SV process n) are started. The C world and the SV world establish a real-time communication connection through a communication tunnel (p Tunnel).

[0157] (IV) Task Analysis Phase

[0158] The core innovation of this phase lies in achieving standardized processing, multi-dimensional analysis, and shared management of evaluation data, ensuring the reliability and reusability of performance evaluation results. The specific steps are as follows:

[0159] 1. Data Output: All data related to hardware architecture performance collected by Monitors will be printed to the Log Print file in Simulate according to a preset format to ensure data integrity and traceability.

[0160] 2. Multi-dimensional analysis: The Analysis Script is invoked to automatically process and analyze the performance data in the log files, generating a multi-dimensional hardware architecture performance analysis report. The report covers core evaluation indicators such as stimulus execution bandwidth, balance, and transaction processing latency, ensuring the comprehensiveness and professionalism of the analysis results.

[0161] 3. Report Management and Sharing: The generated performance analysis reports are categorized and summarized according to the test scenarios, uploaded to the database, and provided with access interfaces in the form of web pages for other relevant teams to query and reuse, thereby realizing the shared management of evaluation results and improving the utilization value of evaluation data.
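A minimal sketch of what such an analysis script might compute is given below. The log line format (`TXN port=... start=... end=... bytes=...`), the metric names, and the definition of balance as the ratio of the least-loaded to the most-loaded port are all assumptions made for illustration; the source does not specify the log format or the exact metric definitions.

```python
import re
from collections import defaultdict

# Hypothetical log line format, assumed for this sketch only.
LINE_RE = re.compile(r"TXN port=(\d+) start=(\d+) end=(\d+) bytes=(\d+)")


def analyze(log_text):
    """Derive bandwidth, latency, and balance figures from monitor log lines."""
    bytes_per_port = defaultdict(int)
    latencies = []
    first_start, last_end = None, None
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue  # ignore lines that are not transaction records
        port, start, end, nbytes = map(int, match.groups())
        bytes_per_port[port] += nbytes
        latencies.append(end - start)
        first_start = start if first_start is None else min(first_start, start)
        last_end = end if last_end is None else max(last_end, end)
    if not latencies:
        return None
    window = last_end - first_start
    total_bytes = sum(bytes_per_port.values())
    max_bytes = max(bytes_per_port.values())
    return {
        "bandwidth_bytes_per_cycle": total_bytes / window if window else 0.0,
        "avg_latency_cycles": sum(latencies) / len(latencies),
        # Assumed definition: 1.0 means all ports carried equal traffic.
        "balance": min(bytes_per_port.values()) / max_bytes if max_bytes else 1.0,
        "bytes_per_port": dict(bytes_per_port),
    }


SAMPLE_LOG = """\
TXN port=0 start=0 end=10 bytes=64
TXN port=1 start=2 end=10 bytes=64
TXN port=0 start=10 end=20 bytes=64
some unrelated line
TXN port=1 start=12 end=16 bytes=32
"""
report = analyze(SAMPLE_LOG)
print(report)
```

A real script would additionally group the resulting reports by test scenario before uploading them, as item 3 describes.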

[0162] Through the coordinated execution of the above four stages, a complete performance evaluation execution process for AI processor hardware architecture is constructed. Its core advantages are:

[0163] By decoupling the test stimuli from the hardware architecture, standardizing stimulus management and deployment, implementing multi-process dynamic scheduling, and collecting and analyzing data throughout the process, the platform provides a robust framework that can be adapted at minimal cost to hardware architecture changes caused by project iterations, to the maintenance and upgrade of test stimuli, and to development and deployment across different evaluation platforms. It thereby efficiently obtains stable, reliable, and comprehensive system-level performance evaluation results for the hardware architecture, solving the pain points of traditional evaluation methods.

[0164] 1. A unified stimulus framework and test stimulus pool decouple stimuli from the hardware and standardize their management. This solves the pain points of traditional scenarios, namely strong binding to hardware, high iteration costs, and complex test case development, and improves stimulus reusability and development efficiency.

[0165] 2. Decoupling scenario development through the unified stimulus framework reduces the risk of modifying the physical-layer infrastructure and lowers maintenance costs. Developers can focus on scenario development, which solves the pain points of scattered developer effort and inefficient maintenance.

[0166] 3. The execution process is designed in four stages and layered, with independent log printing to achieve accurate status monitoring, improve the efficiency of anomaly location, and solve the problems of cumbersome anomaly location and high iteration costs in traditional methods.

[0167] 4. A cross-platform compatibility mechanism is introduced to adapt to multiple systems and multiple evaluation platforms, solving the pain points of poor cross-platform compatibility and high secondary development costs in traditional frameworks.

[0168] 5. Based on DPI-C, a bidirectional pipeline is built to realize data synchronization, API interoperability and unified scheduling between the SV world and the C world, and solve the limitations of the two worlds in collaborative scheduling, data communication, thread control and simulation speed.

[0169] 6. A "static pre-schedule + dynamic scheduling" design combined with multi-process parallelism achieves efficient stimulus push and full-dimensional status collection, solving pain points such as inefficient scheduling and execution delay.

[0170] 7. The deployment process of the evaluation platform is optimized to achieve standardized deployment of stimuli and configurations. The process is simple, stable, and reliable, has been verified across multiple generations of projects, and solves the pain points of traditional framework deployment, such as cumbersome steps, poor stability, and insufficient adaptability.

[0171] 8. The stimulus framework is systematically structured, and the performance configuration file (chipRegCfg) is stored independently, making configurations traceable and manageable. The configurations are easy to organize into a standardized performance guidance manual, which solves the pain points of traditional configuration files being messy, difficult to trace, and cumbersome to organize manually.

[0172] 9. By using analysis scripts to automate data analysis and share reports, standardized processes are built to address the pain points of inefficient performance analysis and difficulty in collaboration.
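The "static pre-schedule + dynamic scheduling" idea named in advantage 6 can be illustrated with the following Python sketch. The `OpUnit` fields, the use of a numeric `priority` as the static ordering key, and the use of dependency completion as the dynamic readiness rule are assumptions made for this illustration; the source does not state which key or rule the platform uses.

```python
from dataclasses import dataclass


@dataclass
class OpUnit:
    name: str
    priority: int     # hypothetical static ordering key
    deps: tuple = ()  # names of units that must complete first


def static_preschedule(units):
    """Static step: order the queue once, before execution starts."""
    return sorted(units, key=lambda u: u.priority)


def dynamic_schedule(presorted):
    """Dynamic step: issue the first queued unit whose dependencies are done.

    In this sketch a unit counts as completed as soon as it is issued;
    a real sequencer would instead react to reported execution status.
    """
    pending = list(presorted)
    done, issued = set(), []
    while pending:
        ready = next((u for u in pending if set(u.deps) <= done), None)
        if ready is None:
            raise RuntimeError("no runnable unit: unmet or cyclic dependencies")
        pending.remove(ready)
        issued.append(ready.name)
        done.add(ready.name)
    return issued


units = [
    OpUnit("store", 2, ("compute",)),
    OpUnit("compute", 3, ("load",)),
    OpUnit("load", 1),
    OpUnit("prefetch", 2),
]
order = dynamic_schedule(static_preschedule(units))
print(order)
```

The static order alone would place `store` before `compute`; the dynamic step defers `store` until `compute` has run, which is the kind of run-time correction that the combination of the two steps is meant to provide.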

[0173] It should be understood that the various forms of processes shown above can be used, with steps reordered, added, or deleted. For example, the steps described in this invention can be executed in parallel, sequentially, or in different orders, as long as the desired result of the technical solution of this invention can be achieved, and this is not limited herein.

[0174] The specific embodiments described above do not constitute a limitation on the scope of protection of this invention. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions can be made according to design requirements and other factors. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of this invention should be included within the scope of protection of this invention.

Claims

1. A performance verification platform for an artificial intelligence processor hardware architecture, characterized in that it includes a test stimulus pool, a performance data pool, an evaluation platform, and a stimulus import layer. The evaluation platform includes a hardware description language world and a high-level programming language world; the hardware description language world includes the hardware under test. The stimulus import layer is connected to both the test stimulus pool and the evaluation platform. The hardware description language world and the high-level programming language world are connected through a communication pipeline built on a direct programming interface. The evaluation platform is connected to the performance data pool. The test stimulus pool is used to store test cases for multiple scenarios organized according to a unified framework structure. The stimulus import layer is used to load the target test cases to be deployed from the test stimulus pool and to import the target data items required for executing the target test cases into the evaluation platform according to the data framework structure predefined in the evaluation platform, so that the hardware description language world and the high-level programming language world can share the target data items through the communication pipeline; and, after translating the target test cases into a standard data structure adapted to the evaluation platform, to deploy the target operation units contained in the target test cases to the evaluation platform based on the standard data structure. The evaluation platform is used to test the hardware under test, through the cooperation of the hardware description language world and the high-level programming language world, based on each target data item and each target operation unit included in the target test cases, and to output the performance data from each output port of the hardware under test to the performance data pool. The performance data pool is used to store the performance data output by the hardware under test during performance testing.

2. The performance verification platform for the artificial intelligence processor hardware architecture according to claim 1, characterized in that the hardware description language world includes multiple stimulus queues, a hardware description language performance verification environment, a chip status bulletin board, and a first data center. The hardware under test is configured in the hardware description language performance verification environment. The high-level programming language world includes a high-level programming language test main process and a second data center. Each stimulus queue is connected to the stimulus import layer and to the hardware description language performance verification environment. The hardware description language performance verification environment is connected to the chip status bulletin board, which is connected to the first data center. The first data center and the second data center share data through the communication pipeline. The stimulus import layer is further used to distribute the target operation units to the stimulus queues, and to store each target data item in the first data center and synchronize it to the second data center via the communication pipeline. The chip status bulletin board is used to acquire and summarize the status information of the hardware under test and to update the status information in real time. The hardware description language performance verification environment is used to obtain the target operation units concurrently from the stimulus queues in combination with the status information updated in real time in the chip status bulletin board, to execute each target operation unit on the hardware under test, based on the target data items stored in the first data center and through collaborative processing with the high-level programming language test main process, and to update the chip status bulletin board with the status information generated during execution. The high-level programming language test main process is used to respond to trigger calls from the hardware description language performance verification environment, to assist the hardware description language performance verification environment in executing each target operation unit by combining the status information updated in real time in the chip status bulletin board with the target data items stored in the second data center, to update the second data center based on the execution results, and to synchronize the updated content to the first data center through the communication pipeline.

3. The performance verification platform for the artificial intelligence processor hardware architecture according to claim 2, characterized in that the hardware description language world further includes a first application programming interface (API) library, and the high-level programming language world further includes a second application programming interface (API) library. The first application programming interface library is connected to the first data center, and the second application programming interface library is connected to both the high-level programming language test main process and the second data center. The first application programming interface library and the second application programming interface library are connected via the direct programming interface for data sharing. The first application programming interface library is used to store the first application programming interfaces developed in the hardware description language; to store, in real time, the first status information obtained when the hardware description language performance verification environment calls the first application programming interfaces, and to update the first status information to the chip status bulletin board and the first data center; and to store the status information updated in real time in the chip status bulletin board and, using the second application programming interface library as an intermediary, to synchronize that status information to the high-level programming language test main process. The second application programming interface library is used to store the second application programming interfaces developed in the high-level programming language; to store, in real time, the second status information obtained when the high-level programming language test main process calls the second application programming interfaces, and to update the second status information to the second application programming interface library and the second data center; and, using the first application programming interface library as an intermediary, to synchronize the second status information to the chip status bulletin board.

4. The performance verification platform for the artificial intelligence processor hardware architecture according to claim 3, characterized in that the high-level programming language test main process is further used to: in response to a call from the hardware description language performance verification environment, assist the hardware description language performance verification environment in executing each target operation unit by combining the target data items stored in the second data center with the status information updated in real time in the chip status bulletin board, obtained indirectly through the second application programming interface library; and update the second application programming interface library and the second data center according to the execution results.

5. The performance verification platform for the artificial intelligence processor hardware architecture according to claim 2, characterized in that the hardware description language performance verification environment includes multiple master agents, each master agent is connected to one stimulus queue, and a process runs in each master agent. Each process executes an integrated entity consisting of a callback function, a sequencer, a monitoring collector, and a driver. The sequencer is connected to the stimulus queue, the monitoring collector is connected to a port of the hardware under test, the driver is connected to an input port of the hardware under test and/or to a designated circuit element inside the hardware under test, and the callback function is bound to the monitoring collector. The sequencer is used to store the target operation units distributed via the stimulus queue and to distribute the target operation units to the driver. The driver is used to generate, in cooperation with the high-level programming language test main process, multiple target transactions that match the target operation unit by combining the target data items stored in the first data center with the status information updated in real time in the chip status bulletin board, and to inject each target transaction into the hardware under test to perform the test. The monitoring collector is used to trigger the collection of performance data on the configured port whenever a target transaction is detected to have finished executing. The callback function is used to collect third status information on the configured port through the bound monitoring collector when a preset event triggering condition is detected, and to update the third status information to the chip status bulletin board and the first data center.

6. The performance verification platform for the artificial intelligence processor hardware architecture according to claim 5, characterized in that the high-level programming language test main process includes multiple parallel threads. Each thread is used to respond to a trigger from the process running in a master agent, and to assist the driver in that master agent in generating the target transactions by combining the target data items stored in the second data center with the status information updated in real time in the chip status bulletin board.

7. The performance verification platform for the artificial intelligence processor hardware architecture according to claim 5, characterized in that the process state of the process running in the master agent includes any of the following: a startup state, a verification and refresh state, a trigger state, and a release state.

8. The performance verification platform for the artificial intelligence processor hardware architecture according to claim 5, characterized in that the stimulus import layer is further used to pre-sort the target operation units stored in each stimulus queue based on the target data items stored in the first data center; and the sequencer is further used to dynamically schedule each target operation unit stored in the sequencer to the driver of the master agent based on the execution status of the target operation units in the master agent.

9. The performance verification platform for the artificial intelligence processor hardware architecture according to claim 5, characterized in that the evaluation platform further includes a performance analysis script, which is connected to the monitoring collector in each master agent and to the performance data pool. The monitoring collector is further used to collect status data on the configured port and then output the collection results to a log. The performance analysis script is used to process the logs generated by the monitoring collectors to obtain a multi-dimensional hardware architecture performance evaluation report, which is then output to the performance data pool for storage.

10. The performance verification platform for the artificial intelligence processor hardware architecture according to any one of claims 1-9, characterized in that the performance verification platform further includes a performance data center, which is connected to the performance data pool. The performance data center is used to obtain the performance data of the hardware under test from the performance data pool, classify it by scenario, and then display it in a set data format.

11. The performance verification platform for the artificial intelligence processor hardware architecture according to any one of claims 1-9, characterized in that the unified framework structure of the test cases stored in the test stimulus pool includes chip configuration information, operation unit configuration information corresponding to each operation unit, execution scheduling information between operation units, and complete test case configuration information, wherein the operation unit configuration information includes: operation unit execution mode configuration information, operation unit definition information, an operation unit data path list, and a transaction list matching the operation unit.

12. The performance verification platform for the artificial intelligence processor hardware architecture according to any one of claims 1-9, characterized in that the communication pipeline includes multiple pre-packaged transport functions for high-speed transport of data of various data structures between the hardware description language world and the high-level programming language world.

13. The performance verification platform for the artificial intelligence processor hardware architecture according to any one of claims 1-9, characterized in that, The evaluation platform is a register-transfer level simulation evaluation platform, the hardware description language world is a register-transfer level hardware description language world, and the high-level programming language world is a C language or a C++ language world.