Model computational efficiency testing method, system, apparatus and device, medium and product
This model computation efficiency testing method addresses the lack of performance evaluation for model computing systems, enabling quantitative and comparable assessment of such systems and reducing application costs.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- INSPUR SUZHOU INTELLIGENT TECH CO LTD
- Filing Date
- 2025-12-03
- Publication Date
- 2026-06-18
AI Technical Summary
Existing technologies lack performance testing and evaluation metrics for model computing systems, making it difficult to quantitatively assess model computing performance.
A method for testing model computation efficiency is provided. This method involves acquiring a dataset, controlling the device under test to run the software framework under test to call the model under test to perform model computation tasks, determining model performance parameters, data throughput parameters, and computation performance parameters, and then calculating the model computation efficiency.
It enables quantitative evaluation of model computing systems, enhances the comparability of different models on different devices, provides a reference for improving models and model computing systems, and reduces application costs.
Abstract
Description
A method, system, apparatus, equipment, medium, and product for testing computational efficiency.
[0001] Cross-reference to related applications
[0002] This application claims priority to Chinese Patent Application No. 202411824012.6, filed on December 11, 2024, entitled "A method, system, apparatus, device, medium and product for testing computational efficiency", the entire contents of which are incorporated herein by reference.
Technical Field
[0003] This application relates to the field of artificial intelligence technology, and in particular to a method, system, apparatus, device, medium and product for testing model computation efficiency.
Background Technology
[0004] With the rapid development of artificial intelligence (AI) technology, AI models are exhibiting trends such as large data processing volumes and high computational complexity. The computational demands on hardware devices for model calculation tasks are increasing rapidly, requiring significant resource consumption. Currently, evaluation metrics for model computation performance are typically focused on the model itself, such as model accuracy. Related technologies lack evaluation metrics for the model computation system comprised of the hardware and software used to perform model calculations, as well as performance testing methods for such systems.
[0005] How to test and quantify the model computing performance of a model computing system is a technical problem that needs to be solved by those skilled in the art.
Summary of the Invention
[0006] The purpose of this application is to provide a model computation efficiency testing method, system, device, equipment, medium, and product for testing and evaluating the model computation performance of a model computation system.
[0007] To address the aforementioned technical problems, this application provides a model computation efficiency testing method, comprising:
[0008] Obtain the dataset;
[0009] Control the device under test to run the software framework under test to call the model under test and dataset to perform model calculation tasks;
[0010] Determine the model performance parameters, model data throughput parameters, and computational performance parameters based on the model computation task;
[0011] The model computation efficiency of the tested model computing system is determined based on the model performance parameters, model data throughput parameters, and computational performance parameters, so as to obtain the test results of the tested model computing system.
[0012] The tested model computing system includes the tested model, the tested software framework, and the tested equipment.
[0013] On the one hand, the computational efficiency of the tested model's computing system is determined based on model performance parameters, model data throughput parameters, and computational performance parameters, including:
[0014] Determine the effective throughput evaluation parameters of the computing system of the tested model based on the model performance parameters and the model data throughput parameters;
[0015] The model computation efficiency is determined based on the effective throughput evaluation parameters and computational performance parameters.
[0016] On the other hand, the larger the effective throughput corresponding to the effective throughput evaluation parameter, the greater the model computation efficiency.
[0017] On the other hand, the model data throughput parameter includes the amount of computation per unit of model data;
[0018] The computational cost per unit model data is the computational cost required for the tested model to process a unit of input data.
[0019] The better the model performance corresponding to the model performance parameter, the larger the effective throughput evaluation parameter; the effective throughput evaluation parameter is negatively correlated with the computational cost per unit of model data.
[0020] On the other hand, the model under test is a language model, and the unit model data computation is the amount of computation required for the model under test to process each input token.
[0021] On the other hand, the model under test is an image processing model, and the unit model data computation is the amount of computation required for the model under test to process each input image.
[0022] On the other hand, model data throughput parameters include throughput rate;
[0023] Throughput is the rate at which the tested model processes input data per unit time.
[0024] The better the model performance corresponding to the model performance parameter, the larger the effective throughput evaluation parameter; the effective throughput evaluation parameter is positively correlated with the throughput rate.
[0025] On the other hand, the model under test is a language model, and the throughput is the number of input tokens processed by the model under test per unit time.
[0026] On the other hand, the model under test is an image processing model, and the throughput is the number of input images processed by the model under test per unit time.
[0027] On the other hand, the computational efficiency of the tested model's computing system is determined based on model performance parameters, model data throughput parameters, and computational performance parameters, including:
[0028] If the model performance parameters are within the range of the baseline model performance parameters, then the model computation efficiency is determined based on the model performance parameters, model data throughput parameters, and computation performance parameters.
[0029] If the model performance parameters are not within the range of the baseline model performance parameters, then set the model performance parameters to 0, and determine the model computation efficiency based on the model performance parameters, model data throughput parameters, and computation performance parameters.
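The baseline check in paragraphs [0028] and [0029] can be sketched as a small gating function. This is a minimal illustration rather than the application's implementation: the function name `gate_model_performance` and the representation of the baseline range as a closed interval `[low, high]` are assumptions made for the example.

```python
import math


def gate_model_performance(measured: float, low: float, high: float = math.inf) -> float:
    """Return the measured model performance parameter if it lies in the
    baseline range [low, high]; otherwise return 0.0.

    The closed-interval form of the baseline range is an assumption.
    """
    if low <= measured <= high:
        return measured
    return 0.0


# A model that meets a hypothetical baseline of 0.75 keeps its score...
print(gate_model_performance(0.81, low=0.75))
# ...while one that misses it is zeroed before the efficiency is computed.
print(gate_model_performance(0.70, low=0.75))
```

If the model performance parameter enters the efficiency calculation as a multiplicative factor (an assumption about the formulas, which are not reproduced in this text), a system that misses the baseline would score 0 regardless of its throughput.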
[0030] On the other hand, the model performance parameter is the model accuracy of the tested model, which is determined by the following formula:
[0031] Here, the measured accuracy is the accuracy achieved by the tested model in the model calculation task, and the baseline accuracy is the minimum allowable accuracy of the tested model.
[0032] On the other hand, computational performance parameters include computational efficiency;
[0033] Computational efficiency is positively correlated with the measured computational performance parameters of the computing system of the tested model when performing model computation tasks, and negatively correlated with the theoretical peak performance parameters of the computing system of the tested device.
[0034] The model computation efficiency is positively correlated with the computational performance parameter.
[0035] On the other hand, the steps for determining the measured performance parameters include:
[0036] Obtain the measured total computational load and the measured total time spent by the tested model computing system in executing the model computing task.
[0037] The measured calculation performance parameters are determined based on the measured total calculation amount and measured total time.
[0038] On the other hand, the measured computational performance parameter is the number of floating-point operations per second performed by the tested device to execute the model calculation task;
[0039] The theoretical peak performance parameter of the computing system is the theoretical peak number of floating-point operations per second of the tested device.
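As a minimal sketch of paragraphs [0036] to [0039], the measured parameter can be computed as total floating-point operations divided by total time, and compared with the device's theoretical peak. The ratio form of the computational efficiency is the one described in paragraph [0053]; the function names and the sample numbers are hypothetical.

```python
def measured_flops(total_flop: float, total_seconds: float) -> float:
    """Measured floating-point operations per second for one task run."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return total_flop / total_seconds


def computational_efficiency(measured: float, peak: float) -> float:
    """Ratio of measured FLOPS to the device's theoretical peak FLOPS."""
    if peak <= 0:
        raise ValueError("peak must be positive")
    return measured / peak


# Hypothetical run: 2e15 floating-point operations in 10 s on a device
# whose theoretical peak is 1e15 FLOPS.
achieved = measured_flops(2e15, 10.0)
print(achieved, computational_efficiency(achieved, 1e15))
```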
[0040] On the other hand, the steps for determining the measured performance parameters and the steps for determining the theoretical peak performance parameters of the calculation system include:
[0041] Obtain the theoretical peak performance parameters of the computing system corresponding to the first data type of the device under test;
[0042] If all actual data types in the model computation task are of the first data type, then the computational performance parameters corresponding to the actual data types in the model computation task shall be used as the measured computational performance parameters.
[0043] If any actual data type in the model calculation task is not the first data type, then the calculation performance parameters corresponding to that actual data type are converted into equivalent calculation performance parameters for the first data type to obtain the measured calculation performance parameters.
[0044] On the other hand, the steps for determining the measured performance parameters and the steps for determining the theoretical peak performance parameters of the calculation system include:
[0045] Obtain the theoretical peak performance parameters of the computing system corresponding to the second data type of the device under test;
[0046] If all actual data types in the model computation task are the second data type, then the computational performance parameters corresponding to the actual data types in the model computation task shall be used as the measured computational performance parameters.
[0047] If there is an actual data type in the model calculation task that is not the second data type, then the calculation performance parameters corresponding to the actual data type in the model calculation task are converted into the calculation performance parameters corresponding to the second data type to obtain the measured calculation performance parameters.
[0048] On the other hand, the computational efficiency of the tested model's computing system is determined based on the model's performance parameters, data throughput parameters, and computational performance parameters, and is calculated using the following formula:
[0049] Here, model accuracy is a model performance parameter; unit model data computation is the computational amount required for the tested model to process a unit of input data; and computational efficiency is a computational performance parameter.
[0050] On the other hand, the computational efficiency of the tested model's computing system is determined based on the model's performance parameters, data throughput parameters, and computational performance parameters, and is calculated using the following formula:
[0051] Here, model accuracy is a model performance parameter; measured computation per input token is a model data throughput parameter, representing the amount of computation required by the tested model to process each input token; computational efficiency is a computational performance parameter.
[0052] On the other hand, the computational efficiency of the tested model's computing system is determined based on the model's performance parameters, data throughput parameters, and computational performance parameters, and is calculated using the following formula:
[0053] Among them, model accuracy is the model performance parameter; unit model data computation is the computational amount required for the tested model to process a unit of input data; the ratio of the measured computational performance parameter of the tested model computing system to the theoretical peak performance parameter of the tested equipment computing system represents the computational efficiency, which is the computational performance parameter.
[0054] On the other hand, the computational efficiency of the tested model's computing system is determined based on the model's performance parameters, data throughput parameters, and computational performance parameters, and is calculated using the following formula:
[0055] Among them, model accuracy is the model performance parameter; unit model data computation is the computation required for the tested model to process a unit of input data; the ratio of the measured total computation to the measured total time of the model computation task is the measured computation performance parameter of the tested model computing system executing the model computation task; the ratio of the measured computation performance parameter to the theoretical peak performance parameter of the tested device computing system represents the computation efficiency, and computation efficiency is the computation performance parameter.
[0056] On the other hand, the computational efficiency of the tested model's computing system is determined based on the model's performance parameters, data throughput parameters, and computational performance parameters, and is calculated using the following formula:
[0057] Among them, model accuracy is the model performance parameter; unit model data computation is the computation required for the tested model to process a unit of input data; the ratio of the product of unit model data computation and the total number of input data to the measured total time of the model computation task is the measured computation performance parameter of the tested model computing system executing the model computation task; the ratio of the measured computation performance parameter to the theoretical peak performance parameter of the tested device's computing system represents the computation efficiency, which is the computation performance parameter.
[0058] On the other hand, the computational efficiency of the tested model's computing system is determined based on the model's performance parameters, data throughput parameters, and computational performance parameters, and is calculated using the following formula:
[0059] Among them, model accuracy is the model performance parameter; the ratio of the total number of input data to the total measured time is the model data throughput parameter; and the theoretical peak performance parameter of the computing system is the computing performance parameter.
[0060] On the other hand, the computational efficiency of the tested model's computing system is determined based on the model's performance parameters, data throughput parameters, and computational performance parameters, and is calculated using the following formula:
[0061] Among them, model accuracy is a model performance parameter; throughput is a model data throughput parameter, representing the rate at which the tested model processes input data per unit time; and the theoretical peak performance parameter of the computing system is a computing performance parameter.
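The formulas referred to in paragraphs [0048] to [0061] are not reproduced in this text. The sketch below assumes a simple product and quotient form that agrees with the stated correlations (positive in model accuracy and computational efficiency, negative in computation per unit of data); it is an illustration under that assumption, not the application's formula. All function and parameter names are hypothetical.

```python
def mce_from_unit_compute(
    accuracy: float,
    flop_per_unit: float,
    num_units: float,
    total_seconds: float,
    peak_flops: float,
) -> float:
    """Assumed form: accuracy / (compute per unit) * (measured FLOPS / peak FLOPS)."""
    # Measured FLOPS from unit computation, input count and time ([0057]).
    measured = flop_per_unit * num_units / total_seconds
    # Computational efficiency as measured over theoretical peak ([0053]).
    efficiency = measured / peak_flops
    return accuracy / flop_per_unit * efficiency


def mce_from_throughput(
    accuracy: float, num_units: float, total_seconds: float, peak_flops: float
) -> float:
    """Assumed form: accuracy * throughput / peak FLOPS ([0059], [0061])."""
    throughput = num_units / total_seconds  # input units per second
    return accuracy * throughput / peak_flops


# Hypothetical run: 1,000,000 input tokens, 2e9 FLOP per token, 100 s,
# on a device with a theoretical peak of 1e14 FLOPS.
a = mce_from_unit_compute(0.9, 2e9, 1_000_000, 100.0, 1e14)
b = mce_from_throughput(0.9, 1_000_000, 100.0, 1e14)
print(a, b)
```

Under this assumed form the computation per unit of data cancels out, so the unit-computation variant and the throughput variant give the same value up to floating-point rounding.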
[0062] On the other hand, the computational efficiency of the tested model computing system is determined based on model performance parameters, model data throughput parameters, and computational performance parameters to obtain the test results of the tested model computing system, including:
[0063] Based on the computational efficiency of the tested model computing system under multiple test scenarios, the overall computational efficiency of the tested model computing system is determined.
[0064] The overall model computation efficiency is used as the test result;
[0065] The test scenarios correspond one-to-one with the datasets.
[0066] On the other hand, the computational efficiency of the tested model computing system is determined based on model performance parameters, model data throughput parameters, and computational performance parameters to obtain the test results of the tested model computing system, including:
[0067] Based on the computational efficiency of the tested model computing system under multiple test scenarios, the overall computational efficiency of the tested model computing system is determined.
[0068] The overall model computation efficiency is used as the test result;
[0069] One test scenario corresponds to one or more datasets.
[0070] On the other hand, the steps for determining the computational efficiency for a single test scenario include:
[0071] If a test scenario corresponds to one dataset, the model computation efficiency for that dataset is used as the model computation efficiency for the test scenario.
[0072] If a test scenario corresponds to multiple datasets, the model computation efficiency for the test scenario is determined based on the model computation efficiency for each dataset.
[0073] On the other hand, the computational efficiency of the test scenario is determined based on the computational efficiency of each dataset corresponding to the test scenario, including:
[0074] The arithmetic mean of the computational efficiency for each dataset is used as the computational efficiency for the test scenario.
[0075] On the other hand, the computational efficiency of the test scenario is determined based on the computational efficiency of each dataset corresponding to the test scenario, including:
[0076] The geometric mean of the computational efficiency for each dataset is used as the computational efficiency for the test scenario.
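The per-scenario aggregation in paragraphs [0070] to [0076] can be sketched with the Python standard library. The function name `scenario_efficiency` and the `method` argument are hypothetical; the two averaging options are the ones named in the text.

```python
from statistics import fmean, geometric_mean
from typing import Sequence


def scenario_efficiency(per_dataset: Sequence[float], method: str = "arithmetic") -> float:
    """Combine per-dataset efficiencies into one value for a test scenario."""
    values = list(per_dataset)
    if not values:
        raise ValueError("a test scenario needs at least one dataset")
    if len(values) == 1:
        # One dataset: its efficiency is the scenario's efficiency.
        return values[0]
    if method == "arithmetic":
        return fmean(values)
    if method == "geometric":
        # Intended for strictly positive efficiencies.
        return geometric_mean(values)
    raise ValueError(f"unknown method: {method}")


print(scenario_efficiency([0.2, 0.8], "arithmetic"))
print(scenario_efficiency([0.2, 0.8], "geometric"))
```

The geometric mean gives more weight to the weakest dataset than the arithmetic mean does, which may matter when one dataset performs much worse than the others.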
[0077] On the other hand, the steps for determining the computational efficiency for a single test scenario include:
[0078] If a test scenario corresponds to one dataset, the model computation efficiency for that dataset is used as the model computation efficiency for the test scenario.
[0079] If a test scenario corresponds to multiple datasets, then the single-scenario model performance parameters for the test scenario are determined based on the model performance parameters of each dataset; the single-scenario model data throughput parameters for the test scenario are determined based on the model data throughput parameters of each dataset; and the single-scenario computational performance parameters for the test scenario are determined based on the computational performance parameters of each dataset.
[0080] The model computation efficiency for the test scenario is determined based on the single-scenario model performance parameters, single-scenario model data throughput parameters, and single-scenario computation performance parameters.
[0081] On the other hand, determining the single-scenario model performance parameters for the test scenario based on the model performance parameters of each corresponding dataset includes: using the arithmetic mean of the model performance parameters of the datasets as the single-scenario model performance parameter.
[0082] Determining the single-scenario model data throughput parameters for the test scenario based on the model data throughput parameters of each corresponding dataset includes: using the arithmetic mean of the model data throughput parameters of the datasets as the single-scenario model data throughput parameter.
[0083] Determining the single-scenario computational performance parameters for the test scenario based on the computational performance parameters of each corresponding dataset includes: using the arithmetic mean of the computational performance parameters of the datasets as the single-scenario computational performance parameter.
[0084] On the other hand, determining the single-scenario model performance parameters for the test scenario based on the model performance parameters of each corresponding dataset includes: using the geometric mean of the model performance parameters of the datasets as the single-scenario model performance parameter.
[0085] Determining the single-scenario model data throughput parameters for the test scenario based on the model data throughput parameters of each corresponding dataset includes: using the geometric mean of the model data throughput parameters of the datasets as the single-scenario model data throughput parameter.
[0086] Determining the single-scenario computational performance parameters for the test scenario based on the computational performance parameters of each corresponding dataset includes: using the geometric mean of the computational performance parameters of the datasets as the single-scenario computational performance parameter.
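The alternative in paragraphs [0077] to [0086] averages each parameter across datasets first and computes one efficiency afterwards. A minimal sketch follows; the `DatasetParams` container, its field names, and the choice of throughput as the data throughput parameter are assumptions for illustration.

```python
from dataclasses import dataclass
from statistics import fmean, geometric_mean
from typing import Callable, Sequence


@dataclass
class DatasetParams:
    model_performance: float    # e.g. model accuracy
    throughput: float           # input units processed per second
    compute_performance: float  # computational performance parameter


def single_scenario_params(
    per_dataset: Sequence[DatasetParams],
    mean: Callable[[Sequence[float]], float] = fmean,
) -> DatasetParams:
    """Average each parameter separately across the scenario's datasets."""
    return DatasetParams(
        model_performance=mean([p.model_performance for p in per_dataset]),
        throughput=mean([p.throughput for p in per_dataset]),
        compute_performance=mean([p.compute_performance for p in per_dataset]),
    )


datasets = [DatasetParams(0.8, 100.0, 0.5), DatasetParams(0.9, 300.0, 0.3)]
print(single_scenario_params(datasets))                  # arithmetic means
print(single_scenario_params(datasets, geometric_mean))  # geometric means
```

Averaging parameters before combining them generally gives a different result from averaging per-dataset efficiencies, so the two approaches should not be mixed when comparing systems.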
[0087] On the other hand, based on the computational efficiency of the tested model computing system in multiple test scenarios, the overall computational efficiency of the tested model computing system is determined, including:
[0088] The arithmetic mean of the model computation efficiencies corresponding to the multiple test scenarios is used as the overall model computation efficiency.
[0089] On the other hand, the arithmetic mean of the model computation efficiencies corresponding to the multiple test scenarios is used as the overall model computation efficiency, which is calculated by the following formula:
[0090] Here, $MCE_t$ represents the overall model computation efficiency, $MCE_i$ represents the model computation efficiency of the tested model computing system in the i-th test scenario, and $f_{1i}$ represents the first weight coefficient of the tested model computing system in the i-th test scenario.
[0091] On the other hand, based on the computational efficiency of the tested model computing system in multiple test scenarios, the overall computational efficiency of the tested model computing system is determined, including:
[0092] The geometric mean of the model computation efficiencies corresponding to the multiple test scenarios is used as the overall model computation efficiency.
[0093] On the other hand, the geometric mean of the model computation efficiencies corresponding to the multiple test scenarios is used as the overall model computation efficiency, which is calculated by the following formula:
[0094] Here, $MCE_t$ represents the overall model computation efficiency, $MCE_i$ represents the model computation efficiency of the tested model computing system in the i-th test scenario, $\prod(\cdot)$ represents the cumulative multiplication, and $f_{2i}$ represents the second weight coefficient of the tested model computing system in the i-th test scenario.
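The weighted formulas themselves are not reproduced in this text, so the following is a sketch under a stated assumption: each weight coefficient is normalized by the sum of the weights, giving a standard weighted arithmetic mean and a standard weighted geometric mean. The application's exact normalization may differ, and the function names are hypothetical.

```python
import math
from typing import Sequence


def weighted_arithmetic_mean(effs: Sequence[float], weights: Sequence[float]) -> float:
    """Assumed form: sum(f_1i * MCE_i) / sum(f_1i)."""
    if len(effs) != len(weights) or not effs:
        raise ValueError("need one weight per test scenario")
    total = sum(weights)
    return sum(w * e for e, w in zip(effs, weights)) / total


def weighted_geometric_mean(effs: Sequence[float], weights: Sequence[float]) -> float:
    """Assumed form: prod(MCE_i ** (f_2i / sum(f_2i)))."""
    if len(effs) != len(weights) or not effs:
        raise ValueError("need one weight per test scenario")
    total = sum(weights)
    return math.prod(e ** (w / total) for e, w in zip(effs, weights))


# Two hypothetical test scenarios with equal weights.
print(weighted_arithmetic_mean([0.2, 0.8], [1.0, 1.0]))
print(weighted_geometric_mean([0.2, 0.8], [1.0, 1.0]))
```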
[0095] On the other hand, controlling the device under test to run the software framework under test to invoke the model under test and dataset to perform model computation tasks includes:
[0096] The software framework under test is loaded onto the device under test and its initialization is performed.
[0097] The software framework under test is started, and the model under test and dataset are invoked to perform model calculation tasks.
[0098] On the other hand, the device under test loads the software framework under test and performs initialization of the software framework under test, including:
[0099] Load the library files of the computing system of the model under test onto the device under test;
[0100] The type of the interface function of the computing system of the tested model is obtained by calling the interface of the library file;
[0101] Initialize the interface function according to its type.
[0102] On the other hand, the types of interface functions include at least: initialization functions, completion functions, data loading functions, data unloading functions, model calculation control functions, and test parameter return functions;
[0103] The initialization function is the interface function that performs initialization based on the test configuration information; the completion function is the interface function that performs the task after the tested model calculation system has completed the model calculation task; the data loading function is the interface function that loads the model calculation data; the data unloading function is the interface function that unloads the model calculation data; the model calculation control function is the interface function that performs the model calculation task; and the test parameter return function outputs the test parameters required for calculating at least one of the following: model performance parameters, model data throughput parameters, and calculation performance parameters.
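The six interface function types listed in paragraphs [0102] and [0103] can be pictured as an abstract interface that a framework under test would implement. This is a sketch only: the class name, method names, and signatures are hypothetical, since the application does not give them.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class FrameworkInterface(ABC):
    """Hypothetical interface exposed by the library files of the system under test."""

    @abstractmethod
    def initialize(self, config: Dict[str, Any]) -> None:
        """Initialization function: set up from the test configuration."""

    @abstractmethod
    def finalize(self) -> None:
        """Completion function: runs after the model calculation task is done."""

    @abstractmethod
    def load_data(self) -> None:
        """Data loading function: load the model calculation data."""

    @abstractmethod
    def unload_data(self) -> None:
        """Data unloading function: unload the model calculation data."""

    @abstractmethod
    def run(self) -> None:
        """Model calculation control function: execute the task."""

    @abstractmethod
    def report(self) -> Dict[str, float]:
        """Test parameter return function: values needed to compute the metrics."""
```

Keeping the interface abstract lets the same test harness drive different frameworks, as long as each one supplies these six functions.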
[0104] On the other hand, the software framework under test is started to call the model under test to perform model calculation tasks, including:
[0105] Start the software framework under test to perform model calculation tasks;
[0106] Obtain the first result returned by the test parameter return function after the software framework under test calls the data loading function to load the model calculation data;
[0107] Obtain the second result returned by the test parameter return function after the software framework under test calls the model calculation control function to execute the model calculation task;
[0108] Wait for the third result, which is output after the software framework under test has completed the model calculation task;
[0109] Obtain the fourth result returned by the test parameter return function after the software framework under test calls the data unloading function to unload the model calculation data.
[0110] The model accuracy of the tested model is calculated based on the execution results of the model calculation task, including:
[0111] Calculate the model accuracy based on at least one of the first, second, third, and fourth results.
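The call sequence in paragraphs [0104] to [0109] can be sketched as a small harness that collects the four results in order. `StubFramework` is a stand-in that only records calls, and the method names are hypothetical. Treating the third result as the output of the completion step is an assumption.

```python
from typing import Dict, List, Tuple


class StubFramework:
    """Stand-in for a framework under test; records call order only."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def load_data(self) -> None:
        self.calls.append("load_data")

    def run(self) -> None:
        self.calls.append("run")

    def finalize(self) -> Dict[str, str]:
        # Assumed: the completion step itself outputs the third result.
        self.calls.append("finalize")
        return {"stage": "finalize"}

    def unload_data(self) -> None:
        self.calls.append("unload_data")

    def report(self) -> Dict[str, str]:
        # Report which step ran immediately before this call.
        previous = self.calls[-1] if self.calls else "none"
        self.calls.append("report")
        return {"stage": previous}


def run_task(framework: StubFramework) -> Tuple[Dict[str, str], ...]:
    """Drive one model calculation task and collect the four results."""
    framework.load_data()
    first = framework.report()    # after data loading
    framework.run()
    second = framework.report()   # after the model calculation control function
    third = framework.finalize()  # after the task has completed
    framework.unload_data()
    fourth = framework.report()   # after data unloading
    return first, second, third, fourth


print([r["stage"] for r in run_task(StubFramework())])
```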
[0112] To address the aforementioned technical problems, this application also provides a model computation efficiency testing system, including testing equipment and the device under test;
[0113] The device under test is used to run the software framework under test to perform model calculation tasks based on the model under test and the dataset.
[0114] The testing equipment is used to acquire datasets and control the device under test to run the software framework under test to call the model under test and the dataset to execute model calculation tasks; determine model performance parameters, model data throughput parameters and calculation performance parameters based on the model calculation tasks; determine the model calculation efficiency of the model calculation system under test based on the model performance parameters, model data throughput parameters and calculation performance parameters, so as to obtain the test results of the model calculation system under test;
[0115] The tested model computing system includes the tested model, the tested software framework, and the tested equipment.
[0116] To address the aforementioned technical problems, this application also provides a model computation efficiency testing apparatus, comprising:
[0117] The acquisition unit is used to acquire the dataset;
[0118] The control unit is used to control the device under test to run the software framework under test in order to call the model under test and the dataset to perform model calculation tasks.
[0119] The determination unit is used to determine the model performance parameters, model data throughput parameters, and computational performance parameters based on the model computation task; and to determine the model computation efficiency of the tested model computation system based on the model performance parameters, model data throughput parameters, and computational performance parameters, so as to obtain the test results of the tested model computation system.
[0120] The tested model computing system includes the tested model, the tested software framework, and the tested equipment.
[0121] To address the aforementioned technical problems, this application also provides a model computation efficiency testing device, comprising:
[0122] A memory, used to store a computer program;
[0123] A processor, used to execute the computer program. When the computer program is executed by the processor, it implements the steps of any of the above-mentioned model computation efficiency testing methods.
[0124] To address the aforementioned technical problems, this application also provides a non-volatile readable storage medium storing a computer program thereon, which, when executed by a processor, implements the steps of any of the above-described model computation efficiency testing methods.
[0125] To address the aforementioned technical problems, this application also provides a computer program product, including a computer program that, when executed by a processor, implements the steps of any of the above-described model computation efficiency testing methods.
[0126] The model computation efficiency testing method provided in this application controls the tested software framework to call the tested model and execute model computation tasks on the tested device using the dataset, and then determines the model performance parameters, model data throughput parameters, and computational performance parameters. The model computation efficiency of the tested model computing system is determined from these parameters, which quantifies the effect of increased computing power on model performance and thus provides a quantitative assessment of the model computation performance of the tested model computing system, which comprises the tested model, the tested software framework, and the tested device. As an evaluation scheme applicable to various model computation tasks, the model computation efficiency and its testing method make different models running on different devices comparable and provide a reference for improving models and model computing systems. They also allow the accuracy and multi-dimensional computing power performance of different models to be evaluated quantitatively, which guides improvements in model capability and helps reduce application costs.
[0127] The model computation efficiency testing system, testing apparatus, testing device, non-volatile readable storage medium, and computer program product provided in this application have the same beneficial effects, which are not repeated here.
Description of the Drawings
[0128] To more clearly illustrate the technical solutions of the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0129] Figure 1 is a flowchart of a model computation efficiency testing method provided in an embodiment of this application;
[0130] Figure 2 is a timing diagram of a model computation efficiency testing method provided in an embodiment of this application;
[0131] Figure 3 is a schematic diagram of a model computation efficiency testing system provided in an embodiment of this application;
[0132] Figure 4 is a schematic diagram of the structure of a model computation efficiency testing device provided in an embodiment of this application.
Detailed Implementation
[0133] The core of this application is to provide a model computation efficiency testing method, system, device, equipment, medium, and product for testing and evaluating the model computation performance of a model computation system.
[0134] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0135] To facilitate understanding of the technical solutions provided in the embodiments of this application, some key terms used in the embodiments of this application will be explained here first.
[0136] Computing power: Used to measure the ability of a computing system (such as a computer, server, data center, etc.) to process information and perform calculations. The most commonly used unit is floating point operations per second (FLOPS).
[0137] Model Computational Efficiency: A scheme for evaluating the efficiency of large models, reflecting the overall efficiency of the tested model under a given hardware system and software framework. The evaluation objects are the tested model itself, the tested software framework, and the tested AI bare metal server (computing power base). It is a comprehensive evaluation of these three types of performance factors.
[0138] Floating point operations per second (FLOPS) is a metric that measures the processing power of a computer system or computing device. It refers to the number of floating-point operations (addition, subtraction, multiplication, division, etc.) a system or device can perform per unit of time.
[0139] Actual computational performance (unit can be TFLOPS (Tera Floating-point Operations Per Second)): the measured number of floating-point operations (addition, subtraction, multiplication, division, etc.) that the model performs per unit time during inference or training.
[0140] Actual computational total (unit can be TFLOPS): The total amount of floating-point operations such as addition, subtraction, multiplication, and division actually performed by the model during training or inference.
[0141] Theoretical peak performance of the computing system (unit can be TFLOPS): The theoretical value calculated from the characteristics of the chip hardware itself, such as the clock frequency and number of cores, and provided by the corresponding chip manufacturer; it can be based on the theoretical peak performance of single-precision floating point.
[0142] Throughput: The number of samples processed per unit of time. It has different meanings in different scenarios. In large language models, throughput is expressed as tokens / s (Tokens per Second), while in large image models, throughput is expressed as images / s (Images per Second). The meaning of throughput in other scenarios will not be elaborated here.
[0143] Figure 1 is a flowchart of a model computation efficiency testing method provided in an embodiment of this application.
[0144] As shown in Figure 1, the model computation efficiency testing method provided in this application includes:
[0145] S101: Obtain the dataset;
[0146] S102: Control the device under test to run the software framework under test to call the model under test and dataset to perform model calculation tasks;
[0147] S103: Determine the model performance parameters, model data throughput parameters, and computational performance parameters based on the model computation task;
[0148] S104: Determine the computational efficiency of the computing system of the tested model based on the model performance parameters, model data throughput parameters, and computational performance parameters, so as to obtain the test results of the computing system of the tested model.
[0149] The tested model computing system includes the tested model, the tested software framework, and the tested equipment.
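As a rough orientation, steps S101 to S104 can be arranged as a small driver. The sketch below is only an illustration under stated assumptions: the names `Measurements` and `run_efficiency_test` and the four injected callables are hypothetical and do not come from this application, and the way the three parameter groups are combined in S104 is deliberately left to the caller.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Measurements:
    """Hypothetical container for the three parameter groups of S103."""
    model_performance: float     # e.g. model accuracy
    data_throughput: float       # e.g. tokens/s or images/s
    compute_performance: float   # e.g. measured / theoretical peak


def run_efficiency_test(
    load_dataset: Callable[[], Any],
    run_task: Callable[[Any], Any],
    measure: Callable[[Any], Measurements],
    combine: Callable[[Measurements], float],
) -> float:
    """Run the four steps in order and return the efficiency value."""
    dataset = load_dataset()          # S101: obtain the dataset
    task_result = run_task(dataset)   # S102: device runs framework, which calls the model
    params = measure(task_result)     # S103: determine the three parameter groups
    return combine(params)            # S104: caller-supplied combination (assumption)
```

Passing the steps in as callables keeps the driver independent of any particular model, software framework, or device, which matches the intent that the same test applies to different tested systems.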
[0150] To address the lack of evaluation schemes for model computational efficiency in related technologies, this application proposes an evaluation index for model computational efficiency. The computational efficiency provided in this application can serve as a unified metric for model computational systems employing different models, software frameworks, and devices, enhancing the comparability of model computations performed by different devices and providing guidance for the improvement and refinement of models, software frameworks, and hardware.
[0151] Once the tested model, the tested software framework, and the tested device included in the tested model computing system are determined, the model computing efficiency testing method provided in the embodiments of this application can be executed.
[0152] The model computation efficiency testing method provided in this application embodiment can be implemented in software by designing and deploying test scripts; in hardware, it can be executed on the device under test or on another test device.
[0153] When performing tests on the computational system of the model under test, the test environment requirements may include:
[0154] The ambient temperature should be (25±5)℃ (degrees Celsius), the relative humidity should be 45%~75%, and the atmospheric pressure should be 86kPa~106kPa (kilopascal).
[0155] For devices under test with a nominal power of less than 1.5 kW (kilowatt), the test power supply should be AC (220±1%) V (volt); for devices under test with a nominal power of more than 1.5 kW, the test power supply should be AC (220±4%) V.
[0156] For devices under test with a nominal power of less than 1.5kW, the total harmonic distortion of the test power supply should not exceed 2%; for devices under test with a nominal power of more than 1.5kW, the total harmonic distortion of the test power supply should not exceed 5%.
[0157] The power input frequency of all tested devices should be (50±1.0%) Hz (Hertz).
[0158] The testing environment should avoid interference factors such as strong magnetic fields and strong vibrations.
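The numeric environment requirements above lend themselves to an automated pre-test check. The following is a minimal sketch, not part of this application: `EnvReading` and `environment_violations` are hypothetical names, and because the text only covers nominal powers below and above 1.5 kW, treating exactly 1.5 kW as the higher tier is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EnvReading:
    """One set of measured environment values (hypothetical structure)."""
    ambient_temp_c: float
    relative_humidity_pct: float
    pressure_kpa: float
    supply_voltage_v: float
    supply_thd_pct: float      # total harmonic distortion of the supply
    supply_freq_hz: float
    nominal_power_kw: float    # nominal power of the device under test


def environment_violations(env: EnvReading) -> List[str]:
    """Return a list of violated requirements; empty means the environment passes."""
    violations = []
    if not 20.0 <= env.ambient_temp_c <= 30.0:           # (25 +/- 5) degrees C
        violations.append("ambient temperature")
    if not 45.0 <= env.relative_humidity_pct <= 75.0:
        violations.append("relative humidity")
    if not 86.0 <= env.pressure_kpa <= 106.0:
        violations.append("atmospheric pressure")
    # Assumption: exactly 1.5 kW is handled like the "more than 1.5 kW" tier.
    low_power = env.nominal_power_kw < 1.5
    voltage_tol = 0.01 if low_power else 0.04
    thd_limit = 2.0 if low_power else 5.0
    if abs(env.supply_voltage_v - 220.0) > 220.0 * voltage_tol:
        violations.append("supply voltage")
    if env.supply_thd_pct > thd_limit:
        violations.append("total harmonic distortion")
    if abs(env.supply_freq_hz - 50.0) > 50.0 * 0.01:     # (50 +/- 1.0%) Hz
        violations.append("supply frequency")
    return violations
```

Qualitative requirements such as the absence of strong magnetic fields or vibration are not covered by this check.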
[0159] Requirements for the device being tested may include:
[0160] The device under test must include a complete set of components, including a central processing unit (CPU), memory, hard drive, and operating system.
[0161] The device under test should use the standard power supply, and all power supplies should be connected to AC power during the test.
[0162] The device under test should have all software options set to default. Power management and / or power saving features should be active during the test only if they are enabled by default on the server under test.
[0163] The device under test should be tested using the operating system stated by the manufacturer.
[0164] It should be noted that the tested model in this application embodiment can be any type of model. From the perspectives of function, structure, and application domain, it can include, but is not limited to, supervised learning models (classification models, regression models, etc.), unsupervised learning models (clustering models, dimensionality reduction models, association rule learning models, etc.), reinforcement learning models (policy learning models, value learning models, policy optimization models, etc.), semi-supervised learning models (models trained by combining labeled and unlabeled data, such as self-trained models, etc.), ensemble learning models (combining multiple models to improve performance, such as Bagging (Bootstrap Aggregating), Boosting, Stacking (Stacked Generalization), etc.), deep learning models (convolutional neural networks, recurrent neural networks, long short-term memory networks, gated recurrent units, Transformer models, etc.), generative models (generative adversarial networks, variational autoencoders, etc.), interpretable models (decision trees, rule engines, etc.), probabilistic models (Bayesian networks, hidden Markov models, etc.), meta-learning models, etc.
[0165] The software framework under test in this application embodiment is software used to call the model under test to perform model calculation tasks based on the dataset.
[0166] In the tested model computing system of this application embodiment, the number of tested devices can be one or more. When there are multiple tested devices, they can be devices of the same type or heterogeneous computing devices. In some embodiments of this application, the tested device can be a server, which can be an artificial intelligence bare metal server (computing power base).
[0167] In the embodiments of this application, the model computation task can be either a model training task or a model inference task.
[0168] The model training task involves adjusting the model's parameters through optimization algorithms to enable it to accurately learn knowledge from the dataset and thus accurately process unknown data. In this embodiment, when there are multiple devices under test, the model training task can be either synchronous or asynchronous. In synchronous training, all devices under test simultaneously read the model parameters and perform calculations using the same parameters. After each iteration, all devices under test synchronously update their model parameters. In asynchronous training, different devices under test independently read the model parameters and perform calculations without waiting for other devices to complete; updates to the model parameters are shared asynchronously with other devices under test.
[0169] The model inference task involves processing unknown data using the tested model (a trained model). In this embodiment, when there are multiple tested devices, the model inference task can be a synchronous inference task or an asynchronous inference task. In a synchronous inference task, the tested device acquires the next set of data to be processed only after the current model inference is completed. At any given time only one inference request is executing, and other inference requests must wait for it to complete before starting. Synchronous inference is therefore typically deployed in a linear, blocking manner. Asynchronous inference tasks allow inference to execute in parallel with tasks on other threads, thereby improving resource utilization. In asynchronous inference, while one set of data is being inferred by the model, the program immediately acquires the next set of data and preprocesses it, then waits for the previous inference to complete before feeding the preprocessed data to the model.
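The difference between the two inference modes can be illustrated with a small sketch. It is only a schematic under stated assumptions: `preprocess` and `infer` are placeholder functions standing in for real data preparation and a real model call, and the overlap is implemented with a single worker thread from the Python standard library rather than with any particular inference framework.

```python
from concurrent.futures import ThreadPoolExecutor


def preprocess(raw):
    """Placeholder preprocessing step (assumption): double every value."""
    return [x * 2 for x in raw]


def infer(batch):
    """Placeholder for the model call (assumption): sum the batch."""
    return sum(batch)


def run_sync(raw_batches):
    """Synchronous mode: fetch and preprocess the next batch only after inference ends."""
    results = []
    for raw in raw_batches:
        results.append(infer(preprocess(raw)))
    return results


def run_async(raw_batches):
    """Asynchronous mode: preprocess the next batch while the current one is inferred."""
    results = []
    if not raw_batches:
        return results
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(preprocess, raw_batches[0])
        for next_raw in raw_batches[1:]:
            batch = pending.result()                     # wait for preprocessing to finish
            pending = pool.submit(preprocess, next_raw)  # start preparing the next batch
            results.append(infer(batch))                 # inference overlaps with it
        results.append(infer(pending.result()))
    return results
```

Both functions return the same results for the same input; only the scheduling differs, which is why the mode affects measured throughput but not measured accuracy.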
[0170] In this embodiment, the required dataset is determined based on at least one of the tested model, the tested software framework, the tested device, and the model computation task. In practical applications, the same model may be capable of performing multiple application tasks. For example, a language model can be used to perform problem-solving tasks, code generation tasks, language understanding tasks, etc., and different application tasks correspond to different datasets. Therefore, in S101, the acquired dataset can be the dataset corresponding to the application task applicable to the tested model. Furthermore, depending on the data type used in the model computation task and the type of data to be processed, one application task can correspond to one or more datasets. For example, an image processing task can correspond to a cat image dataset and a dog image dataset. One application task can correspond to datasets of different data types, such as a dataset of single-precision floating-point data and a dataset of integer data.
[0171] In S102, the test script described above is used to control the device under test to run the software framework under test in order to call the model under test and the dataset to perform model calculation tasks.
[0172] After the tested model computing system completes the model computing task, the computing efficiency of the tested model computing system is calculated.
[0173] The model computation efficiency evaluation index proposed in this application aims to evaluate the performance of a model computing system from the perspectives of both the model itself and the hardware / software framework. It serves as a comparability indicator for various combinations of model computing systems, evaluating performance in terms of computational power, accuracy, and efficiency. The index can therefore be characterized as follows: the higher the model accuracy, the faster the execution speed, and the fewer resources consumed in the model computing task, the higher the model computation efficiency. To this end, in this application embodiment, the model computation efficiency of the tested model computing system is determined based on its model performance parameters, model data throughput parameters, and computational performance parameters.
[0174] In this embodiment, model performance parameters are used to quantitatively represent the performance of the model under test, serving as an evaluation of the model itself. The performance of the model under test can be represented from the perspectives of result accuracy, result reliability, and model stability. In practical applications, the type of model performance parameters is selected based on the requirements of the model's computational task and the characteristics of the data to be processed by the model under test.
[0175] In some embodiments of this application, model performance parameters may include at least one of model accuracy, model resource consumption, and model computation speed. In some embodiments of this application, model accuracy may be used as the model performance parameter.
[0176] If model accuracy is used, it can include, but is not limited to, at least one of the following: accuracy, precision, recall, F1 score, receiver operating characteristic (ROC) curve and area under the curve (AUC), mean square error (MSE), root mean square error (RMSE), mean absolute error (MAE), logarithmic loss (Log Loss), average precision (AP) and mean average precision (mAP), and coefficient of determination ($R^2$). These model performance parameters can be collectively referred to as model accuracy. In general, model accuracy is a key indicator for measuring model performance, representing how correctly the model processes data (such as classification or prediction).
[0177] Accuracy is primarily used to measure classification tasks. For binary classification problems, accuracy can be simply expressed as the number of correctly classified samples divided by the total number of samples.
[0178] Precision measures the proportion of samples predicted as positive by the model that are actually positive. It is calculated by dividing the number of samples correctly predicted as positive (True Positives) by the number of samples that the model predicts as positive. The latter is the sum of the number of samples correctly predicted as positive (True Positives) and the number of samples that the model incorrectly predicts as positive (False Positives).
[0179] Recall measures the proportion of samples that are actually positive and are correctly predicted as positive by the model. It is calculated by dividing the number of samples correctly predicted as positive (True Positives) by the number of samples that are actually positive. The latter is the sum of the number of samples correctly predicted as positive (True Positives) and the number of samples incorrectly predicted as negative (False Negatives).
[0180] The F1 score is used to comprehensively consider the performance of precision and recall. It can be expressed as the harmonic mean of precision and recall, and is calculated as 2 × precision × recall / (precision + recall).
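The four classification metrics just defined can be computed directly from the confusion counts. The following is a minimal sketch for the binary case; the function name `classification_metrics` and the convention of returning 0.0 when a denominator is zero are assumptions made for illustration.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # TP / predicted positives
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # TP / actual positives
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

For example, with `y_true = [1, 0, 1, 1, 0, 0, 1, 0]` and `y_pred = [1, 0, 0, 1, 1, 0, 1, 0]` there are 3 true positives, 1 false positive, 1 false negative and 3 true negatives, so all four metrics equal 0.75.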
[0181] The ROC curve is a graphical tool used to evaluate the performance of classification models. It visually demonstrates a model's classification performance by depicting the relationship between the True Positive Rate (TPR) and False Positive Rate (FPR) at different classification thresholds. When evaluating model performance, the area under the ROC curve (AUC) can serve as a comprehensive indicator, allowing for direct comparison of the performance of multiple models using a single value. AUC values lie between 0 and 1; a value of 0.5 corresponds to random guessing, so useful models typically fall between 0.5 and 1.0, with a higher AUC indicating better performance.
[0182] Mean squared error (MSE) is mainly used to measure the difference between the model's predicted values and the true values in regression tasks. The calculation formula is:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$
where $y_i$ is the true value, $\hat{y}_i$ is the predicted value, and $n$ is the number of samples.
[0183] The root mean square error (RMSE) is the square root of the mean square error, providing the magnitude of the error in the same units as the original data. The calculation formula is:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$
[0184] Mean Absolute Error (MAE) is the average of the absolute values of the differences between the model's predicted values and the actual values. Its calculation formula is:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$
where $y_i$ is the true value, $\hat{y}_i$ is the predicted value, and $n$ is the number of samples.
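The three regression error metrics described above can be computed in a few lines. This is a minimal sketch using only the Python standard library; the function names are chosen for illustration, and inputs are assumed to be non-empty sequences of equal length.

```python
import math
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean of the squared differences between true and predicted values."""
    n = len(y_true)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Square root of the MSE, in the same units as the data."""
    return math.sqrt(mse(y_true, y_pred))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean of the absolute differences between true and predicted values."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
```

With true values `[3, 5, 2, 7]` and predictions `[2, 5, 4, 8]`, the errors are 1, 0, -2 and -1, giving an MSE of 1.5, an RMSE of about 1.2247 and an MAE of 1.0.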
[0185] Log loss, also known as logistic loss or cross-entropy loss, is a loss function used in machine learning to evaluate the performance of classification models. It is particularly suitable for binary classification problems, but can also be extended to multi-class classification problems. Log loss measures the difference between the model's predicted probabilities and the actual labels.
[0186] Average precision (AP) and mean average precision (mAP) are important metrics for evaluating the performance of classification and object detection models, especially in information retrieval, image classification, and object detection. AP measures performance on a single class. Specifically, AP is the area under the precision-recall curve. To calculate AP, precision and recall values are calculated independently for each class, then the precision-recall curve is plotted, and the area under the curve is calculated to obtain the AP value for that class. mAP is the average of the AP values across multiple classes, taking into account the performance of all classes. In object detection tasks, mAP is a key metric for evaluating the model's performance across all classes. It is obtained by calculating the AP value for each class separately and then averaging them.
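One common way to approximate the area under the precision-recall curve is the non-interpolated form, which averages the precision observed at the rank of each positive sample. The sketch below uses that form as an assumption; this application does not specify which AP variant is intended, tied scores receive no special handling, and the box-matching step needed for object detection is not shown. The function names are illustrative.

```python
from typing import List, Sequence, Tuple


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Non-interpolated AP for one class (labels: 1 = positive, 0 = negative)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank   # precision at this recall step
    return precision_sum / hits if hits else 0.0


def mean_average_precision(per_class: List[Tuple[Sequence[int], Sequence[float]]]) -> float:
    """Mean of the per-class AP values; per_class holds (labels, scores) pairs."""
    aps = [average_precision(labels, scores) for labels, scores in per_class]
    return sum(aps) / len(aps)
```

For labels `[1, 0, 1, 0]` with scores `[0.9, 0.8, 0.7, 0.6]`, the positives sit at ranks 1 and 3 with precisions 1 and 2/3, so AP is their mean, 5/6.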
[0187] The coefficient of determination ($R^2$), also known as goodness of fit, is a statistic used in regression analysis to assess how well a model fits. It represents the proportion of variance of the dependent variable explained by the model; for a least-squares linear fit it equals the squared correlation between the model's predicted and actual values. The $R^2$ value is usually between 0 and 1. The closer the value is to 1, the better the model fits, meaning the higher the proportion of variance explained by the model.
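The coefficient of determination is commonly computed as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses that standard definition, which this application does not spell out; the function name is illustrative, and a constant set of true values (zero total variance) is rejected rather than given a conventional value.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """R^2 = 1 - SS_res / SS_tot for equal-length, non-empty sequences."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))   # unexplained variation
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)           # total variation
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot
```

Under this definition a model that predicts worse than the mean of the true values yields a negative result, so the range of 0 to 1 applies to models that fit at least as well as that baseline.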
[0188] In the embodiments of this application, one or more model accuracies can be used as model performance parameters, and the combination of different types of model accuracies can be designed according to the type of model computation task.
[0189] In the embodiments of this application, the model data throughput parameter is used to represent the ability of the tested model to process input data, and is an evaluation index of the overall computing system of the tested model. In some embodiments of this application, an effective throughput evaluation parameter can be defined to quantify the ability of the tested model computing system to effectively process input data. In some embodiments of this application, effective processing of input data can be defined as the case where the tested model processes the input data to produce a correct result, such as the case where the prediction result is correct in prediction calculation.
[0190] In S104, determining the computational efficiency of the tested model computing system based on model performance parameters, model data throughput parameters, and computational performance parameters may include: determining the effective throughput evaluation parameter of the tested model computing system based on the model performance parameters and model data throughput parameters; and determining the computational efficiency based on the effective throughput evaluation parameter and computational performance parameters. In some embodiments of this application, the higher the effective throughput corresponding to the effective throughput evaluation parameter, the higher the computational efficiency.
[0191] In the embodiments of this application, the calculated performance parameters are used to quantify the performance of the hardware and software framework consisting of the software framework under test and the device under test.
[0192] In some embodiments of this application, the step of determining computational performance parameters may include: determining the computational performance parameters based on the measured computational performance parameters of the computational system performing model computation tasks and the theoretical peak performance parameters of the computational system of the device under test. In this embodiment, computational efficiency can be defined to represent the hardware performance utilization rate of the device under test when running the software framework under test to perform model computation tasks based on the model under test and the dataset. The computational efficiency can be the ratio of the measured computational performance parameters to the theoretical peak performance parameters of the computational system, or calculated as a function of this ratio.
[0193] As described above, after the tested model computing system completes the model computing task, the true efficiency of the tested model under the tested device and the tested software framework can be reflected by determining the computing efficiency of the tested model computing system, thereby realizing a comprehensive performance evaluation of the tested model itself, the tested software framework, and the tested device.
[0194] In S103, after the model calculation task is completed, the model performance parameters, model data throughput parameters, and calculation performance parameters can be determined according to the execution process or execution result of the model calculation task. A callback function can be deployed on the device under test to make the device under test return the relevant results of the model calculation task so as to calculate the model performance parameters, model data throughput parameters, and calculation performance parameters.
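A callback of the kind mentioned above could accumulate per-batch results and derive the raw quantities afterwards. The following is a hypothetical sketch: the class `TaskRecorder`, its method names, and the choice of recorded fields (sample count, correct count, elapsed seconds, floating-point operation count) are all assumptions, since this application does not define the callback's interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TaskRecorder:
    """Collects per-batch results reported by the device under test (hypothetical)."""
    records: List[Tuple[int, int, float, float]] = field(default_factory=list)

    def on_batch_end(self, n_samples: int, n_correct: int,
                     elapsed_s: float, flop: float) -> None:
        """Callback invoked once per processed batch."""
        self.records.append((n_samples, n_correct, elapsed_s, flop))

    def summary(self) -> Dict[str, float]:
        """Derive accuracy, throughput and measured FLOPS from the records."""
        if not self.records:
            raise ValueError("no batches were recorded")
        samples = sum(r[0] for r in self.records)
        correct = sum(r[1] for r in self.records)
        seconds = sum(r[2] for r in self.records)
        flop = sum(r[3] for r in self.records)
        return {
            "accuracy": correct / samples,        # model performance parameter
            "throughput": samples / seconds,      # samples per second
            "measured_flops": flop / seconds,     # measured computational performance
        }
```

In a real test the elapsed time would come from a timer around each batch and the operation count from a profiler or an analytical estimate; both are passed in here so that the sketch stays independent of any particular tool.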
[0195] In S104, the computational efficiency of the tested model computing system is calculated based on the model performance parameters, model data throughput parameters, and computational performance parameters.
[0196] Then, the model computation efficiency can be directly used as the test result of the tested model computing system, or it can be used as one of the indicators of the test result.
[0197] The model computation efficiency testing method provided in this application has the following beneficial effects. After the software framework under test is controlled to call the model under test and execute model calculation tasks on the device under test according to the dataset, the model performance parameters, model data throughput parameters, and computational performance parameters are determined based on the model calculation tasks. The model computation efficiency of the tested model calculation system is then determined from these parameters, which allows a quantitative evaluation of the effect of increased computing power on model performance. This in turn enables a quantitative assessment of the model calculation performance of the tested model calculation system, which comprises the tested model, the tested software framework, and the tested device. As an evaluation scheme applicable to various model calculation tasks, the model computation efficiency and its testing method enhance the comparability of different models running on different devices and provide a reference for improving models and model calculation systems. They also allow a quantitative evaluation of the accuracy and multi-dimensional computing power performance of different models, which provides effective guidance for improving model capabilities and reducing application costs.
[0198] Based on the above embodiments, this application further describes the calculation steps for model parameter efficiency.
[0199] As described in the above embodiments, parameters calculated based on model performance parameters and model data throughput parameters can be defined as effective throughput evaluation parameters. The effective throughput evaluation parameter reflects how correctly the tested model processes the input data. That is, determining the computational efficiency of the tested model's computing system based on model performance parameters, model data throughput parameters, and computational performance parameters in S104 can include: determining the effective throughput evaluation parameters of the tested model's computing system based on model performance parameters and model data throughput parameters; and determining the computational efficiency based on the effective throughput evaluation parameters and computational performance parameters. Furthermore, the larger the effective throughput corresponding to the effective throughput evaluation parameter, the greater the computational efficiency.
[0200] Before calculating the effective throughput evaluation parameters, it is necessary to determine the model performance parameters and the model data throughput parameters.
[0201] Since models with excessively low performance cannot be practically applied and have no practical value, calculating their computational efficiency is meaningless. Therefore, in this embodiment, a baseline model performance parameter range can be defined to measure the minimum standard of the tested model's performance. In this embodiment, determining the computational efficiency of the tested model's computational system based on model performance parameters, model data throughput parameters, and computational performance parameters in step S104 can include: if the model performance parameters are within the baseline model performance parameter range, then determining the computational efficiency of the tested model's computational system based on the model performance parameters, model data throughput parameters, and computational performance parameters; if the model performance parameters are not within the baseline model performance parameter range, then setting the model performance parameters to 0 and then determining the computational efficiency of the tested model's computational system based on the model performance parameters, model data throughput parameters, and computational performance parameters.
[0202] If the model performance parameter is the model accuracy, then the baseline model performance parameter is the baseline accuracy. In this case, in step S104, if the measured accuracy is within the range allowed by the baseline accuracy, the measured accuracy is used as the model accuracy when determining the computational efficiency from the model performance parameter, model data throughput parameter, and computational performance parameter; otherwise, the model accuracy is set to 0 before the computational efficiency is determined. This process can be represented as:
$$\text{model accuracy} = \begin{cases} \text{measured accuracy}, & \text{measured accuracy} \ge \text{baseline accuracy} \\ 0, & \text{measured accuracy} < \text{baseline accuracy} \end{cases}$$
[0203] Here, the measured accuracy is the accuracy measured for the tested model in the model calculation task, and the baseline accuracy is the minimum allowable accuracy of the tested model.
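The baseline check described above amounts to a simple gate on the measured accuracy. The sketch below is one possible reading: the function name `gated_accuracy` is hypothetical, and treating a measured accuracy exactly equal to the baseline as acceptable is an assumption drawn from the baseline being described as the minimum allowable accuracy.

```python
def gated_accuracy(measured_accuracy: float, baseline_accuracy: float) -> float:
    """Return the measured accuracy if it meets the baseline, otherwise 0.

    Assumption: equality with the baseline counts as meeting it.
    """
    if measured_accuracy >= baseline_accuracy:
        return measured_accuracy
    return 0.0
```

Because the gated value feeds into the later efficiency calculation, a model below the baseline contributes zero accuracy regardless of how fast it runs.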
[0204] In the embodiments of this application, different ranges of benchmark model performance parameters can be set for different tested models, different model computation tasks, or different datasets.
[0205] In some embodiments of this application, the effective throughput evaluation parameter can be used to quantify the "quantity" by which the tested model processes input data correctly. The model data throughput parameter can include the computational cost per unit of model data; the computational cost per unit of model data is the computational cost required by the tested model to process a unit of input data; the better the model performance corresponding to the model performance parameter, the larger the effective throughput evaluation parameter; the effective throughput evaluation parameter is negatively correlated with the computational cost per unit of model data. As described in the above embodiments of this application, the model performance parameter can adopt model accuracy. Therefore, in the tested model computing system, the larger the amount of input data that the tested model can correctly process, the larger the effective throughput evaluation parameter and the greater the model computation efficiency.
[0206] In this embodiment, the unit model data computation cost is a measured value, i.e., the measured unit model data computation cost. The type of input data for the tested model is determined based on the type of the tested model or the dataset type, thus determining the unit of the unit model data computation cost. For example, if the tested model is a language model, the unit model data computation cost can be the computational cost required for the tested model to process each input token. If the tested model is an image processing model, the unit model data computation cost can be the computational cost required for the tested model to process each input image.
[0207] In some other embodiments of this application, the effective throughput evaluation parameter can be used to quantify the "rate" at which the tested model processes input data correctly. The model data throughput parameter can include throughput rate; throughput rate is the rate at which the tested model processes input data per unit time; the better the model performance corresponding to the model performance parameter, the larger the effective throughput evaluation parameter; the effective throughput evaluation parameter is positively correlated with throughput rate. Similarly, taking model accuracy as an example of a model performance parameter, in the tested model computing system, the greater the throughput rate at which the tested model can correctly process input data, the larger the effective throughput evaluation parameter and the greater the computational efficiency.
[0208] In this embodiment, throughput is a measured value, that is, the amount of input data processed by the tested model per unit time. The type of input data for the tested model is determined based on the type of the tested model or the type of the dataset, thus determining the unit of throughput. For example, if the tested model is a language model, throughput can be the number of input tokens processed by the tested model per unit time, and the unit of throughput can be tokens / s (i.e., the number of input tokens processed per second). If the tested model is an image processing model, throughput can be the number of input images processed by the tested model per unit time, and the unit of throughput can be images / s (i.e., the number of input images processed per second).
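The two variants of the effective throughput evaluation parameter are specified above only through their monotonic relationships. As an illustration, the sketch below picks the simplest forms that satisfy them: accuracy divided by the computational cost per unit of data for the "quantity" variant, and accuracy multiplied by the throughput rate for the "rate" variant. Both forms and both function names are assumptions; the embodiments may define these parameters differently.

```python
def effective_quantity(accuracy: float, flop_per_unit: float) -> float:
    """'Quantity' variant (assumed form): rises with accuracy, falls with per-unit cost.

    flop_per_unit is the measured computation needed per token or per image.
    """
    if flop_per_unit <= 0:
        raise ValueError("per-unit computational cost must be positive")
    return accuracy / flop_per_unit


def effective_rate(accuracy: float, throughput: float) -> float:
    """'Rate' variant (assumed form): rises with both accuracy and throughput.

    throughput is the measured rate in tokens/s or images/s.
    """
    return accuracy * throughput
```

Under the product form, a language model with accuracy 0.8 and a throughput of 1000 tokens / s would have an effective rate of about 800 correctly processed tokens per second.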
[0209] Based on the above embodiments, this application further describes the steps for calculating the computational performance parameters.
[0210] As described in the above embodiments, the computational performance parameters can be determined based on the measured computational performance parameters of the computational system of the tested model performing the model computation task and the theoretical peak performance parameters of the computational system of the tested device.
[0211] In one implementation, the ratio of the measured computational performance parameter to the theoretical peak performance parameter of the computing system is used as the computational performance parameter. Computational efficiency can be defined as the hardware performance utilization rate of the device under test when it executes the model computation task on the model under test and the dataset in the software framework under test. Computational efficiency is therefore set to be positively correlated with the measured computational performance parameter of the tested model computing system executing the model computation task, and negatively correlated with the theoretical peak performance parameter of the computing system of the device under test; the computational performance represented by the computational performance parameter is positively correlated with computational efficiency. In some embodiments of this application, computational efficiency is the ratio of the measured computational performance parameter to the theoretical peak performance parameter of the computing system, or a function of this ratio. The step of determining the measured computational performance parameter may include: obtaining the measured total computation and the measured total time of the tested model computing system executing the model computation task; and determining the measured computational performance parameter from the measured total computation and the measured total time.
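The ratio described above can be sketched in a few lines of C++. This is a minimal illustration, assuming both quantities are expressed in the same unit (for example FLOP/s); the function names `MeasuredFlops` and `ComputationalEfficiency` are hypothetical and do not come from the application.

```cpp
#include <cassert>
#include <cmath>

// Measured computational performance: measured total computation divided by
// the measured total time of the model computation task (FLOP/s).
double MeasuredFlops(double total_flop, double total_seconds) {
    assert(total_seconds > 0.0);
    return total_flop / total_seconds;
}

// Computational efficiency as the ratio of measured performance to the
// theoretical peak; both arguments must use the same unit.
double ComputationalEfficiency(double measured_flops, double peak_flops) {
    assert(peak_flops > 0.0);
    return measured_flops / peak_flops;
}
```

For instance, a task that performs 6.0e15 floating-point operations in 100 s on a device with a 1.2e14 FLOP/s peak would have a computational efficiency of 0.5 under this definition.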
[0212] In this embodiment, for the device under test, the theoretical peak performance parameters of the computing system are used to quantify the theoretical peak computing power of the device under test, while the measured computing performance parameters are used to quantify the actual computing power invested by the device under test when executing model computing tasks. For the model under test, the measured computing performance parameters are used to quantify the amount of computation per unit time during the process of the software framework under test calling the model under test to execute model computing tasks.
[0213] In this embodiment, the theoretical peak performance parameters of the computing system can be parameters provided by the manufacturer of the device under test. These parameters are typically theoretical values calculated based on the hardware configuration of the device under test (such as the frequency and number of cores of the central processing unit). The measured computing performance parameters are determined based on the execution process or results of the model computing task and are parameters actually obtained through testing.
[0214] In some embodiments of this application, the measured computational performance parameters can be expressed as the number of floating-point operations per second (FLOPS) performed by the device under test (DUT) to execute model computation tasks, and the theoretical peak performance parameters of the computational system can be expressed as the theoretical peak value of the number of FLOPS performed by the DUT. Here, FLOPS refers to the number of floating-point operations (addition, subtraction, multiplication, division, etc.) that a system or device can perform per unit time. In other embodiments of this application, the measured computational performance parameters can also be expressed as the number of trillions of floating-point operations per second (TFLOPS) performed by the DUT to execute model computation tasks, and the theoretical peak performance parameters of the computational system can also be expressed as the theoretical peak value of the number of trillions of floating-point operations per second performed by the DUT.
[0215] Actual testing revealed that the data type used during model computation affects both model performance parameters and computational performance parameters. Therefore, data type must be considered when determining the theoretical peak performance parameters and measured computational performance parameters of the computing system.
[0216] In some embodiments of this application, the steps for determining the measured computational performance parameters and the theoretical peak performance parameters of the computing system may include: obtaining the theoretical peak performance parameter of the computing system of the device under test for a first data type; if all actual data types in the model computation task are the first data type, using the computational performance parameter corresponding to the actual data types as the measured computational performance parameter; if any actual data type in the model computation task is not the first data type, converting the computational performance parameter corresponding to the actual data types into the equivalent computational performance parameter for the first data type to obtain the measured computational performance parameter. That is, whatever data types the model computation task actually uses, the measured computational performance parameter is expressed in terms of the first data type and then combined with the theoretical peak performance parameter of the computing system to determine the computational performance parameter. This simplifies the determination of model computation efficiency, so that test results are obtained quickly.
[0217] The first data type can be selected from among the data types for which the manufacturer of the device under test provides theoretical peak performance parameters of the computing system. In practical applications, the first data type can be, but is not limited to, Integer, Floating-point, Character, String, or Array. If the first data type is Floating-point, it can specifically be one of Single Precision Floating Point, Double Precision Floating Point, Extended Precision Floating Point, Quadruple Precision Floating Point, or Half Precision Floating Point.
[0218] In some embodiments of this application, the first data type may be a single-precision floating-point type.
[0219] In some other embodiments of this application, the steps for determining the measured computational performance parameters and the steps for determining the theoretical peak performance parameters of the computational system further include: obtaining the theoretical peak performance parameters of the computational system corresponding to the second data type of the device under test; if all actual data types in the model computation task are the second data type, then the computational performance parameters corresponding to the actual data types in the model computation task are taken as the measured computational performance parameters; if there are actual data types in the model computation task that are not the second data type, then the computational performance parameters corresponding to the actual data types in the model computation task are converted into computational performance parameters corresponding to the second data type to obtain the measured computational performance parameters.
[0220] The second data type can be selected from among the data types for which the manufacturer of the device under test provides theoretical peak performance parameters of the computing system. In practical applications, the second data type can be, but is not limited to, Integer, Floating-point, Character, String, or Array. If the second data type is Floating-point, it can specifically be one of Single Precision Floating Point, Double Precision Floating Point, Extended Precision Floating Point, Quadruple Precision Floating Point, or Half Precision Floating Point.
[0221] In some embodiments of this application, the second data type may be a single-precision floating-point type.
[0222] Based on the above embodiments, this application provides a method for determining the computational efficiency suitable for practical applications.
[0223] In this embodiment, the computational efficiency of the tested model computing system is determined based on model performance parameters, model data throughput parameters, and computational performance parameters, and can be calculated using the following formula:
[0224] Here, model accuracy is a model performance parameter; unit model data computation is the computational amount required for the tested model to process a unit of input data; and computational efficiency is a computational performance parameter.
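The formula referenced in [0223] is not reproduced in this text. One form consistent with the definitions in [0224] and with the equivalence stated in [0237] is sketched below. It is an assumption, not the application's own notation: $A$ stands for model accuracy, $\mathrm{CE}$ for computational efficiency, and $C_{\mathrm{unit}}$ for the unit model data computation.

```latex
\mathrm{MCE} = \frac{A \times \mathrm{CE}}{C_{\mathrm{unit}}}
```

Under this form, model computation efficiency rises with accuracy and hardware utilization and falls as the tested model needs more computation per unit of input data.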
[0225] In this embodiment of the application, for the case where the model under test is a language model, the computational efficiency of the computing system of the model under test is determined based on the model performance parameters, model data throughput parameters, and computational performance parameters, and can be calculated using the following formula:
[0226] Here, model accuracy is the model performance parameter; the measured computation per input token is the model data throughput parameter, representing the amount of computation the tested model requires to process each input token; and computational efficiency is the computational performance parameter.
[0227] In this embodiment of the application, the computational efficiency of the tested model computing system is determined based on model performance parameters, model data throughput parameters, and computational performance parameters. It can also be calculated using the following formula:
[0228] Here, model accuracy is the model performance parameter; unit model data computation is the amount of computation the tested model requires to process one unit of input data; and the ratio of the measured computational performance parameter of the tested model computing system to the theoretical peak performance parameter of the computing system of the device under test is the computational efficiency, which is the computational performance parameter.
[0229] In this embodiment of the application, the computational efficiency of the tested model computing system is determined based on model performance parameters, model data throughput parameters, and computational performance parameters. It can also be calculated using the following formula:
[0230] Here, model accuracy is the model performance parameter; unit model data computation is the amount of computation the tested model requires to process one unit of input data; the ratio of the measured total computation to the measured total time of the model computation task is the measured computational performance parameter of the tested model computing system executing the model computation task; and the ratio of the measured computational performance parameter to the theoretical peak performance parameter of the computing system of the device under test is the computational efficiency, which is the computational performance parameter.
[0231] In this embodiment of the application, the computational efficiency of the tested model computing system is determined based on model performance parameters, model data throughput parameters, and computational performance parameters. It can also be calculated using the following formula:
[0232] Here, model accuracy is the model performance parameter; unit model data computation is the amount of computation the tested model requires to process one unit of input data; the ratio of the product of the unit model data computation and the total number of input data items to the measured total time of the model computation task is the measured computational performance parameter of the tested model computing system executing the model computation task; and the ratio of the measured computational performance parameter to the theoretical peak performance parameter of the computing system of the device under test is the computational efficiency, which is the computational performance parameter.
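Paragraphs [0228] to [0232] expand computational efficiency step by step, but the formula images are not reproduced in this text. The chain they describe can be written as follows, assuming model computation efficiency is accuracy times computational efficiency divided by unit computation. The symbols are labels chosen here for illustration: $A$ is model accuracy, $C_{\mathrm{unit}}$ the unit model data computation, $P_{\mathrm{meas}}$ and $P_{\mathrm{peak}}$ the measured and theoretical peak performance parameters, $W_{\mathrm{total}}$ the measured total computation, $N$ the total number of input data items, and $T_{\mathrm{total}}$ the measured total time.

```latex
\mathrm{MCE}
  = \frac{A}{C_{\mathrm{unit}}} \times \frac{P_{\mathrm{meas}}}{P_{\mathrm{peak}}},
\qquad
P_{\mathrm{meas}}
  = \frac{W_{\mathrm{total}}}{T_{\mathrm{total}}}
  = \frac{C_{\mathrm{unit}} \times N}{T_{\mathrm{total}}}
```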
[0233] In this embodiment of the application, the computational efficiency of the tested model computing system is determined based on model performance parameters, model data throughput parameters, and computational performance parameters. It can also be calculated using the following formula:
[0234] Here, model accuracy is the model performance parameter; the ratio of the total number of input data items to the measured total time is the model data throughput parameter; and the theoretical peak performance parameter of the computing system is the computational performance parameter.
[0235] In this embodiment of the application, the computational efficiency of the tested model computing system is determined based on model performance parameters, model data throughput parameters, and computational performance parameters. It can also be calculated using the following formula:
[0236] Here, model accuracy is the model performance parameter; throughput, the amount of input data the tested model processes per unit time, is the model data throughput parameter; and the theoretical peak performance parameter of the computing system is the computational performance parameter.
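The throughput-based forms in [0233] to [0236] are likewise missing their formula images. A form consistent with the stated definitions is sketched below as an assumption; $A$ is model accuracy, $\mathrm{TP}$ the throughput, $N$ the total number of input data items, $T_{\mathrm{total}}$ the measured total time, and $P_{\mathrm{peak}}$ the theoretical peak performance parameter (all symbols are labels chosen here).

```latex
\mathrm{MCE} = \frac{A \times \mathrm{TP}}{P_{\mathrm{peak}}},
\qquad
\mathrm{TP} = \frac{N}{T_{\mathrm{total}}}
```

With throughput in tokens/s and the peak in FLOP/s, the result has the unit tokens per floating-point operation, scaled by accuracy.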
[0237] As the above formulas show, model computation efficiency expressed with different forms of the model data throughput parameter and the computational performance parameter can be equivalent. For example, with model accuracy as the model performance parameter, the final calculated model computation efficiency is the same whether (a) the computational performance parameter is computational efficiency and the model data throughput parameter is the unit model data computation (the case in which the effective throughput evaluation parameter quantifies the "quantity" of input data that the tested model processes correctly), or (b) the computational performance parameter is the theoretical peak performance parameter of the computing system and the model data throughput parameter is the throughput rate (the case in which the effective throughput evaluation parameter quantifies the "rate" at which the tested model processes input data correctly).
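The equivalence can be checked numerically. The sketch below assumes the two forms are accuracy × computational efficiency / unit computation and accuracy × throughput / theoretical peak; the struct and function names are hypothetical, and the application's own formulas are not reproduced in this text.

```cpp
#include <cassert>
#include <cmath>

// One test run; field names are illustrative assumptions.
struct RunMeasurement {
    double accuracy;       // model performance parameter, in [0, 1]
    double flop_per_item;  // measured computation per unit of input data
    double item_count;     // total number of input data items
    double total_seconds;  // measured total time of the task
    double peak_flops;     // theoretical peak performance, FLOP/s
};

// Form (a): accuracy * computational efficiency / unit computation.
double MceFromEfficiency(const RunMeasurement& m) {
    const double measured_flops = m.flop_per_item * m.item_count / m.total_seconds;
    const double efficiency = measured_flops / m.peak_flops;
    return m.accuracy * efficiency / m.flop_per_item;
}

// Form (b): accuracy * throughput / theoretical peak.
double MceFromThroughput(const RunMeasurement& m) {
    const double throughput = m.item_count / m.total_seconds;
    return m.accuracy * throughput / m.peak_flops;
}
```

The unit computation cancels algebraically in form (a), which is why both functions agree up to floating-point rounding.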
[0238] In practical applications, the method chosen for determining model computation efficiency determines which test parameters must be obtained from the model computation task in S102. For example, if the formula based on computational efficiency and unit model data computation is used, the model accuracy, computational efficiency, and unit model data computation must be measured from the model computation task; if the formula based on throughput and the theoretical peak performance parameter is used, the model accuracy, throughput, and theoretical peak performance parameter of the computing system must be obtained.
[0239] In some embodiments of this application, the measured computational performance parameter can be the number of floating-point operations per second (FLOPS), or trillions of floating-point operations per second (TFLOPS), performed by the tested model computing system when executing the model computation task, that is, the number of operations such as addition, subtraction, multiplication, and division that the tested model performs per unit time during model computation (inference or training).
[0240] The theoretical peak performance parameter of the computing system can be the theoretical value calculated from the hardware characteristics provided by the manufacturer of the device under test. It can be the theoretical peak for single-precision floating point, expressed in floating-point operations per second (FLOPS) or trillions of floating-point operations per second (TFLOPS).
[0241] The total measured time is the measured time of the model computation task. The timing starts when the model computation task (inference computation or training computation) begins and stops when the model computation task ends. The unit can be seconds (s).
[0242] Throughput is the rate at which the tested model processes input data per unit of time. When the tested model is a language model, throughput can be expressed as the number of input tokens processed per unit of time, and the unit of throughput can be tokens / s (i.e., the number of input tokens processed per second). When the tested model is an image processing model, throughput can be expressed as the number of input images processed per unit of time, and the unit of throughput can be images / s (i.e., the number of input images processed per second). This principle can be used to determine the unit of throughput when the tested model is another type of model, or when the dataset is a different dataset.
[0243] In practical applications, observations have revealed that, besides the tested device and software framework, factors affecting model performance include at least the number of model parameters, the type of application scenario, the type of dataset, and the data type used for model computation. That is, under different combinations of these factors, the same tested model may exhibit different performance characteristics, further impacting the final measured computational efficiency. Therefore, in this embodiment, the computational efficiency of the tested model's computational system can be more comprehensively evaluated by determining its overall computational efficiency.
[0244] In some embodiments of this application, step S104 (determining the computational efficiency of the tested model computing system based on the model performance parameters, model data throughput parameters, and computational performance parameters to obtain the test result of the tested model computing system) may include: determining the comprehensive computational efficiency of the tested model computing system from its computational efficiency in multiple test scenarios; and using the comprehensive computational efficiency as the test result; wherein each test scenario corresponds one-to-one with a dataset. That is, multiple test scenarios are considered, one dataset is selected for each test scenario to perform the model computation task, and the computational efficiencies of the test scenarios are then combined into the comprehensive computational efficiency.
[0245] In some other embodiments of this application, S104, determining the computational efficiency of the tested model computing system based on model performance parameters, model data throughput parameters, and computational performance parameters to obtain the test results of the tested model computing system, may further include: determining the comprehensive computational efficiency of the tested model computing system based on the computational efficiency of the tested model computing system in multiple test scenarios; using the comprehensive computational efficiency as the test result; wherein, one test scenario corresponds to one or more datasets. That is, for multiple test scenarios, one or more datasets can be selected for each test scenario to perform model computing tasks. For test scenarios using multiple datasets, the computational efficiency of each dataset is first combined to obtain the computational efficiency corresponding to the test scenario, and then the computational efficiency of each test scenario is combined to obtain the comprehensive computational efficiency.
[0246] Taking a language model as an example, the test scenario can be set as follows:
[0247] Test Scenario 1: The model calculation task is a problem-solving task, and the dataset used may include, but is not limited to, GSM8K (Grade School Math 8K), Math (Mathematics Set), and Math23K (Mathematics 23K).
[0248] Test Scenario 2: The model computation task is a code generation task, and the dataset used may include, but is not limited to, HumanEval (a benchmark of hand-written programming problems) and MBPP (Mostly Basic Python Problems).
[0249] Test Scenario 3: The model computation task is a large-scale multi-task language understanding task, and the dataset used may include, but is not limited to, MMLU.
[0250] Test Scenario 4: The model computation task is a reading comprehension task, and the dataset used may include, but is not limited to, ARC-C (AI2 Reasoning Challenge-Challenge Set).
[0251] The steps for determining the computational efficiency corresponding to a single test scenario may include: if the test scenario corresponds to one dataset, then the computational efficiency corresponding to the dataset is used as the computational efficiency corresponding to the test scenario; if the test scenario corresponds to multiple datasets, then the computational efficiency corresponding to the test scenario is determined based on the computational efficiency corresponding to each dataset. In some embodiments of this application, determining the computational efficiency corresponding to the test scenario based on the computational efficiency corresponding to each dataset may include: using the arithmetic mean of the computational efficiencies corresponding to each dataset as the computational efficiency corresponding to the test scenario. In other embodiments of this application, determining the computational efficiency corresponding to the test scenario based on the computational efficiency corresponding to each dataset may further include: using the geometric mean of the computational efficiencies corresponding to each dataset as the computational efficiency corresponding to the test scenario.
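The per-scenario rule above can be sketched as follows: a single dataset passes its efficiency through unchanged, and several datasets are combined by an arithmetic or geometric mean. The function names are hypothetical, and the unweighted means are an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

double ArithmeticMean(const std::vector<double>& values) {
    assert(!values.empty());
    double sum = 0.0;
    for (double v : values) sum += v;
    return sum / static_cast<double>(values.size());
}

// Geometric mean computed through logarithms to avoid overflow or
// underflow of the running product; all values must be positive.
double GeometricMean(const std::vector<double>& values) {
    assert(!values.empty());
    double log_sum = 0.0;
    for (double v : values) {
        assert(v > 0.0);
        log_sum += std::log(v);
    }
    return std::exp(log_sum / static_cast<double>(values.size()));
}

// Efficiency of one test scenario from its per-dataset efficiencies.
double ScenarioEfficiency(const std::vector<double>& per_dataset, bool use_geometric) {
    if (per_dataset.size() == 1) return per_dataset.front();
    return use_geometric ? GeometricMean(per_dataset) : ArithmeticMean(per_dataset);
}
```

For two datasets with efficiencies 2.0 and 8.0, the arithmetic mean is 5.0 and the geometric mean is 4.0, so the geometric mean is pulled less by the larger value.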
[0252] In some further embodiments of this application, model performance parameters, model data throughput parameters, and computational performance parameters can be calculated separately. Therefore, determining the computational efficiency of a test scenario based on the computational efficiency of each dataset corresponding to the test scenario in step S104 can further include: determining the single-scene model performance parameters of the test scenario based on the model performance parameters of each dataset corresponding to the test scenario; determining the single-scene model data throughput parameters of the test scenario based on the model data throughput parameters of each dataset corresponding to the test scenario; determining the single-scene computational performance parameters of the test scenario based on the computational performance parameters of each dataset corresponding to the test scenario; and determining the computational efficiency of the test scenario based on the single-scene model performance parameters, single-scene model data throughput parameters, and single-scene computational performance parameters. That is, by first calculating the single-scene model performance parameters, single-scene model data throughput parameters, and single-scene computational performance parameters separately, different comprehensive calculation methods can be used for these three types of parameters corresponding to the datasets, so as to better reflect the impact of different types of parameters on the computational efficiency calculation.
[0253] In some embodiments of this application, determining the single-scene model performance parameters corresponding to the test scenario based on the model performance parameters corresponding to each dataset can include: using the arithmetic mean of the model performance parameters corresponding to each dataset as the single-scene model performance parameter. Determining the single-scene model data throughput parameters corresponding to the test scenario based on the model data throughput parameters corresponding to each dataset can include: using the arithmetic mean of the model data throughput parameters corresponding to each dataset as the single-scene model data throughput parameter. Determining the single-scene computational performance parameters corresponding to the test scenario based on the computational performance parameters corresponding to each dataset can include: using the arithmetic mean of the computational performance parameters corresponding to each dataset as the single-scene computational performance parameter.
[0254] In other embodiments of this application, determining the single-scene model performance parameters corresponding to the test scenario based on the model performance parameters corresponding to each dataset can include: using the geometric mean of the model performance parameters corresponding to each dataset as the single-scene model performance parameter. Determining the single-scene model data throughput parameters corresponding to the test scenario based on the model data throughput parameters corresponding to each dataset can include: using the geometric mean of the model data throughput parameters corresponding to each dataset as the single-scene model data throughput parameter. Determining the single-scene computational performance parameters corresponding to the test scenario based on the computational performance parameters corresponding to each dataset can include: using the geometric mean of the computational performance parameters corresponding to each dataset as the single-scene computational performance parameter.
[0255] In this embodiment of the application, determining the comprehensive computational efficiency of the tested model computing system based on its computational efficiency in multiple test scenarios may include using the arithmetic mean of the computational efficiencies corresponding to multiple test scenarios as the comprehensive computational efficiency. The comprehensive computational efficiency can then be calculated using the following formula:
[0256] Here, $\mathrm{MCE}_t$ represents the comprehensive computational efficiency; $\mathrm{MCE}_i$ represents the computational efficiency of the tested model computing system in the i-th test scenario; and $f_{1i}$ represents the first weight coefficient of the tested model computing system in the i-th test scenario. In the embodiments of this application, the first weight coefficients corresponding to the test scenarios may be equal or unequal.
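The formula image for the weighted arithmetic mean is not reproduced in this text. A form consistent with the symbols defined in [0256] is sketched below, where $\mathrm{MCE}_t$ is the comprehensive computational efficiency, $\mathrm{MCE}_i$ the efficiency in the i-th of $n$ test scenarios, and $f_{1i}$ the first weight coefficient. The normalization of the weights to sum to 1 is an assumption made here.

```latex
\mathrm{MCE}_t = \sum_{i=1}^{n} f_{1i}\,\mathrm{MCE}_i,
\qquad
\sum_{i=1}^{n} f_{1i} = 1
```

With equal weights $f_{1i} = 1/n$ this reduces to the plain arithmetic mean.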
[0257] In this embodiment of the application, the comprehensive computational efficiency of the tested model computing system is determined based on its computational efficiency in multiple test scenarios. This may further include using the geometric mean of the computational efficiencies across multiple test scenarios as the comprehensive computational efficiency. The comprehensive computational efficiency can then be calculated using the following formula:
[0258] Here, $\mathrm{MCE}_t$ represents the comprehensive computational efficiency; $\mathrm{MCE}_i$ represents the computational efficiency of the tested model computing system in the i-th test scenario; $\prod(\cdot)$ represents the cumulative product; and $f_{2i}$ represents the second weight coefficient of the tested model computing system in the i-th test scenario. In the embodiments of this application, the second weight coefficients corresponding to the test scenarios may be equal or unequal.
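The formula image for the weighted geometric mean is also missing. A form consistent with the symbols defined in [0258] is sketched below, where $\mathrm{MCE}_t$ is the comprehensive computational efficiency, $\mathrm{MCE}_i$ the efficiency in the i-th of $n$ test scenarios, and $f_{2i}$ the second weight coefficient. Using the weights as exponents that sum to 1 is an assumption made here.

```latex
\mathrm{MCE}_t = \prod_{i=1}^{n} \mathrm{MCE}_i^{\,f_{2i}},
\qquad
\sum_{i=1}^{n} f_{2i} = 1
```

With equal weights $f_{2i} = 1/n$ this reduces to the plain geometric mean, the n-th root of the product.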
[0259] This application thus proposes two ways of determining the comprehensive model computation efficiency: the arithmetic mean and the geometric mean. In practical applications, the geometric mean is better suited than the arithmetic mean to comparing data of different scales or units. Therefore, in some embodiments of this application, if the difference between at least two model computation efficiencies of the tested model computing system exceeds a preset threshold, the geometric mean of the model computation efficiencies is used as the comprehensive model computation efficiency. This reduces the influence of test errors, or of unusually good or poor results in particular test scenarios or datasets, and yields a more representative comprehensive model computation efficiency.
[0260] Based on the above embodiments, this application further describes the steps for controlling the tested model computing system to perform model computation tasks.
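The threshold rule can be sketched as follows. Several points are assumptions made for illustration: the "difference" is taken as an absolute difference, the arithmetic mean is used when the threshold is not exceeded, and all scenarios carry equal weight. The function name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Picks the geometric mean when any two efficiencies differ by more than
// `threshold`; otherwise falls back to the arithmetic mean (assumed).
double ComprehensiveEfficiency(const std::vector<double>& mce, double threshold) {
    assert(!mce.empty());
    // The largest pairwise difference equals max - min, so one pass suffices.
    const auto extremes = std::minmax_element(mce.begin(), mce.end());
    const bool wide_spread = (*extremes.second - *extremes.first) > threshold;
    const double n = static_cast<double>(mce.size());
    if (wide_spread) {
        double log_sum = 0.0;
        for (double v : mce) {
            assert(v > 0.0);  // geometric mean needs positive inputs
            log_sum += std::log(v);
        }
        return std::exp(log_sum / n);
    }
    double sum = 0.0;
    for (double v : mce) sum += v;
    return sum / n;
}
```

For efficiencies 1.0 and 4.0, a threshold of 5.0 keeps the arithmetic mean (2.5), while a threshold of 2.0 switches to the geometric mean (2.0).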
[0261] In this embodiment of the application, S102, which controls the device under test to run the software framework under test to call the model under test and the dataset to perform model calculation tasks, may include: loading the software framework under test on the device under test and performing initialization of the software framework under test; starting the software framework under test to call the model under test and the dataset to perform model calculation tasks.
[0262] After the system under test (SUT) is deployed, it is provided as a dynamic link library (DLL) and must expose the required interfaces and their implementation classes. When the test script starts, it dynamically loads the SUT's library file. The library provides an access interface; if the software framework under test is written in C or C++, the library exposes a C interface (CreateSUTInstance). The test script obtains a class instance of the SUT by calling this interface; this instance must derive from the base class (BaseSUT). The test script then starts the inference test by calling the instance's member functions. When the test completes, the results are returned to the test framework, which saves them and performs the subsequent calculation of model computation efficiency. Finally, the test script integrates all the information to generate a detailed test report.
[0263] In this embodiment of the application, loading the software framework under test on the device under test and performing the initialization of the software framework under test may include: loading the library file of the computing system under test on the device under test; obtaining the type of the interface function of the computing system under test through the calling interface of the library file; and initializing the interface function according to the type of the interface function.
[0264] The calling interface can be a C-style interface (CreateSUTInstance), which returns a pointer to an instance of ConcreteSUT (instantiating the computation system of the model under test). After obtaining this pointer, the test script will cast it to a pointer to the BaseSUT base class for subsequent operations and use. This design allows the testing framework to be decoupled from specific implementation details, enhancing the framework's versatility and flexibility. The calling interface can be defined as follows:
[0265] Interface:
[0266] void* CreateSUTInstance();
[0267] Therefore, in this embodiment, the model computation system under test must implement a custom ConcreteSUT class, which inherits from the BaseSUT base class and overrides all abstract methods defined in the base class. This design ensures that the model computation system under test can provide the necessary functions and behaviors according to the requirements of the testing framework. In this way, the ConcreteSUT class encapsulates the specific implementation details while ensuring compatibility with the test scripts and consistency of the interface.
[0268] In this embodiment, the types of interface functions may include, but are not limited to: initialization function (Initialize), finalization function (Finalize), data loading function (LoadData), data unloading function (UnLoadData), model computation control function, and test parameter return function. The initialization function is an interface function that performs initialization based on test configuration information; the finalization function is an interface function that performs tasks after the tested model computation system has completed its model computation task; the data loading function is an interface function that loads model computation data; the data unloading function is an interface function that unloads model computation data; the model computation control function is an interface function that performs model computation tasks; and the test parameter return function outputs test parameters required for calculating at least one of the following: model performance parameters, model data throughput parameters, and computation performance parameters.
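The relationship between the base class and the concrete class can be sketched as follows. This is a Python analogue for illustration only, since the text describes a C/C++ library; the snake_case method names are hypothetical counterparts of Initialize, Finalize, LoadData, UnLoadData, GetThroughput and GetSystemFLOPS, and the toy ConcreteSUT does no real model computation.

```python
from abc import ABC, abstractmethod


class BaseSUT(ABC):
    """Interface the test script relies on (hypothetical Python analogue)."""

    @abstractmethod
    def initialize(self, config: dict) -> None: ...

    @abstractmethod
    def load_data(self, samples: list) -> None: ...

    @abstractmethod
    def unload_data(self) -> None: ...

    @abstractmethod
    def get_throughput(self) -> float: ...

    @abstractmethod
    def get_system_flops(self) -> float: ...

    @abstractmethod
    def finalize(self) -> None: ...


class ConcreteSUT(BaseSUT):
    """Toy implementation that overrides every abstract method."""

    def initialize(self, config: dict) -> None:
        self.peak_flops = float(config["peak_flops"])
        self.samples = []

    def load_data(self, samples: list) -> None:
        self.samples = list(samples)

    def unload_data(self) -> None:
        self.samples = []

    def get_throughput(self) -> float:
        return 0.0  # placeholder: no inference has been run in this sketch

    def get_system_flops(self) -> float:
        return self.peak_flops

    def finalize(self) -> None:
        self.samples = []


def create_sut_instance() -> BaseSUT:
    """Counterpart of CreateSUTInstance: callers only see the BaseSUT type."""
    return ConcreteSUT()
```

Because BaseSUT is abstract, a concrete class that forgets to override one of the methods cannot be instantiated, which mirrors the requirement that ConcreteSUT override all abstract methods.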
[0269] The model computation control function is determined based on the type of model computation task, and the type description can be found in the above embodiment. Taking the model computation task as a model inference task as an example, the model computation control function may include an inference control function (Inferencet) and a data result callback function (Response). The inference control function (Inferencet) is used to control the tested model computation system to traverse each piece of data in the test dataset for inference computation. The data result callback function (Response) is used to control the tested model computation system to call the callback function of the test script in a specified format after the inference computation of each piece of data is completed. The test script saves the test data results and records the test completion time after receiving the inference completion information.
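The interplay between the inference control function and the result callback can be sketched as follows. This is an illustrative Python sketch under stated assumptions: run_inference and ResultRecorder are hypothetical names, the model is any callable, and the "specified format" of the callback is assumed to be an (index, output) pair.

```python
import time


class ResultRecorder:
    """Test-script side: stores each result and the time it was received."""

    def __init__(self):
        self.results = {}
        self.completion_times = {}

    def __call__(self, index, output):
        self.results[index] = output
        self.completion_times[index] = time.perf_counter()


def run_inference(samples, model, respond):
    """SUT side: traverse the dataset and call back after each sample."""
    for index, sample in enumerate(samples):
        respond(index, model(sample))
```

For example, passing a ResultRecorder instance as respond leaves one saved result and one completion timestamp per sample, which the test script can later use for accuracy and timing calculations.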
[0270] The test parameter return function is determined by the method selected for calculating the model computation efficiency, as described in the above embodiments. For example, the test parameter return functions can include a throughput return function (GetThroughput) and a system theoretical peak return function (GetSystemFLOPS). These functions cause the tested model computing system to report its runtime throughput data and the theoretical peak performance parameters of the tested device's computing system to the test script, as required for calculating the model computation efficiency.
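How the two returned values can feed the calculation is sketched below. The sketch assumes throughput is reported in input units per second and that the computation amount per unit of input data (flops_per_sample, a hypothetical parameter name) is known; it follows the ratio of measured computational performance to theoretical peak performance described in the claims, and it does not reproduce the overall efficiency formula.

```python
def measured_flops(throughput: float, flops_per_sample: float) -> float:
    """Measured performance: inputs per second times computation per input."""
    return throughput * flops_per_sample


def compute_efficiency(throughput: float, flops_per_sample: float,
                       peak_flops: float) -> float:
    """Ratio of measured performance to the theoretical peak of the device."""
    if peak_flops <= 0:
        raise ValueError("peak_flops must be positive")
    return measured_flops(throughput, flops_per_sample) / peak_flops
```

With a throughput of 100 inputs per second, 2e9 floating-point operations per input, and a peak of 1e12 operations per second, the measured performance is 2e11 operations per second and the ratio is 0.2.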
[0271] Interface functions may also include a deinitialization function, which the tested model computing system uses to release all occupied resources after the test is completed, so that the system can be restored to its pre-test state.
[0272] In this embodiment of the application, starting the software framework under test to call the model under test to perform a model calculation task may include: starting the software framework under test to perform a model calculation task; obtaining the first result returned by the test parameter return function after the software framework under test calls the data loading function to load the model calculation data; obtaining the second result of the test parameter return function after the software framework under test calls the model calculation control function to perform the model calculation task; waiting for the third result output by the software framework under test after it finishes performing the model calculation task; obtaining the fourth result returned by the test parameter return function after the software framework under test calls the data unloading function to unload the model calculation data; and calculating the model accuracy of the model under test based on the execution result of the model calculation task, including: calculating the model accuracy based on at least one of the first result, the second result, the third result, and the fourth result.
[0273] Figure 2 is a timing diagram of a model computation efficiency testing method provided in an embodiment of this application.
[0274] As shown in Figure 2, in some embodiments of this application, after the tested model computing system is deployed, the test device runs the test script to execute the model computation efficiency testing method through the following steps S201 to S223. In this timing diagram embodiment, the test script is deployed on the test device, the library files and the instance of the tested model computing system are deployed on the device under test, and each step is regarded as being executed by the corresponding hardware.
[0275] S201: Startup. Start the test equipment to begin testing the computational efficiency of the model under test.
[0276] S202: Load library files. The test equipment loads the library files of the computational system of the model under test.
[0277] S203: Obtain Interface Functions. The test device obtains the interface functions of the computing system of the model under test.
[0278] S204: Generate an instance of the computational system under test. Generate an instance of the computational system under test based on its interface functions.
[0279] S205: Create a new instance of the tested model computing system. The library files of the tested model computing system create a new instance of the tested model computing system.
[0280] S206: Return creation result. The instance of the tested model computing system returns the creation result to the library files of the tested model computing system.
[0281] S207: Return creation result. The library files of the tested model's computation system return the creation result to the test script.
[0282] S208: Call the initialization function. The test device calls the initialization function of the computing system under test to perform the initialization of the computing system under test.
[0283] S209: Loading Data. The tested model computing system instance calls the data loading function to load the data required for the model computing task.
[0284] S210: Load Model. The tested model computing system instance loads the model parameters of the tested model.
[0285] S211: Other operations. Other initialization operations required for the tested model computation system instance to perform model computation tasks.
[0286] S212: Returns initialization results. The tested model computation system instance returns initialization results.
[0287] S213: Start Test. The test script controls the tested model computing system instance to begin executing the model computation task.
[0288] For each piece of input data, steps S214 to S215 below constitute one model calculation in the iterative execution of the model computation task.
[0289] S214: Model Calculation. The tested model calculation system instance performs model calculations based on the input data.
[0290] S215: Send calculation results. The tested model computing system instance responds to the test script, returning the model's calculation results.
[0291] S216: Obtain the theoretical peak performance parameters of the computing system. The test script sends a request to the library files of the computing system under test to obtain the theoretical peak performance parameters of the computing system.
[0292] S217: Returns the theoretical peak performance parameters of the computing system. The library files of the computing system under test return the theoretical peak performance parameters of the computing system to the test script.
[0293] S218: Get Throughput. The test script sends a request to the library files of the tested model computing system to get the throughput.
[0294] S219: Returns throughput. The library files of the tested model's computation system return the throughput to the test script.
[0295] S220: Calculate model performance parameters. The test script calculates model performance parameters, such as model accuracy, based on the results returned by the system during the execution of model calculation tasks.
[0296] S221: Release resources. The test script controls the library files of the tested model's computing system to call the deinitialization function to release resources.
[0297] S222: Determine the model computation efficiency. The test script determines the model computation efficiency of the tested model computing system based on the model performance parameters, model data throughput parameters, and computational performance parameters.
[0298] S223: Generate a test report. The test script generates a test report for the tested model computing system based on test results such as the model computation efficiency and the overall model computation efficiency.
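The sequence from S208 to S223 can be sketched as a single driver function. This is a minimal Python sketch under stated assumptions: ToySUT, run_test, and the configuration keys peak_flops and flops_per_sample are hypothetical, the toy "model" simply returns the parity of each input, and the final combination of the parameters into the model computation efficiency is left out because its formula is not reproduced in this passage.

```python
import time


class ToySUT:
    """Hypothetical stand-in for a ConcreteSUT; its 'model' returns input parity."""

    def initialize(self, config):
        # S208-S212: a real SUT would load data and model parameters here.
        self.peak_flops = float(config["peak_flops"])
        self.throughput = 0.0
        self.released = False

    def inference(self, samples, respond):
        # S214-S215: compute each sample and send the result via the callback.
        start = time.perf_counter()
        for index, sample in enumerate(samples):
            respond(index, sample % 2)
        elapsed = max(time.perf_counter() - start, 1e-9)  # guard against zero
        self.throughput = len(samples) / elapsed

    def get_system_flops(self):
        return self.peak_flops

    def get_throughput(self):
        return self.throughput

    def finalize(self):
        self.released = True


def run_test(sut, samples, labels, config):
    """Run the test steps in order and return the collected parameters."""
    sut.initialize(config)                      # S208-S212
    outputs = {}

    def respond(index, output):                 # callback used in S215
        outputs[index] = output

    sut.inference(samples, respond)             # S213-S215
    peak_flops = sut.get_system_flops()         # S216-S217
    throughput = sut.get_throughput()           # S218-S219
    correct = sum(1 for i, label in enumerate(labels) if outputs.get(i) == label)
    accuracy = correct / len(labels)            # S220
    sut.finalize()                              # S221
    measured = throughput * config["flops_per_sample"]
    return {                                    # inputs to S222 and S223
        "accuracy": accuracy,
        "throughput": throughput,
        "peak_flops": peak_flops,
        "compute_efficiency": measured / peak_flops,
    }
```

The returned dictionary holds one value for each of the three parameter groups, which is the information S222 needs to determine the model computation efficiency and S223 needs for the report.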
[0299] The model computation efficiency testing method provided in this application proposes evaluation indicators for model computation efficiency and also gives a method for testing it. This provides a unified evaluation standard applicable to various model computing systems, which can help model users select more efficient models and assist model developers in optimizing models, software frameworks, computing devices, etc., so as to promote the green development of artificial intelligence, promote the efficient use of resources, and enhance model quality.
[0300] It should be noted that in the embodiments of the model computation efficiency testing method in this application, some steps or features may be omitted or not executed. The hardware or software functional modules are divided for ease of explanation, and this division is not the only implementation of the model computation efficiency testing method provided in the embodiments of this application.
[0301] The above details various embodiments of the model computation efficiency testing method. Based on this, this application also discloses a model computation efficiency testing apparatus, equipment, non-volatile readable storage medium, and computer program product corresponding to the above method.
[0302] Figure 3 is a schematic diagram of the structure of a model computation efficiency testing system provided in an embodiment of this application.
[0303] As shown in Figure 3, the model computation efficiency testing system provided in this application embodiment may include a test device 301 and a device under test 302.
[0304] The device under test 302 is used to run the software framework under test to perform model calculation tasks based on the model under test and the dataset.
[0305] Test device 301 is used to acquire datasets and control device under test 302 to run the software framework under test to call the model under test and dataset to execute model calculation tasks; determine model performance parameters, model data throughput parameters and calculation performance parameters based on model calculation tasks; determine the model calculation efficiency of the model calculation system under test based on model performance parameters, model data throughput parameters and calculation performance parameters, so as to obtain the test results of the model calculation system under test.
[0306] The tested model computing system includes the model under test, the software framework under test, and the device under test 302.
[0307] In a specific implementation, the test device 301 sends a test control command to the device under test 302 to control the device under test 302 to execute the model calculation task and return the model calculation results to the test device 301. For the steps by which the device under test 302 executes the model calculation task, and the steps by which the test device 301 tests the model computation efficiency of the tested model computing system, refer to the description of any of the above method embodiments of this application.
[0308] The model computation efficiency testing apparatus provided in this application embodiment may include:
[0309] The acquisition unit is used to acquire the dataset;
[0310] The control unit is used to control the device under test to run the software framework under test in order to call the model under test and the dataset to perform model calculation tasks.
[0311] The determination unit is used to determine the model performance parameters, model data throughput parameters, and computational performance parameters based on the model computation task; and to determine the model computation efficiency of the tested model computation system based on the model performance parameters, model data throughput parameters, and computational performance parameters, so as to obtain the test results of the tested model computation system.
[0312] The tested model computing system includes the tested model, the tested software framework, and the tested equipment.
[0313] It should be noted that in the various embodiments of the model computation efficiency testing apparatus provided in this application, the division of units is only a logical functional division, and other division methods can be used. The connection between different units can be electrical, mechanical, or of another type. Separate units can be located in the same physical location or distributed across multiple network nodes. Each unit can be implemented in hardware or as a software functional unit. That is, some or all of the units provided in this application can be selected according to actual needs, and corresponding connection or integration methods can be used to achieve the purpose of the solution in this application.
[0314] Since the embodiments of the apparatus and the embodiments of the method correspond to each other, please refer to the description of the embodiments of the method for the embodiments of the apparatus, which will not be repeated here.
[0315] Figure 4 is a schematic diagram of the structure of a model computation efficiency testing device provided in an embodiment of this application.
[0316] As shown in Figure 4, the model computation efficiency testing device provided in this application embodiment includes: a memory 410 for storing a computer program 411; and a processor 420 for executing the computer program 411, wherein the computer program 411, when executed by the processor 420, implements the steps of the model computation efficiency testing method provided in any of the above embodiments.
[0317] The processor 420 may include one or more processing cores, such as a 3-core processor or an 8-core processor. The processor 420 may be implemented using at least one hardware form selected from Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor 420 may also include a main processor and a coprocessor. The main processor, also known as the Central Processing Unit (CPU), is used to process data in the wake-up state; the coprocessor is a low-power processor used to process data in the standby state. In some embodiments, the processor 420 may integrate a Graphics Processing Unit (GPU) responsible for rendering and drawing the content to be displayed on the screen. In some embodiments, the processor 420 may also include an Artificial Intelligence (AI) processor for handling computational operations related to machine learning.
[0318] The memory 410 may include one or more non-volatile readable storage media, which may be non-transitory. The memory 410 may also include high-speed random access memory and non-volatile memory, such as one or more disk storage devices or flash memory devices. In this embodiment, the memory 410 is used to store at least the computer program 411, which, after being loaded and executed by the processor 420, implements the relevant steps of the model computation efficiency testing method disclosed in any of the foregoing embodiments. In addition, the resources stored in the memory 410 may also include an operating system 412 and data 413, and the storage may be temporary or permanent. The operating system 412 may be Windows or another type of operating system. The data 413 may include, but is not limited to, the data involved in the above methods.
[0319] In some embodiments, the model computation efficiency testing device may further include a display screen 430, a power supply 440, a communication interface 450, an input/output interface 460, a sensor 470, and a communication bus 480.
[0320] Those skilled in the art will understand that the structure shown in Figure 4 does not constitute a limitation on the model computation efficiency testing device, which may include more or fewer components than shown.
[0321] The model computation efficiency testing device provided in this application includes a memory and a processor. When the processor executes the program stored in the memory, it can implement the steps of the model computation efficiency testing method provided in the above embodiments, and the effect is the same as above.
[0322] This application provides a non-volatile readable storage medium storing a computer program thereon. When executed by a processor, the computer program can implement the steps of the model computation efficiency testing method provided in any of the above embodiments.
[0323] The non-volatile readable storage medium may include: USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks or optical disks, and other media that can store program code.
[0324] For a description of the non-volatile readable storage medium provided in the embodiments of this application, please refer to the above method embodiments. The effect it achieves is the same as that of the model computation efficiency testing method provided in the embodiments of this application, and will not be repeated here.
[0325] This application provides a computer program product, including a computer program that, when executed by a processor, implements the steps of the model computation efficiency testing method provided in any of the above embodiments.
[0326] For a description of the computer program product provided in the embodiments of this application, please refer to the above method embodiments. The effects it achieves are the same as those of the model computation efficiency testing method provided in the embodiments of this application, and will not be repeated here.
[0327] The foregoing provides a detailed description of the model computation efficiency testing method, system, apparatus, device, non-volatile readable storage medium, and computer program product provided in this application. The various embodiments in the specification are described in a progressive manner, with each embodiment focusing on its differences from other embodiments. Similar or identical parts between embodiments can be referred to interchangeably. For the apparatus, device, non-volatile readable storage medium, and computer program product disclosed in the embodiments, since they correspond to the methods disclosed in the embodiments, the descriptions are relatively simple, and relevant parts can be referred to in the method section. It should be noted that those skilled in the art can make several improvements and modifications to this application without departing from the principles of this application, and these improvements and modifications also fall within the protection scope of this application.
[0328] It should also be noted that, in this specification, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes that element.
Claims
1. A method for testing model computation efficiency, characterized in that it includes: obtaining a dataset; controlling the device under test to run the software framework under test so as to call the model under test and the dataset to perform a model calculation task; determining model performance parameters, model data throughput parameters, and computational performance parameters based on the model calculation task; and determining the model computation efficiency of the tested model computing system based on the model performance parameters, the model data throughput parameters, and the computational performance parameters, so as to obtain the test results of the tested model computing system; wherein the tested model computing system includes the model under test, the software framework under test, and the device under test.
2. The model computation efficiency testing method according to claim 1, characterized in that determining the model computation efficiency of the tested model computing system based on the model performance parameters, the model data throughput parameters, and the computational performance parameters includes: determining effective throughput evaluation parameters of the tested model computing system based on the model performance parameters and the model data throughput parameters; and determining the model computation efficiency based on the effective throughput evaluation parameters and the computational performance parameters.
3. The model computation efficiency testing method according to claim 2, characterized in that the higher the effective throughput corresponding to the effective throughput evaluation parameter, the greater the model computation efficiency.
4. The model computation efficiency testing method according to claim 3, characterized in that the model data throughput parameters include a unit model data computation amount; the unit model data computation amount is the computation amount required for the tested model to process a unit of input data; the better the model performance corresponding to the model performance parameter, the larger the effective throughput evaluation parameter; and the effective throughput evaluation parameter is negatively correlated with the unit model data computation amount.
5. The model computation efficiency testing method according to claim 4, characterized in that the model under test is a language model, and the unit model data computation amount is the computation amount required for the model under test to process each input token.
6. The model computation efficiency testing method according to claim 4, characterized in that the model under test is an image processing model, and the unit model data computation amount is the computation amount required for the model under test to process each input image.
7. The model computation efficiency testing method according to claim 3, characterized in that the model data throughput parameters include a throughput rate; the throughput rate is the rate at which the tested model processes input data per unit time; the better the model performance corresponding to the model performance parameter, the larger the effective throughput evaluation parameter; and the effective throughput evaluation parameter is positively correlated with the throughput rate.
8. The model computation efficiency testing method according to claim 7, characterized in that the model under test is a language model, and the throughput rate is the number of input tokens processed by the model under test per unit time.
9. The model computation efficiency testing method according to claim 7, characterized in that the model under test is an image processing model, and the throughput rate is the number of input images processed by the model under test per unit time.
10. The model computation efficiency testing method according to claim 1, characterized in that determining the model computation efficiency of the tested model computing system based on the model performance parameters, the model data throughput parameters, and the computational performance parameters includes: if the model performance parameters are within the range of the baseline model performance parameters, determining the model computation efficiency based on the model performance parameters, the model data throughput parameters, and the computational performance parameters; and if the model performance parameters are not within the range of the baseline model performance parameters, setting the model performance parameters to 0 and determining the model computation efficiency based on the model performance parameters, the model data throughput parameters, and the computational performance parameters.
11. The model computation efficiency testing method according to claim 1, characterized in that the model performance parameter is the model accuracy of the tested model, and the model accuracy is determined by the following formula: Wherein, the measured accuracy of the model is the measured accuracy of the tested model in the model calculation task, and the benchmark accuracy is the minimum allowable accuracy of the tested model.
12. The model computation efficiency testing method according to claim 1, characterized in that the computational performance parameters include computational efficiency; the computational efficiency is positively correlated with the measured computational performance parameters of the tested model computing system when performing the model calculation task, and negatively correlated with the theoretical peak performance parameters of the computing system of the tested device; and the computational performance corresponding to the computational performance parameter is positively correlated with the computational efficiency.
13. The model computation efficiency testing method according to claim 12, characterized in that the steps for determining the measured computational performance parameters include: obtaining the measured total computation amount and the measured total time consumed by the tested model computing system in executing the model calculation task; and determining the measured computational performance parameters based on the measured total computation amount and the measured total time consumed.
14. The model computation efficiency testing method according to claim 12, characterized in that the measured computational performance parameter is the number of floating-point operations per second performed by the tested device on the model calculation task; and the theoretical peak performance parameter of the computing system is the theoretical peak number of floating-point operations per second of the device under test.
15. The model computation efficiency testing method according to claim 12, characterized in that the steps for determining the measured computational performance parameters and the steps for determining the theoretical peak performance parameters of the computing system include: obtaining the theoretical peak performance parameters of the computing system corresponding to the first data type of the device under test; if all actual data types in the model calculation task are the first data type, taking the computational performance parameter corresponding to the actual data type in the model calculation task as the measured computational performance parameter; and if there is an actual data type in the model calculation task that is not the first data type, converting the computational performance parameter corresponding to the actual data type in the model calculation task into an equivalent computational performance parameter corresponding to the first data type to obtain the measured computational performance parameter.
16. The model computation efficiency testing method according to claim 12, characterized in that the steps for determining the measured computational performance parameters and the steps for determining the theoretical peak performance parameters of the computing system include: obtaining the theoretical peak performance parameters of the computing system corresponding to the second data type of the device under test; if all actual data types in the model calculation task are the second data type, taking the computational performance parameter corresponding to the actual data type in the model calculation task as the measured computational performance parameter; and if there is an actual data type in the model calculation task that is not the second data type, converting the computational performance parameter corresponding to the actual data type in the model calculation task into the computational performance parameter corresponding to the second data type to obtain the measured computational performance parameter.
17. The simulation efficiency testing method according to claim 1, characterized in that, The computational efficiency of the tested model's computing system is determined based on the model performance parameters, the model data throughput parameters, and the computational performance parameters, and is calculated using the following formula: Wherein, the model accuracy is the model performance parameter; the unit model data computation amount is the computation amount required for the tested model to process a unit of input data; and the computation efficiency is the computation performance parameter.
18. The simulation efficiency testing method according to claim 1, characterized in that, The computational efficiency of the tested model's computing system is determined based on the model performance parameters, the model data throughput parameters, and the computational performance parameters, and is calculated using the following formula: Wherein, the model accuracy is the model performance parameter; the measured computation per input label is the model data throughput parameter, representing the computational amount required by the tested model to process each input label; and the computational efficiency is the computational performance parameter.
19. The model computation efficiency testing method according to claim 1, characterized in that the model computation efficiency of the tested model computing system is determined based on the model performance parameters, the model data throughput parameters, and the computational performance parameters, and is calculated using the following formula: wherein the model accuracy is the model performance parameter; the unit model data computation amount is the computation amount required by the model under test to process one unit of input data; and the ratio of the measured computational performance parameter of the tested model computing system when executing the model computation task to the theoretical peak performance parameter of the computing system of the device under test represents the computation efficiency, the computation efficiency being the computational performance parameter.
20. The model computation efficiency testing method according to claim 1, characterized in that the model computation efficiency of the tested model computing system is determined based on the model performance parameters, the model data throughput parameters, and the computational performance parameters, and is calculated using the following formula: wherein the model accuracy is the model performance parameter; the unit model data computation amount is the computation amount required by the model under test to process one unit of input data; the ratio of the measured total computation amount to the measured total time of the model computation task is the measured computational performance parameter of the tested model computing system when executing the model computation task; and the ratio of the measured computational performance parameter to the theoretical peak performance parameter of the computing system of the device under test represents the computation efficiency, the computation efficiency being the computational performance parameter.
21. The model computation efficiency testing method according to claim 1, characterized in that the model computation efficiency of the tested model computing system is determined based on the model performance parameters, the model data throughput parameters, and the computational performance parameters, and is calculated using the following formula: wherein the model accuracy is the model performance parameter; the unit model data computation amount is the computation amount required by the model under test to process one unit of input data; the ratio of the product of the unit model data computation amount and the total number of input data to the measured total time of the model computation task is the measured computational performance parameter of the tested model computing system when executing the model computation task; and the ratio of the measured computational performance parameter to the theoretical peak performance parameter of the computing system of the device under test represents the computation efficiency, the computation efficiency being the computational performance parameter.
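The ratios defined in claims 19 to 21 can be illustrated with a short Python sketch. It is not taken from the application: the function names, the use of FLOPs as the unit of computation amount, and the example figures are assumptions made for illustration. The sketch covers only the computation efficiency ratio, not the formula that combines the three parameters.

```python
def measured_performance(unit_computation: float,
                         num_inputs: int,
                         total_time_s: float) -> float:
    """Measured computational performance as described in claim 21.

    unit_computation: computation amount per unit of input data
                      (assumed here to be FLOPs per input).
    num_inputs:       total number of input data items processed.
    total_time_s:     measured total time of the task, in seconds.
    """
    if total_time_s <= 0:
        raise ValueError("total_time_s must be positive")
    # Total computation amount divided by measured total time.
    return unit_computation * num_inputs / total_time_s


def computation_efficiency(measured: float, theoretical_peak: float) -> float:
    """Ratio of measured performance to theoretical peak performance."""
    if theoretical_peak <= 0:
        raise ValueError("theoretical_peak must be positive")
    return measured / theoretical_peak
```

Under these assumptions, a model needing 2e9 FLOPs per input that processes 1000 inputs in 4 seconds has a measured performance of 5e11 FLOP/s. On a device with a theoretical peak of 1e12 FLOP/s, the computation efficiency is 0.5.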
22. The model computation efficiency testing method according to claim 1, characterized in that the model computation efficiency of the tested model computing system is determined based on the model performance parameters, the model data throughput parameters, and the computational performance parameters, and is calculated using the following formula: wherein the model accuracy is the model performance parameter; the ratio of the total number of input data to the measured total time is the model data throughput parameter; and the theoretical peak performance parameter of the computing system is the computational performance parameter.
23. The model computation efficiency testing method according to claim 1, characterized in that the model computation efficiency of the tested model computing system is determined based on the model performance parameters, the model data throughput parameters, and the computational performance parameters, and is calculated using the following formula: wherein the model accuracy is the model performance parameter; the throughput rate is the model data throughput parameter, representing the amount of input data processed by the model under test per unit time; and the theoretical peak performance parameter of the computing system is the computational performance parameter.
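The throughput parameter of claims 22 and 23 (total number of input data divided by measured total time) could be measured along the following lines. This is a minimal sketch, not the application's implementation: `measure_throughput`, its `process` callback, and the injectable `clock` argument are hypothetical names introduced for illustration.

```python
import time
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def measure_throughput(process: Callable[[T], object],
                       inputs: Iterable[T],
                       clock: Callable[[], float] = time.perf_counter) -> float:
    """Return the number of inputs processed per second.

    process: hypothetical callback that runs the model on one input.
    clock:   time source; injectable so the function can be tested
             deterministically.
    """
    start = clock()
    count = 0
    for item in inputs:
        process(item)
        count += 1
    elapsed = clock() - start
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    # Total number of input data divided by measured total time.
    return count / elapsed
```

Passing the clock as a parameter is a design choice of this sketch: it lets a test substitute fixed timestamps for the wall clock.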
24. The model computation efficiency testing method according to claim 1, characterized in that determining the model computation efficiency of the tested model computing system based on the model performance parameters, the model data throughput parameters, and the computational performance parameters, so as to obtain the test result of the tested model computing system, includes: determining the overall model computation efficiency of the tested model computing system based on the model computation efficiency of the tested model computing system under multiple test scenarios, and using the overall model computation efficiency as the test result; wherein the test scenarios correspond one-to-one with the datasets.
25. The model computation efficiency testing method according to claim 1, characterized in that determining the model computation efficiency of the tested model computing system based on the model performance parameters, the model data throughput parameters, and the computational performance parameters, so as to obtain the test result of the tested model computing system, includes: determining the overall model computation efficiency of the tested model computing system based on the model computation efficiency of the tested model computing system under multiple test scenarios, and using the overall model computation efficiency as the test result; wherein each test scenario corresponds to one or more of the datasets.
26. The model computation efficiency testing method according to claim 25, characterized in that determining the model computation efficiency corresponding to a single test scenario includes: if the test scenario corresponds to one dataset, taking the model computation efficiency corresponding to that dataset as the model computation efficiency corresponding to the test scenario; and if the test scenario corresponds to multiple datasets, determining the model computation efficiency corresponding to the test scenario based on the model computation efficiency corresponding to each of the datasets.
27. The model computation efficiency testing method according to claim 26, characterized in that determining the model computation efficiency corresponding to the test scenario based on the model computation efficiency corresponding to each of the datasets includes: taking the arithmetic mean of the model computation efficiencies corresponding to the datasets as the model computation efficiency corresponding to the test scenario.
28. The model computation efficiency testing method according to claim 26, characterized in that determining the model computation efficiency corresponding to the test scenario based on the model computation efficiency corresponding to each of the datasets includes: taking the geometric mean of the model computation efficiencies corresponding to the datasets as the model computation efficiency corresponding to the test scenario.
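The per-scenario aggregation of claims 26 to 28 can be sketched in Python as follows. The function name `scenario_efficiency` and the `mean` keyword are illustrative assumptions; only the two averaging options come from the claims.

```python
from statistics import fmean, geometric_mean
from typing import Sequence


def scenario_efficiency(dataset_efficiencies: Sequence[float],
                        mean: str = "arithmetic") -> float:
    """Combine per-dataset efficiencies into one per-scenario value."""
    if not dataset_efficiencies:
        raise ValueError("a test scenario needs at least one dataset")
    if len(dataset_efficiencies) == 1:
        # Claim 26: a single dataset's efficiency is used directly.
        return dataset_efficiencies[0]
    if mean == "arithmetic":
        # Claim 27: arithmetic mean over the datasets.
        return fmean(dataset_efficiencies)
    if mean == "geometric":
        # Claim 28: geometric mean over the datasets.
        return geometric_mean(dataset_efficiencies)
    raise ValueError(f"unknown mean: {mean!r}")
```

For two datasets with efficiencies 0.2 and 0.8, the arithmetic mean is 0.5 and the geometric mean is 0.4, so the geometric option weighs the weaker dataset more heavily.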
29. The model computation efficiency testing method according to claim 25, characterized in that determining the model computation efficiency corresponding to a single test scenario includes: if the test scenario corresponds to one dataset, taking the model computation efficiency corresponding to that dataset as the model computation efficiency corresponding to the test scenario; and if the test scenario corresponds to multiple datasets, determining the single-scenario model performance parameter corresponding to the test scenario according to the model performance parameters corresponding to each of the datasets, determining the single-scenario model data throughput parameter corresponding to the test scenario according to the model data throughput parameters corresponding to each of the datasets, determining the single-scenario computational performance parameter corresponding to the test scenario according to the computational performance parameters corresponding to each of the datasets, and determining the model computation efficiency corresponding to the test scenario based on the single-scenario model performance parameter, the single-scenario model data throughput parameter, and the single-scenario computational performance parameter.
30. The model computation efficiency testing method according to claim 29, characterized in that determining the single-scenario model performance parameter corresponding to the test scenario according to the model performance parameters corresponding to each of the datasets includes: using the arithmetic mean of the model performance parameters corresponding to the datasets as the single-scenario model performance parameter; determining the single-scenario model data throughput parameter corresponding to the test scenario according to the model data throughput parameters corresponding to each of the datasets includes: using the arithmetic mean of the model data throughput parameters corresponding to the datasets as the single-scenario model data throughput parameter; and determining the single-scenario computational performance parameter corresponding to the test scenario according to the computational performance parameters corresponding to each of the datasets includes: using the arithmetic mean of the computational performance parameters corresponding to the datasets as the single-scenario computational performance parameter.
31. The model computation efficiency testing method according to claim 29, characterized in that determining the single-scenario model performance parameter corresponding to the test scenario according to the model performance parameters corresponding to each of the datasets includes: using the geometric mean of the model performance parameters corresponding to the datasets as the single-scenario model performance parameter; determining the single-scenario model data throughput parameter corresponding to the test scenario according to the model data throughput parameters corresponding to each of the datasets includes: using the geometric mean of the model data throughput parameters corresponding to the datasets as the single-scenario model data throughput parameter; and determining the single-scenario computational performance parameter corresponding to the test scenario according to the computational performance parameters corresponding to each of the datasets includes: using the geometric mean of the computational performance parameters corresponding to the datasets as the single-scenario computational performance parameter.
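Claims 29 to 31 average the three parameters first and only then compute the efficiency. A minimal Python sketch of that order of operations is given below. The `Params` class, the function names, and the `combine` callback are hypothetical. The formula that turns the three parameters into an efficiency is deliberately left to the caller, since this sketch does not assume its form.

```python
from dataclasses import dataclass
from statistics import fmean, geometric_mean
from typing import Callable, Iterable, Sequence


@dataclass(frozen=True)
class Params:
    model_performance: float          # e.g. model accuracy
    data_throughput: float            # model data throughput parameter
    computational_performance: float  # computational performance parameter


def aggregate(per_dataset: Sequence[Params],
              mean: Callable[[Iterable[float]], float] = fmean) -> Params:
    """Average each parameter across datasets (claim 30 or 31).

    Pass statistics.fmean for the arithmetic mean or
    statistics.geometric_mean for the geometric mean.
    """
    if not per_dataset:
        raise ValueError("a test scenario needs at least one dataset")
    return Params(
        mean([p.model_performance for p in per_dataset]),
        mean([p.data_throughput for p in per_dataset]),
        mean([p.computational_performance for p in per_dataset]),
    )


def scenario_efficiency_from_params(
        per_dataset: Sequence[Params],
        combine: Callable[[Params], float],
        mean: Callable[[Iterable[float]], float] = fmean) -> float:
    """Aggregate the parameters, then apply the caller-supplied formula."""
    return combine(aggregate(per_dataset, mean))
```

Averaging parameters before applying the formula generally gives a different value from averaging per-dataset efficiencies as in claims 27 and 28, unless the formula is linear in the parameters.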
32. The model computation efficiency testing method according to claim 24 or 25, characterized in that determining the overall model computation efficiency of the tested model computing system based on the model computation efficiency of the tested model computing system under multiple test scenarios includes: taking the arithmetic mean of the model computation efficiencies corresponding to the multiple test scenarios as the overall model computation efficiency.
33. The model computation efficiency testing method according to claim 32, characterized in that the arithmetic mean of the model computation efficiencies corresponding to the multiple test scenarios is taken as the overall model computation efficiency using the following formula: wherein MCE_t represents the overall model computation efficiency; MCE_i represents the model computation efficiency of the tested model computing system in the i-th test scenario; and f_1i represents the first weight coefficient of the tested model computing system in the i-th test scenario.
34. The model computation efficiency testing method according to claim 24 or 25, characterized in that determining the overall model computation efficiency of the tested model computing system based on the model computation efficiency of the tested model computing system under multiple test scenarios includes: taking the geometric mean of the model computation efficiencies corresponding to the multiple test scenarios as the overall model computation efficiency.
35. The model computation efficiency testing method according to claim 34, characterized in that the geometric mean of the model computation efficiencies corresponding to the multiple test scenarios is taken as the overall model computation efficiency using the following formula: wherein MCE_t represents the overall model computation efficiency; MCE_i represents the model computation efficiency of the tested model computing system in the i-th test scenario; Π(·) represents the cumulative product; and f_2i represents the second weight coefficient of the tested model computing system in the i-th test scenario.
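Claims 33 and 35 describe weighted means over test scenarios. The Python sketch below shows one plausible reading and should not be taken as the application's formulas. It assumes the weights are normalized by their sum, so that the arithmetic form is sum(f_1i * MCE_i) / sum(f_1i) and the geometric form is the product of MCE_i raised to f_2i / sum(f_2i). The function names are illustrative.

```python
import math
from typing import Sequence


def _check(effs: Sequence[float], weights: Sequence[float]) -> float:
    """Validate inputs and return the sum of the weights."""
    if len(effs) != len(weights) or not effs:
        raise ValueError("need one weight per scenario, at least one scenario")
    total = sum(weights)
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative with a positive sum")
    return total


def overall_arithmetic(effs: Sequence[float],
                       weights: Sequence[float]) -> float:
    """Weighted arithmetic mean of per-scenario efficiencies (assumed form)."""
    total = _check(effs, weights)
    return sum(w * e for e, w in zip(effs, weights)) / total


def overall_geometric(effs: Sequence[float],
                      weights: Sequence[float]) -> float:
    """Weighted geometric mean of per-scenario efficiencies (assumed form)."""
    total = _check(effs, weights)
    if any(e <= 0 for e in effs):
        raise ValueError("geometric mean requires positive efficiencies")
    # Cumulative product of each efficiency raised to its normalized weight.
    return math.prod(e ** (w / total) for e, w in zip(effs, weights))
```

With equal weights both functions reduce to the plain arithmetic and geometric means of claims 32 and 34.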
36. The model computation efficiency testing method according to claim 1, characterized in that controlling the device under test to run the software framework under test to invoke the model under test and the dataset to perform the model computation task includes: loading the software framework under test on the device under test and initializing the software framework under test; and starting the software framework under test to invoke the model under test and the dataset to perform the model computation task.
37. The model computation efficiency testing method according to claim 36, characterized in that loading the software framework under test on the device under test and initializing the software framework under test includes: loading the library file of the tested model computing system onto the device under test; obtaining the types of the interface functions of the tested model computing system through the calling interface of the library file; and initializing each interface function according to its type.
38. The model computation efficiency testing method according to claim 37, characterized in that the types of the interface functions include at least: an initialization function, a completion function, a data loading function, a data unloading function, a model computation control function, and a test parameter return function; wherein the initialization function is the interface function that performs initialization according to the test configuration information; the completion function is the interface function that is executed after the tested model computing system has completed the model computation task; the data loading function is the interface function that loads the model computation data; the data unloading function is the interface function that unloads the model computation data; the model computation control function is the interface function that executes the model computation task; and the test parameter return function is the interface function that outputs test parameters used to calculate at least one of the model performance parameters, the model data throughput parameters, and the computational performance parameters.
39. The model computation efficiency testing method according to claim 38, characterized in that starting the software framework under test to invoke the model under test to perform the model computation task includes: starting the software framework under test to perform the model computation task; obtaining a first result returned by the test parameter return function after the software framework under test calls the data loading function to load the model computation data; obtaining a second result returned by the test parameter return function after the software framework under test calls the model computation control function to execute the model computation task; waiting for a third result output by the software framework under test after it has completed the model computation task; and obtaining a fourth result returned by the test parameter return function after the software framework under test calls the data unloading function to unload the model computation data; and calculating the model accuracy of the model under test based on the execution result of the model computation task includes: calculating the model accuracy based on at least one of the first result, the second result, the third result, and the fourth result.
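The interface functions of claim 38 and the call sequence of claim 39 can be pictured with the Python sketch below. Everything named here is hypothetical: the method names, the dictionary keys, the `DummyFramework` stand-in, and the choice to take the third result from the completion function and to compute accuracy from it alone. The claims only fix the six function types and the order in which the four results are obtained.

```python
from typing import Any, Dict, List, Protocol


class FrameworkUnderTest(Protocol):
    """Hypothetical Python view of the six interface function types."""
    def initialize(self, config: Dict[str, Any]) -> None: ...
    def load_data(self) -> None: ...
    def run_model(self) -> None: ...
    def finish(self) -> Dict[str, Any]: ...
    def unload_data(self) -> None: ...
    def get_test_params(self) -> Dict[str, Any]: ...


def run_test(fw: FrameworkUnderTest,
             config: Dict[str, Any]) -> Dict[str, Any]:
    """Drive one test run and collect the four results of claim 39."""
    fw.initialize(config)
    fw.load_data()
    first = fw.get_test_params()    # after data loading
    fw.run_model()
    second = fw.get_test_params()   # after the model computation call
    third = fw.finish()             # output once the task has completed
    fw.unload_data()
    fourth = fw.get_test_params()   # after data unloading
    return {"first": first, "second": second,
            "third": third, "fourth": fourth}


def accuracy_from(results: Dict[str, Any]) -> float:
    """Assumed rule: accuracy = correct / total from the third result."""
    out = results["third"]
    return out["correct"] / out["total"]


class DummyFramework:
    """Stand-in framework that records the order of calls."""
    def __init__(self) -> None:
        self.calls: List[str] = []

    def initialize(self, config: Dict[str, Any]) -> None:
        self.calls.append("initialize")

    def load_data(self) -> None:
        self.calls.append("load_data")

    def run_model(self) -> None:
        self.calls.append("run_model")

    def finish(self) -> Dict[str, Any]:
        self.calls.append("finish")
        return {"correct": 9, "total": 10}

    def unload_data(self) -> None:
        self.calls.append("unload_data")

    def get_test_params(self) -> Dict[str, Any]:
        # Report which interface function ran most recently.
        return {"last_call": self.calls[-1]}
```

A real harness would obtain these functions from the loaded library file, as claim 37 describes. The stand-in class only makes the ordering visible.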
40. A model computation efficiency testing system, characterized in that it comprises a testing device and a device under test; the device under test is configured to run the software framework under test to perform a model computation task based on the model under test and a dataset; the testing device is configured to: acquire the dataset; control the device under test to run the software framework under test to invoke the model under test and the dataset to perform the model computation task; determine model performance parameters, model data throughput parameters, and computational performance parameters based on the model computation task; and determine the model computation efficiency of the tested model computing system based on the model performance parameters, the model data throughput parameters, and the computational performance parameters, so as to obtain the test result of the tested model computing system; wherein the tested model computing system includes the model under test, the software framework under test, and the device under test.
41. A model computation efficiency testing apparatus, characterized in that it comprises: an acquisition unit configured to acquire a dataset; a control unit configured to control the device under test to run the software framework under test to invoke the model under test and the dataset to perform a model computation task; and a determining unit configured to determine model performance parameters, model data throughput parameters, and computational performance parameters based on the model computation task, and to determine the model computation efficiency of the tested model computing system based on the model performance parameters, the model data throughput parameters, and the computational performance parameters, so as to obtain the test result of the tested model computing system; wherein the tested model computing system includes the model under test, the software framework under test, and the device under test.
42. A model computation efficiency testing device, characterized in that it comprises: a memory configured to store a computer program; and a processor configured to execute the computer program, wherein the computer program, when executed by the processor, implements the steps of the model computation efficiency testing method according to any one of claims 1 to 39.
43. A non-volatile storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the steps of the model computation efficiency testing method according to any one of claims 1 to 39.
44. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the model computation efficiency testing method according to any one of claims 1 to 39.