An adaptive test optimization method for AI card mixed precision mode

CN122285409A · Pending · Publication Date: 2026-06-26 · 四川华鲲振宇智能科技有限责任公司

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Application (China)
Current Assignee / Owner
四川华鲲振宇智能科技有限责任公司
Filing Date
2026-05-27
Publication Date
2026-06-26


Abstract

This invention discloses an adaptive test optimization method for AI card mixed precision mode, belonging to the field of artificial intelligence and AI accelerator card testing technology. The method first classifies AI tasks and collects their load characteristics to construct an AI business load library containing the operating characteristics of multiple AI tasks. Operators are graded by precision sensitivity and matched to corresponding precision types, and precision combination test cases are generated adaptively in conjunction with the business load library. The resource constraints of AI card computing power partitioning instances are then used to match, filter, and adjust the parameters of the test cases. Finally, testing is completed through dynamic load injection, and an adapted configuration scheme is output after multi-dimensional quantitative evaluation. The invention realizes intelligent testing of AI card mixed precision mode, improves test scenario coverage and execution efficiency, adapts to single-card multi-instance deployment scenarios, and provides reliable parameter support for the business deployment of mixed precision mode.

Description

Technical Field

[0001] This invention relates to the field of artificial intelligence and AI accelerator card testing technology, and in particular to an adaptive test optimization method for AI card mixed precision mode.

Background Technology

[0002] With the rapid deployment and large-scale application of artificial intelligence (AI) technology, AI accelerator cards have become critical computing infrastructure for core business scenarios such as large-scale model training and high-concurrency AI inference. Mixed precision mode, as a core technology for improving computing power utilization efficiency and reducing GPU memory consumption, has become a standard feature of mainstream AI cards. In actual business deployments, mixed precision mode dynamically switches between different floating-point precision types during computation, significantly improving the computational throughput of AI tasks while controlling the loss of computational precision, adapting to the computing power requirements of different business scenarios such as large-scale model training and high-concurrency inference. Simultaneously, the maturity of AI card computing power partitioning and resource isolation technologies has promoted the widespread adoption of single-card multi-instance parallel deployment modes. A single AI card can be divided into multiple isolated computing power partitioning instances, supporting multi-tenant, multi-task parallel operation, further improving the resource utilization efficiency of AI cards, and also placing more comprehensive requirements on the adaptability and operational stability of AI card mixed precision mode. Currently, the industry has developed several benchmark test suites for AI card computing power and performance. Related testing technologies are also continuously developing around the hardware performance verification and functional integrity testing of AI cards. The relevant testing and verification for mixed precision mode has also become one of the core links in the pre-launch testing and pre-deployment adaptation of AI cards.

[0003] Current testing solutions for the mixed precision mode of AI cards still have several shortcomings and are difficult to adapt to the complex and diverse needs of real-world business scenarios. First, existing solutions mostly employ fixed precision combinations and static test scenarios, and cannot generate targeted test cases based on the operator characteristics of different AI tasks, the hardware characteristics of different AI cards, or the implementation logic of different mixed precision modes. As a result, test scenarios deviate significantly from actual business operating characteristics. Second, test case generation largely relies on manual enumeration, requiring manual configuration of test parameters such as model, batch size, and precision combination. This leads to long test execution cycles and high adaptation costs across different AI card models and different computing power partitioning instances. Third, existing solutions do not fully consider resource constraints in AI card computing power partitioning scenarios and cannot adapt test cases to the resource quotas of different computing power partitioning instances, which can easily cause test interruptions and invalid test results due to resource mismatch. Fourth, existing tests are mostly conducted under static, fixed loads, so they cannot verify the stability of precision switching under dynamic business loads and rarely expose potential problems of mixed precision mode under boundary operating conditions. Finally, test results are mostly based on a single performance metric and lack a multi-dimensional quantitative evaluation system. They cannot output mixed precision mode configuration schemes adapted to specific business scenarios and instance resources, which limits the practical value of the test results.

Summary of the Invention

[0004] The purpose of this invention is to overcome the shortcomings of the prior art and provide an adaptive testing optimization method for AI card mixed precision mode.

[0005] The objective of this invention is achieved through the following technical solution. An adaptive test optimization method for AI card mixed precision mode is provided, which includes the following steps:
S1. Construct an AI business load library: classify AI tasks, collect the load characteristics of the AI tasks in the mixed precision mode of the AI card, and construct business load templates based on the load characteristics. Each business load template includes task type, operator proportion, computing power requirement, video memory requirement, and precision sensitivity. The business load templates are aggregated to form the AI business load library.
S2. Generate precision combination test cases: based on the set grading rules, determine each operator's sensitivity level to precision, match the corresponding precision type to that sensitivity level, and combine the AI business load library to generate precision combination test cases. Each precision combination test case includes a precision combination, operator matching rules, and an expected precision loss rate.
S3. Adjust the precision combination test cases: collect the resource quota of the AI card computing power partitioning instance, calculate the computing power and video memory requirements of the AI card in mixed precision mode, match and filter the precision combination test cases, adjust the parameters of the test cases until the resource constraints are met, and mark the test cases that do not meet the resource constraints and cannot be adjusted.
S4. Perform the AI card mixed precision mode test: generate a dynamic load fluctuation sequence based on the AI business load library, inject the sequence into the AI card computing power partitioning instance, run the adjusted precision combination test cases, collect the running indicators during the test, complete the quantitative evaluation based on the running indicators, and output the AI card mixed precision mode configuration scheme.

[0006] Furthermore, step S1 includes the following sub-steps:
S1.1. Classify AI tasks into two categories: training tasks and inference tasks.
S1.2. Collect the load characteristics of each type of AI task in the mixed precision mode of the AI card. The collection covers the entire process of the AI card running in mixed precision mode. The load characteristics include the computing power usage curve, memory fluctuation pattern, operator type distribution, and precision sensitivity distribution.
S1.3. Standardize the collected load characteristics. Standardization includes unifying the format of the load characteristics and normalizing their numerical range. Construct a business load template based on the standardized load characteristics. The business load template includes task type, operator proportion, computing power requirement, video memory requirement, and precision sensitivity.
S1.4. Categorize and store all business load templates, and aggregate them to form the AI business load library. The AI business load library is used for the subsequent generation of precision combination test cases.

[0007] Furthermore, step S2 includes the following sub-steps:
S2.1. Based on the set grading rules, perform pre-tests on the operators, obtain the deviation of each operator's calculation results under different precision types, and determine the operator's sensitivity level to precision.
S2.2. Match operators of different sensitivity levels to corresponding precision types, forming a one-to-one matching relationship between precision type and sensitivity level.
S2.3. Using the business load templates in the AI business load library, extract the operator proportion and precision sensitivity from each template, and generate multiple sets of precision combination test cases for each type of AI task.
S2.4. Configure the precision combination, operator matching rules, and expected precision loss rate for each precision combination test case. The precision combination test cases are used in the subsequent resource constraint adaptation process.

[0008] Furthermore, step S3 includes the following sub-steps:
S3.1. Collect the resource quota of the AI card computing power partitioning instance through the AI card management tool. The resource quota includes the number of AI Cores, the video memory capacity, and the cache size.
S3.2. Based on the operator matching rules and precision combination in each precision combination test case, calculate the computing power and video memory requirements of the AI card when running that test case in mixed precision mode.
S3.3. Based on the computing power and video memory requirements, and combined with the resource quota of the AI card computing power partitioning instance, match and filter the precision combination test cases.
S3.4. For precision combination test cases that do not meet the resource constraints, adjust their parameters until the resource constraints are met.
S3.5. For precision combination test cases that cannot be adjusted to meet the resource constraints, mark them as unsuitable for the AI card computing power partitioning instance. The adjusted precision combination test cases are used in the subsequent AI card mixed precision mode test.

[0009] Furthermore, step S4 includes the following sub-steps:
S4.1. Based on the computing power usage curve in the AI business load library, extract the pattern of computing power fluctuations and generate a dynamic load fluctuation sequence.
S4.2. During the execution of the precision combination test cases, inject the dynamic load fluctuation sequence into the target AI card computing power partitioning instance at set time intervals.
S4.3. Monitor the precision switching status of the AI card's mixed precision mode in real time and collect the running indicators during the test. The running indicators cover the entire lifecycle of the precision switching process.
S4.4. Based on the established evaluation system and the collected running indicators, calculate the comprehensive score of the AI card's mixed precision mode.
S4.5. Based on the comprehensive score, output the configuration scheme of the AI card mixed precision mode for the corresponding AI card computing power partitioning instance. The configuration scheme is used to guide the business deployment of AI card mixed precision mode.
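The generation of the dynamic load fluctuation sequence can be illustrated with a minimal Python sketch. It assumes that the computing power usage curve is stored as time-ordered (time, utilization) points with strictly increasing times, and that the sequence is obtained by linear interpolation at the set injection interval. The source does not specify how the fluctuation pattern is extracted, so the interpolation approach and the name `load_sequence` are hypothetical.

```python
def load_sequence(curve, interval):
    """Resample a usage curve into (time, target_load) pairs, one per interval.

    curve: list of (time, utilization) points with strictly increasing times
           (assumed storage format, not specified by the source).
    interval: the set injection interval, in the same time unit as the curve.
    """
    if len(curve) < 2 or interval <= 0:
        raise ValueError("need at least two curve points and a positive interval")
    start, end = curve[0][0], curve[-1][0]
    sequence = []
    segment = 0  # index of the curve segment that contains the current time
    step = 0
    while start + step * interval <= end:
        t = start + step * interval
        # Advance to the segment [curve[segment], curve[segment + 1]] containing t.
        while segment < len(curve) - 2 and curve[segment + 1][0] < t:
            segment += 1
        (t0, u0), (t1, u1) = curve[segment], curve[segment + 1]
        fraction = (t - t0) / (t1 - t0)
        sequence.append((t, u0 + fraction * (u1 - u0)))
        step += 1
    return sequence


if __name__ == "__main__":
    usage_curve = [(0, 0.25), (10, 0.75), (20, 0.5)]
    for t, load in load_sequence(usage_curve, 5):
        print(f"t={t:>2}s inject target load {load:.3f}")
```

Each pair in the returned sequence would then be handed to whatever load generator drives the target partitioning instance; that injection mechanism is outside the scope of this sketch.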

[0010] Furthermore, in step S2, corresponding precision types are matched for operators with different sensitivity levels. The precision types include FP32, BF16, and FP16. Based on the sensitivity levels divided by the set grading rules, a unique precision type is matched for each sensitivity level. The matching process follows the set mapping rules, which are consistent with the computational characteristics and precision requirements of the operators. The matched precision type forms a fixed binding relationship with the corresponding operator. The fixed binding relationship is synchronously written into the operator matching rules of the corresponding precision combination test case. The operator matching rules are used to constrain the operator precision allocation process during the execution of the precision combination test case, ensuring that the precision allocation process is consistent with the preset matching relationship.
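The grading and matching described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the pre-test is simulated by rounding operator inputs and outputs to a lower precision with the standard `struct` module, BF16 is approximated by truncating a float32 value to its upper 16 bits, the two thresholds are placeholders for the "set grading rules", and the direction of the mapping (more sensitive operators keep higher precision) is assumed rather than stated in the source. All function and constant names are hypothetical.

```python
import struct

# Placeholder thresholds on relative deviation (assumed, not from the source).
HIGH_THRESHOLD = 1e-2
LOW_THRESHOLD = 1e-3

# Assumed one-to-one mapping between sensitivity level and precision type.
LEVEL_TO_PRECISION = {"high": "FP32", "medium": "BF16", "low": "FP16"}


def to_fp16(x):
    """Round a Python float to IEEE half precision and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_bf16(x):
    """Approximate BF16 by truncating float32 to its upper 16 bits."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]


def pretest_deviation(op, inputs, cast):
    """Largest relative deviation of op when inputs and outputs are rounded by cast."""
    worst = 0.0
    for x in inputs:
        reference = op(x)           # double-precision reference result
        approx = cast(op(cast(x)))  # simulated low-precision result
        worst = max(worst, abs(approx - reference) / max(abs(reference), 1e-12))
    return worst


def sensitivity_level(deviation):
    """Grade an operator by its measured deviation."""
    if deviation >= HIGH_THRESHOLD:
        return "high"
    if deviation >= LOW_THRESHOLD:
        return "medium"
    return "low"


def build_matching_rules(deviations):
    """Bind each operator name to the precision type of its sensitivity level."""
    return {name: LEVEL_TO_PRECISION[sensitivity_level(d)] for name, d in deviations.items()}


if __name__ == "__main__":
    samples = [0.1, 0.7, 1.3, 2.9]
    deviations = {
        "scale": pretest_deviation(lambda v: v * 2.0, samples, to_fp16),
        "cube": pretest_deviation(lambda v: v ** 3, samples, to_fp16),
    }
    print(build_matching_rules(deviations))
```

The dictionary returned by `build_matching_rules` plays the role of the operator matching rules written into a test case; a real pre-test would run the operators on the AI card itself rather than emulate rounding on the host.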

[0011] Furthermore, in step S3, the parameters of the precision combination test cases are adjusted. The parameters include the batch size and the model fragment size. The parameter adjustment process follows the set adjustment step size and adjustment range. After each parameter adjustment, the computing power requirement and video memory requirement of the corresponding precision combination test case are recalculated and compared with the resource quota of the AI card computing power partitioning instance, until the comparison result meets the set resource constraints. During the parameter adjustment process, the precision combination and operator matching rules of the precision combination test cases remain unchanged, so that the core test objectives of the test cases do not deviate.
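A minimal Python sketch of this adjustment loop is given below. It adjusts only the batch size (the model fragment size could be handled the same way) and takes the requirement estimator as a caller-supplied function, because the source does not give a formula for computing power or video memory requirements. The names `Quota` and `adapt_batch_size` and the linear estimator in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quota:
    """Resource quota of one computing power partitioning instance (assumed fields)."""
    ai_cores: int
    memory_gb: float


def adapt_batch_size(batch_size, quota, estimate, step, min_batch):
    """Shrink the batch size by `step` until the estimated requirements fit the quota.

    estimate(batch) must return (ai_cores_needed, memory_gb_needed).
    Returns the adjusted batch size, or None when no value within the
    adjustment range [min_batch, batch_size] satisfies the constraints,
    which corresponds to marking the test case as unsuitable.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    batch = batch_size
    while batch >= min_batch:
        cores_needed, memory_needed = estimate(batch)  # recalculated after each adjustment
        if cores_needed <= quota.ai_cores and memory_needed <= quota.memory_gb:
            return batch
        batch -= step
    return None


if __name__ == "__main__":
    # Hypothetical linear estimator: 2 AI Cores, 4 GB fixed plus 0.5 GB per sample.
    def estimator(batch):
        return 2, 4.0 + 0.5 * batch

    print(adapt_batch_size(64, Quota(ai_cores=4, memory_gb=16.0), estimator, step=8, min_batch=8))
    print(adapt_batch_size(64, Quota(ai_cores=4, memory_gb=4.0), estimator, step=8, min_batch=8))
```

Note that the precision combination and operator matching rules never appear as arguments: only the batch size changes, which mirrors the requirement that the core test objective stay fixed during adjustment.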

[0012] Furthermore, in step S4, for the single-card multi-instance scenario, different AI card mixed precision modes and AI tasks are configured for different AI card computing power partitioning instances. Performance interference indicators between different AI card computing power partitioning instances are collected. During the configuration process, an independent test process is assigned to each AI card computing power partitioning instance. All test processes are started and run synchronously according to the set time nodes. During the operation, performance interference indicators between different AI card computing power partitioning instances are continuously collected according to the set sampling frequency. The performance interference indicators include computing power preemption rate and memory conflict rate. The collected performance interference indicators are synchronously incorporated into the running indicators of the test process for subsequent quantitative evaluation.

[0013] Furthermore, in step S4, a quantitative evaluation is completed based on the established multi-dimensional evaluation system. The multi-dimensional evaluation system includes performance indicators, accuracy indicators, resource indicators, and stability indicators. The multi-dimensional evaluation system sets corresponding calculation rules and data sources for each indicator. The data sources are the operational indicators collected during the test. The numerical calculation of each indicator is completed according to the preset calculation rules. The calculation results of each indicator are used simultaneously in the subsequent comprehensive score calculation process to ensure that the quantitative evaluation process covers all core dimensions of the test process. The entire process of quantitative evaluation follows the set calculation order.

[0014] Furthermore, in step S4, based on the evaluation results of the multi-dimensional evaluation system, a weighted summation method is used to calculate the comprehensive score of the AI card mixed precision mode. A corresponding weight coefficient is assigned to each indicator item of the multi-dimensional evaluation system, and the sum of the weight coefficients is a set value. Based on the ranking of the comprehensive scores, configuration parameters that meet the set requirements are selected and summarized to form the configuration scheme of the AI card mixed precision mode for the corresponding AI card computing power partitioning instance. The configuration scheme includes the recommended precision combination, the suitable batch size, and the load intensity threshold.
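The weighted summation and ranking can be sketched in Python as follows. The sketch assumes that each of the four indicator dimensions has already been normalized to a score in [0, 1] with higher values being better, and that the set value for the sum of the weight coefficients is 1.0. Neither assumption comes from the source, and the function names are hypothetical.

```python
def composite_score(indicators, weights, weight_sum=1.0):
    """Weighted sum of indicator scores; weights must cover the same indicators."""
    if set(indicators) != set(weights):
        raise ValueError("indicators and weights must have the same keys")
    if abs(sum(weights.values()) - weight_sum) > 1e-9:
        raise ValueError("weight coefficients must sum to the set value")
    return sum(weights[name] * indicators[name] for name in weights)


def rank_configurations(candidates, weights):
    """Sort candidate configurations by comprehensive score, best first."""
    return sorted(
        candidates,
        key=lambda c: composite_score(c["indicators"], weights),
        reverse=True,
    )


if __name__ == "__main__":
    weights = {"performance": 0.25, "accuracy": 0.25, "resource": 0.25, "stability": 0.25}
    candidates = [
        {"name": "FP16-heavy, batch 32",
         "indicators": {"performance": 1.0, "accuracy": 0.5, "resource": 0.5, "stability": 1.0}},
        {"name": "BF16-heavy, batch 16",
         "indicators": {"performance": 0.5, "accuracy": 0.5, "resource": 0.5, "stability": 0.5}},
    ]
    for c in rank_configurations(candidates, weights):
        print(c["name"], composite_score(c["indicators"], weights))
```

The candidate names and scores in the example are illustrative only; in practice the top-ranked entries that meet the set requirements would supply the recommended precision combination, batch size, and load intensity threshold.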

[0015] The beneficial effects of this invention are:
(1) By constructing the AI business load library, adaptively generating precision combination test cases, adapting them to resource constraints, and executing and evaluating the tests, the invention realizes intelligent testing of AI card mixed precision mode and improves test scenario coverage and test execution efficiency.
(2) Precision type matching and test case generation are completed on the basis of operator precision sensitivity grading, which replaces the traditional manual enumeration of test cases, shortens the test case generation cycle, and reduces the operating cost and adaptation difficulty of manual testing.
(3) Through dynamic load fluctuation sequence injection and the multi-dimensional quantitative evaluation system, the dynamic operating stability of AI card mixed precision mode is fully verified, and an adapted configuration scheme is output to provide feasible parameter support for business deployment.

Attached Figure Description

[0016] Figure 1 is a flowchart of the steps of an adaptive test optimization method for AI card mixed precision mode. Figure 2 is a flowchart of the specific steps of the adaptive test optimization method for AI card mixed precision mode provided in the embodiment.

Detailed Implementation

[0017] The technical solution of the present invention will be clearly and completely described below with reference to the embodiments. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0018] Example 1. Referring to Figure 1, this embodiment provides an adaptive test optimization method for AI card mixed precision mode, which includes the following steps:
S1. Construct an AI business load library: classify AI tasks, collect the load characteristics of the AI tasks in the mixed precision mode of the AI card, and construct business load templates based on the load characteristics. Each business load template includes task type, operator proportion, computing power requirement, video memory requirement, and precision sensitivity. The business load templates are aggregated to form the AI business load library.
S2. Generate precision combination test cases: based on the set grading rules, determine each operator's sensitivity level to precision, match the corresponding precision type to that sensitivity level, and combine the AI business load library to generate precision combination test cases. Each precision combination test case includes a precision combination, operator matching rules, and an expected precision loss rate.
S3. Adjust the precision combination test cases: collect the resource quota of the AI card computing power partitioning instance, calculate the computing power and video memory requirements of the AI card in mixed precision mode, match and filter the precision combination test cases, adjust the parameters of the test cases until the resource constraints are met, and mark the test cases that do not meet the resource constraints and cannot be adjusted.
S4. Perform the AI card mixed precision mode test: generate a dynamic load fluctuation sequence based on the AI business load library, inject the sequence into the AI card computing power partitioning instance, run the adjusted precision combination test cases, collect the running indicators during the test, complete the quantitative evaluation based on the running indicators, and output the AI card mixed precision mode configuration scheme.

[0019] In some embodiments, step S1 includes the following sub-steps:
S1.1. Classify AI tasks into two categories: training tasks and inference tasks.
S1.2. Collect the load characteristics of each type of AI task in the mixed precision mode of the AI card. The collection covers the entire process of the AI card running in mixed precision mode. The load characteristics include the computing power usage curve, memory fluctuation pattern, operator type distribution, and precision sensitivity distribution.
S1.3. Standardize the collected load characteristics. Standardization includes unifying the format of the load characteristics and normalizing their numerical range. Construct a business load template based on the standardized load characteristics. The business load template includes task type, operator proportion, computing power requirement, video memory requirement, and precision sensitivity.
S1.4. Categorize and store all business load templates, and aggregate them to form the AI business load library. The AI business load library is used for the subsequent generation of precision combination test cases.

[0020] In some embodiments, step S2 includes the following sub-steps:
S2.1. Based on the set grading rules, perform pre-tests on the operators, obtain the deviation of each operator's calculation results under different precision types, and determine the operator's sensitivity level to precision.
S2.2. Match operators of different sensitivity levels to corresponding precision types, forming a one-to-one matching relationship between precision type and sensitivity level.
S2.3. Using the business load templates in the AI business load library, extract the operator proportion and precision sensitivity from each template, and generate multiple sets of precision combination test cases for each type of AI task.
S2.4. Configure the precision combination, operator matching rules, and expected precision loss rate for each precision combination test case. The precision combination test cases are used in the subsequent resource constraint adaptation process.

[0021] In some embodiments, step S3 includes the following sub-steps:
S3.1. Collect the resource quota of the AI card computing power partitioning instance through the AI card management tool. The resource quota includes the number of AI Cores, the video memory capacity, and the cache size.
S3.2. Based on the operator matching rules and precision combination in each precision combination test case, calculate the computing power and video memory requirements of the AI card when running that test case in mixed precision mode.
S3.3. Based on the computing power and video memory requirements, and combined with the resource quota of the AI card computing power partitioning instance, match and filter the precision combination test cases.
S3.4. For precision combination test cases that do not meet the resource constraints, adjust their parameters until the resource constraints are met.
S3.5. For precision combination test cases that cannot be adjusted to meet the resource constraints, mark them as unsuitable for the AI card computing power partitioning instance. The adjusted precision combination test cases are used in the subsequent AI card mixed precision mode test.

[0022] In some embodiments, step S4 includes the following sub-steps:
S4.1. Based on the computing power usage curve in the AI business load library, extract the pattern of computing power fluctuations and generate a dynamic load fluctuation sequence.
S4.2. During the execution of the precision combination test cases, inject the dynamic load fluctuation sequence into the target AI card computing power partitioning instance at set time intervals.
S4.3. Monitor the precision switching status of the AI card's mixed precision mode in real time and collect the running indicators during the test. The running indicators cover the entire lifecycle of the precision switching process.
S4.4. Based on the established evaluation system and the collected running indicators, calculate the comprehensive score of the AI card's mixed precision mode.
S4.5. Based on the comprehensive score, output the configuration scheme of the AI card mixed precision mode for the corresponding AI card computing power partitioning instance. The configuration scheme is used to guide the business deployment of AI card mixed precision mode.

[0023] In some embodiments, in step S2, corresponding precision types are matched for operators with different sensitivity levels. The precision types include FP32, BF16, and FP16. Based on the sensitivity levels divided by the set grading rules, a unique precision type is matched for each sensitivity level. The matching process follows the set mapping rules, which are consistent with the computational characteristics and precision requirements of the operators. The matched precision type forms a fixed binding relationship with the corresponding operator. The fixed binding relationship is synchronously written into the operator matching rules of the corresponding precision combination test case. The operator matching rules are used to constrain the operator precision allocation process during the execution of the precision combination test case, ensuring that the precision allocation process is consistent with the preset matching relationship.

[0024] In some embodiments, in step S3, the parameters of the precision combination test cases are adjusted. The parameters include the batch size and the model fragment size. The parameter adjustment process follows the set adjustment step size and adjustment range. After each parameter adjustment, the computing power requirement and video memory requirement of the corresponding precision combination test case are recalculated and compared with the resource quota of the AI card computing power partitioning instance, until the comparison result meets the set resource constraints. During the parameter adjustment process, the precision combination and operator matching rules of the precision combination test cases are kept unchanged, so that the core test objectives of the test cases do not deviate.

[0025] In some embodiments, in step S4, for a single-card multi-instance scenario, different AI card mixed precision modes and AI tasks are configured for different AI card computing power partitioning instances. Performance interference indicators between different AI card computing power partitioning instances are collected. During the configuration process, an independent test process is assigned to each AI card computing power partitioning instance. All test processes are started and run synchronously according to the set time nodes. During the operation, performance interference indicators between different AI card computing power partitioning instances are continuously collected according to the set sampling frequency. The performance interference indicators include computing power preemption rate and memory conflict rate. The collected performance interference indicators are synchronously incorporated into the running indicators of the test process for subsequent quantitative evaluation.

[0026] In some embodiments, in step S4, a quantitative evaluation is completed based on a set multi-dimensional evaluation system. The multi-dimensional evaluation system includes performance indicators, accuracy indicators, resource indicators, and stability indicators. The multi-dimensional evaluation system sets corresponding calculation rules and data sources for each indicator. The data sources are the operating indicators collected during the test. The numerical calculation of each indicator is completed according to the preset calculation rules. The calculation results of each indicator are used synchronously in the subsequent comprehensive score calculation process to ensure that the quantitative evaluation process covers all core dimensions of the test process. The entire process of quantitative evaluation is executed in accordance with the set calculation order.

[0027] In some embodiments, in step S4, based on the evaluation results of the multi-dimensional evaluation system, a weighted summation method is used to calculate the comprehensive score of the AI card mixed precision mode. A corresponding weight coefficient is assigned to each indicator item of the multi-dimensional evaluation system, and the sum of the weight coefficients is a set value. Based on the ranking of the comprehensive scores, configuration parameters that meet the set requirements are selected and summarized to form the configuration scheme of the AI card mixed precision mode for the corresponding AI card computing power partitioning instance. The configuration scheme includes the recommended precision combination, the suitable batch size, and the load intensity threshold.

[0028] Example 2. This embodiment provides a specific implementation process for an automatic monitoring and adaptation method for multiple configurations on the same board. This embodiment takes adaptive test optimization in AI card mixed precision mode as its core implementation scenario. It addresses problems in AI card mixed precision mode testing such as static and fixed test methods, low test case generation efficiency, insufficient resource adaptability across multiple instances, and the lack of stability verification under dynamic loads, and it achieves intelligent testing and configuration optimization for AI card mixed precision mode. As shown in Figure 2, the specific implementation process is as follows:

Step 1. Build the AI business load library:

Step 1.1. Classifying AI tasks: In this embodiment, all AI tasks to be tested are first classified into two categories: training tasks and inference tasks. AI tasks are artificial intelligence-related computational tasks completed using the computing power resources of the AI card. Training tasks iteratively update AI model parameters from input sample data, while inference tasks perform forward computation on a trained AI model using input data and output the results. The two types of tasks differ significantly in computing power requirements, memory usage patterns, and operator usage distribution. This embodiment uses this classification so that the subsequent collection of load characteristics is targeted and the collected load characteristics match the actual operating characteristics of the corresponding task type.

[0029] Step 1.2. Collecting the load characteristics of AI tasks: In this embodiment, for each type of AI task that has been classified, its load characteristics in the mixed precision mode of the AI card are collected. Load characteristics are the data on resource consumption and computational behavior exhibited by the AI task while it runs on the AI card. In this embodiment, the collected load characteristics include the computing power usage curve, memory fluctuation pattern, operator type distribution, and precision sensitivity distribution.

[0030] The data acquisition process covers the entire workflow of the AI ​​card's mixed-precision mode operation, from AI task initiation and initialization, computation execution to resource release at task completion. Continuous acquisition of corresponding load features is performed throughout this entire cycle, ensuring that the acquired feature data fully reflects the complete operational status of the AI ​​task in the AI ​​card's mixed-precision mode. During the acquisition process, for the divided training and inference tasks, the corresponding load features are acquired separately. For each task type, multiple sets of repeated acquisition processes are performed to ensure the consistency and representativeness of the acquired feature data, avoiding the influence of accidental factors during a single acquisition process.

[0031] Step 1.3. Standardize the collected load features and construct business load templates: In this embodiment, the collected load features are standardized, and business load templates are constructed from the processed feature data. Standardization unifies the format and adjusts the numerical range of load feature data that comes from different sources and in different formats; it comprises format unification and numerical range normalization.

[0032] Format unification refers to converting load characteristic data acquired from different collection channels into a unified storage format and data structure, ensuring that all characteristic data can be read and retrieved uniformly during subsequent processing. Numerical range normalization refers to converting characteristic data with different dimensions and numerical ranges into a unified numerical range, eliminating dimensional differences between different characteristic data and ensuring comparability. Based on the standardized load characteristics, business load templates are constructed. These templates include task type, operator proportion, computing power requirements, GPU memory requirements, and precision sensitivity. Each business load template corresponds one-to-one with the operational characteristics of a type of AI task, fully reflecting the operational requirements and performance characteristics of the corresponding AI task in the AI ​​card's mixed precision mode.

[0033] Step 1.4. Complete the construction and storage of the AI ​​workload library: In this embodiment, all completed workload templates are categorized and stored, forming an AI workload library. The AI ​​workload library refers to a dataset containing all standardized workload templates. In this embodiment, all completed workload templates are categorized and stored according to the type of AI task. Specifically, workload templates corresponding to training tasks and inference tasks are stored separately, facilitating quick retrieval of the corresponding workload templates for different task types in subsequent steps. The AI ​​workload library, formed by aggregating all workload templates, is used in the subsequent precision combination test case generation process, providing basic feature data and task feature support for the adaptive generation of test cases.

[0034] In some specific implementations of the AI workload library construction, the workload characteristics of 120 AI tasks were collected and standardized, comprising 72 training tasks and 48 inference tasks. Each task underwent at least 10 full-process collections of workload characteristics at a sampling frequency of 100 Hz per collection, so that the time resolution of the computing power occupancy curves, memory fluctuation patterns, and other time-series characteristics was no coarser than 10 ms. The workload characteristics of each task were unified in format and normalized in numerical range, mapping all feature data to the [0,1] interval. From the processed feature data, 120 corresponding workload templates were constructed. Each template contains five fixed fields: task type, operator proportion, computing power requirement, memory requirement, and precision sensitivity. A unique identification code was assigned to each template, and the templates were classified and stored under two main categories, training tasks and inference tasks, forming a structured AI workload library. This implementation addresses the insufficient feature representativeness caused by single-sample collection. Through repeated collection and standardization, it ensures that each business load template accurately reproduces the real operating characteristics of the corresponding AI task and provides highly consistent feature data for the adaptive generation of subsequent test cases.
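The normalization and template construction described above can be sketched as follows. This is a minimal illustration, not the method of the embodiment: the text only states that feature values are mapped to the [0,1] interval, so the use of min-max scaling is an assumption, and the names `min_max_normalize`, `WorkloadTemplate`, and `build_library` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


def min_max_normalize(values: List[float]) -> List[float]:
    """Map a feature series onto [0, 1] (assumed min-max scaling).

    A constant series has no spread, so it is mapped to all zeros.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


@dataclass
class WorkloadTemplate:
    """The five fixed fields named in the text, plus the unique identification code."""
    template_id: str
    task_type: str                         # "training" or "inference"
    operator_proportion: Dict[str, float]  # operator type -> share of usage
    compute_requirement: float             # normalized to [0, 1]
    memory_requirement: float              # normalized to [0, 1]
    precision_sensitivity: float           # normalized to [0, 1]


def build_library(templates: List[WorkloadTemplate]) -> Dict[str, List[WorkloadTemplate]]:
    """Store templates under the two task categories for quick retrieval."""
    library: Dict[str, List[WorkloadTemplate]] = {"training": [], "inference": []}
    for template in templates:
        library[template.task_type].append(template)
    return library


# Hypothetical raw readings for three tasks (arbitrary units).
normalized_compute = min_max_normalize([120.0, 300.0, 480.0])
templates = [
    WorkloadTemplate("T-001", "training", {"matmul": 0.7, "softmax": 0.3},
                     normalized_compute[2], 0.8, 0.6),
    WorkloadTemplate("I-001", "inference", {"matmul": 0.9, "softmax": 0.1},
                     normalized_compute[0], 0.2, 0.3),
]
library = build_library(templates)
```

Keying the library by task type mirrors the separate storage of training and inference templates, so a later step can fetch all templates of one type with a single lookup.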

[0035] In some embodiments, a task concurrency level feature can be added to the business workload template. The task concurrency level feature is used to reflect the concurrent call requirements of AI tasks in actual operation. Based on the added feature dimension, the business workload template can be classified in a more granular way, improving the adaptability of subsequent test cases to actual business scenarios.

[0036] Step 2. Generate precision combination test cases: Step 2.1. Determine the operator's sensitivity level to precision: In this embodiment, based on the established grading rules, pre-testing is performed on the operators to determine their sensitivity level to precision. Here, operators and operator precision sensitivity are explained separately. An operator refers to a functional unit that performs specific basic computational operations during AI task computation. It is a fundamental component of the AI ​​model's computational graph. Different operators have different precision requirements. In this embodiment, the operator is the basic unit for precision matching and use case generation. Operator precision sensitivity refers to the degree of deviation in the calculation results when the operator uses different floating-point precision types. The higher the degree of deviation, the higher the operator's sensitivity to precision, and the higher the requirement for computational precision.

[0037] In this embodiment, based on the established grading rules, a pre-test is performed on the operator. During the pre-test, the same calculation process of the same operator is completed using different precision types, and the deviation of the calculation results of the operator under different precision types is obtained. Based on the magnitude of the calculation result deviation, the sensitivity level of the operator to precision is determined according to the established grading rules. In the established grading rules, sensitivity levels corresponding to different deviation ranges are pre-set, and each operator corresponds to a unique sensitivity level, ensuring the consistency of the subsequent precision type matching process.

[0038] Step 2.2. Match the corresponding precision type to operators of each sensitivity level: In this embodiment, once the operators have been graded by sensitivity level, a corresponding precision type is matched to each level. The precision types used in this embodiment are FP32, BF16, and FP16. FP32 is the single-precision floating-point format, with high computational precision and a wide numerical range; BF16 is a 16-bit brain floating-point format with the same exponent range as FP32, which reduces the probability of numerical overflow; FP16 is the half-precision floating-point format, with low memory usage and high computational efficiency. These three are the standard precision types commonly used in the mixed precision mode of AI cards.

[0039] Based on the defined sensitivity levels according to the established grading rules, a unique precision type is matched for each sensitivity level. The matching process follows the defined mapping rules, which are consistent with the computational characteristics and precision requirements of the operators. This ensures that the matched precision type balances computational efficiency and resource consumption while controlling the deviation of the calculation results. The matched precision type and its corresponding operator form a fixed binding relationship. This fixed binding relationship is synchronously written into the operator matching rules of the corresponding precision combination test case. The operator matching rules are used to constrain the operator precision allocation process during the execution of the precision combination test case, ensuring that the precision allocation process is consistent with the preset matching relationship.

[0040] Step 2.3. Generate Precision Combination Test Cases Based on the AI ​​Workload Library: In this embodiment, precision combination test cases are generated for each type of AI task by combining the workload templates in the AI ​​workload library. Specifically, the operator proportion and precision sensitivity in the workload templates are extracted. Based on the extracted operator-related features and the operator sensitivity grading results that complete precision type matching, multiple sets of precision combination test cases are generated for each type of AI task.

[0041] During the generation process, different weights are set for operators with varying proportions in the business load template. This ensures that the generated precision combination test cases fully cover all types of operators in the corresponding AI task, while maintaining consistency with the operator usage distribution of the corresponding AI task. This improves the adaptability of the test cases to the actual running scenario of the corresponding AI task. Multiple sets of precision combination test cases are generated for each type of AI task, each corresponding to a different testing focus, covering different precision combination scenarios and ensuring comprehensive coverage of the testing process.

[0042] Step 2.4. Configure core parameters for precision combination test cases: In this embodiment, core operating parameters are configured for each group of generated precision combination test cases. This embodiment configures three types of core parameters for each group of precision combination test cases: precision combination, operator matching rules, and expected precision loss rate. Precision combination is the set of combinations of precision types corresponding to all operators in the test case, which can fully reflect the precision allocation of all operators in the test case; operator matching rules are the binding rules between operators and their corresponding precision types, used to constrain the calculation precision type adopted by each operator during the test case execution; expected precision loss rate is the maximum acceptable calculation result deviation threshold during the test case execution, used for subsequent test result verification and evaluation. Precision combination test cases with configured core parameters are used in the subsequent resource constraint adaptation process, providing a basis for the adjustment and selection of subsequent test cases.
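The three core parameters can be pictured as fields of one record. The sketch below is only an assumed representation: the class name `PrecisionTestCase`, its methods, and the sample operator names are hypothetical, and the loss rate is expressed as a fraction purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class PrecisionTestCase:
    case_id: str
    # Precision combination: the precision type used by each operator in the case.
    precision_combination: Dict[str, str]
    # Operator matching rules: the fixed operator -> precision bindings
    # that constrain precision allocation during execution.
    operator_matching_rules: Dict[str, str]
    # Expected precision loss rate: maximum acceptable deviation (0.01 means 1%).
    expected_precision_loss_rate: float

    def allocation_is_consistent(self) -> bool:
        """True when every operator runs at the precision its rule binds it to."""
        return all(
            self.operator_matching_rules.get(op) == precision
            for op, precision in self.precision_combination.items()
        )

    def passes(self, measured_loss_rate: float) -> bool:
        """True when the measured loss stays within the expected threshold."""
        return measured_loss_rate <= self.expected_precision_loss_rate


case = PrecisionTestCase(
    case_id="case-001",
    precision_combination={"matmul": "FP16", "softmax": "FP32"},
    operator_matching_rules={"matmul": "FP16", "softmax": "FP32"},
    expected_precision_loss_rate=0.01,
)
```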

[0043] In some specific implementations, regarding the operator precision sensitivity grading and precision type matching process, pre-testing and grading of 240 commonly used AI task operators were completed. During the pre-testing process, 1000 repeated calculations were performed for each group of operators using three precision types: FP32, BF16, and FP16. Deviation data of calculation results under different precisions were obtained, and sensitivity levels were determined based on the established grading rules. The grading rules were set as follows: operators with a relative deviation of less than 0.01% were classified as low sensitivity, operators with a relative deviation between 0.01% and 1% were classified as medium sensitivity, and operators with a relative deviation greater than 1% were classified as high sensitivity. After classification, FP32 precision type is matched for high-sensitivity operators, BF16 precision type for medium-sensitivity operators, and FP16 precision type for low-sensitivity operators. The binding relationship between operators and precision types is written into the operator matching rules. Each set of operator matching rules includes three items: operator type, corresponding precision type, and deviation threshold, ensuring consistency in operator precision allocation during test case execution. This implementation process solves the problem of lacking clear quantitative basis for operator precision matching. By obtaining quantified deviation data through multiple sets of repeated pre-tests, it achieves accurate classification of operator sensitivity and appropriate matching of precision types, reducing result deviations caused by unreasonable precision allocation during testing.
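The grading thresholds and the level-to-precision mapping given above translate directly into a small lookup. In this sketch the function names are hypothetical, the deviation is assumed to be measured relative to a reference result, and the treatment of the exact boundary values (0.01% and 1% both graded as medium) is an assumption, since the text does not say which side the boundaries fall on.

```python
from typing import Dict

# Mapping stated in the text: high -> FP32, medium -> BF16, low -> FP16.
PRECISION_BY_LEVEL: Dict[str, str] = {"high": "FP32", "medium": "BF16", "low": "FP16"}


def relative_deviation_percent(reference: float, value: float) -> float:
    """Relative deviation of `value` from `reference`, in percent."""
    if reference == 0.0:
        raise ValueError("reference must be non-zero")
    return abs(value - reference) / abs(reference) * 100.0


def grade_sensitivity(deviation_percent: float) -> str:
    """Grade an operator by its relative deviation (thresholds from the text)."""
    if deviation_percent < 0.01:
        return "low"
    if deviation_percent <= 1.0:  # boundary handling is an assumption
        return "medium"
    return "high"


def build_matching_rule(operator_type: str, deviation_percent: float,
                        deviation_threshold: float) -> Dict[str, object]:
    """Assemble the three items of one operator matching rule."""
    level = grade_sensitivity(deviation_percent)
    return {
        "operator_type": operator_type,
        "precision_type": PRECISION_BY_LEVEL[level],
        "deviation_threshold": deviation_threshold,
    }


rule = build_matching_rule("layer_norm", deviation_percent=2.5, deviation_threshold=1.0)
```

Here a hypothetical `layer_norm` operator with a 2.5% deviation is graded as high sensitivity and therefore bound to FP32.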

[0044] In some embodiments, for operators with the same sensitivity level, multiple sets of replaceable precision type matching schemes can be set. Each set of matching schemes corresponds to different computing power occupancy and precision loss control targets. Based on different test requirements, the corresponding matching scheme can be selected to complete the binding of the precision type and the operator, improving the flexibility of test case generation.

[0045] Step 3. Adaptation of test case resource constraints: Step 3.1. Collect the resource quotas of the AI card computing power splitting instances: In this embodiment, the resource quotas of the target AI card computing power splitting instances are collected through the AI card management tool. Here, the AI card computing power splitting instance and the AI card management tool are explained respectively. The AI card computing power splitting instance refers to dividing the physical computing power, video memory, cache and other resources of a single AI card into multiple independent and separately allocable resource units through resource isolation technologies at the hardware or software level. Each resource unit is a computing power splitting instance, and the resources between different computing power splitting instances are isolated from each other and can run different AI tasks and test cases independently without interference. The AI card management tool is a standard resource management tool supporting the AI card, which can query, allocate and monitor the status of the AI card resources and is an existing standardized tool supporting the AI card. In this embodiment, the collected resource quotas include the number of AICores, the video memory capacity and the cache size. These three types of parameters can completely reflect the upper limit of the available resources of the AI card computing power splitting instance and provide a resource benchmark for the subsequent matching and screening of test cases.

[0046] Step 3.2. Calculate the resource requirements of the test cases: In this embodiment, the resource requirements for running each configured precision combination test case are calculated. Specifically, based on the operator matching rules and precision combination of the test case, the computing power requirement and video memory requirement for running that test case on the AI card in mixed precision mode are calculated.

[0047] The calculation process of the computing power requirements is based on the computational amounts of all operators in the test case, the computing power consumption coefficients of the corresponding precision types, and combines the execution process of the test case to complete the summary calculation of the overall computing power requirements. The calculation process of the video memory requirements is based on the input and output data amounts of all operators in the test case, the single data video memory occupancy coefficients of the corresponding precision types, and combines the video memory occupancy of the model parameters to complete the summary calculation of the overall video memory requirements. The calculated computing power requirements and video memory requirements can completely reflect the minimum resource amounts required for running the corresponding precision combination test cases on the AI card and provide a judgment basis for the subsequent matching and screening.
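A minimal sketch of the two summations follows. The per-element sizes (4 bytes for FP32, 2 bytes for BF16 and FP16) follow from the formats themselves; the computing power consumption coefficients are placeholder values chosen only for illustration, and `OperatorLoad`, `compute_requirement`, and `memory_requirement` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List

# Storage size of one element in each precision type.
BYTES_PER_ELEMENT = {"FP32": 4, "BF16": 2, "FP16": 2}
# Hypothetical relative computing power cost per operation for each precision type.
COMPUTE_COEFF = {"FP32": 1.0, "BF16": 0.5, "FP16": 0.5}


@dataclass
class OperatorLoad:
    name: str
    precision: str      # precision type bound by the operator matching rules
    operations: float   # computational amount of the operator
    io_elements: int    # number of input plus output data elements


def compute_requirement(ops: List[OperatorLoad]) -> float:
    """Sum each operator's computational amount weighted by its precision coefficient."""
    return sum(op.operations * COMPUTE_COEFF[op.precision] for op in ops)


def memory_requirement(ops: List[OperatorLoad], parameter_bytes: int) -> int:
    """Sum operator input/output memory and add the model parameter memory."""
    io_bytes = sum(op.io_elements * BYTES_PER_ELEMENT[op.precision] for op in ops)
    return parameter_bytes + io_bytes


ops = [
    OperatorLoad("matmul", "FP16", operations=2e9, io_elements=1_000_000),
    OperatorLoad("softmax", "FP32", operations=1e8, io_elements=500_000),
]
total_compute = compute_requirement(ops)
total_memory = memory_requirement(ops, parameter_bytes=10_000_000)
```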

[0048] Step 3.3. Matching and Filtering Precision Combination Test Cases: In this embodiment, precision combination test cases are matched and filtered based on the calculated resource requirements and the collected instance resource quotas. Specifically, based on the calculated computing power and video memory requirements, combined with the resource quotas of the AI ​​card computing power allocation instances, precision combination test cases are matched and filtered. During the matching process, the computing power requirement of the test case is compared with the available computing power quota of the instance, and the video memory requirement of the test case is compared with the available video memory quota of the instance. When both the computing power and video memory requirements of the test case do not exceed the instance's resource quota, the test case is determined to match the corresponding instance; when either the computing power or video memory requirement of the test case exceeds the instance's resource quota, the test case is determined to be mismatched with the corresponding instance, and the subsequent parameter adjustment process begins. During the filtering process, all test cases that match the corresponding instance are retained, and mismatched test cases with no room for adjustment are removed, ensuring that the test cases can run normally within the corresponding instance during subsequent testing and that there will be no test interruption due to insufficient resources.

[0049] Step 3.4. Adjust parameters for mismatched test cases: In this embodiment, the parameters of precision combination test cases determined to be mismatched, that is, those that do not meet the resource constraints, are adjusted until the resource constraints are met. The adjusted parameters are batch size and model shard size. Batch size is the number of samples input to the model in a single operation during AI task execution; adjusting it directly changes the computing power and GPU memory requirements of test case execution. Model shard size is the unit size used when the model is split by layer on the AI card; adjusting it changes the peak GPU memory usage and the computing power distribution during test case execution.

[0050] The parameter adjustment process follows a set adjustment step size and range. The adjustment step size is the minimum change in each parameter adjustment, and the adjustment range is the adjustable numerical range of the parameter. This ensures the controllability of the parameter adjustment process and avoids excessive parameter adjustment that could alter the core test objectives of the test cases. After each parameter adjustment, the computing power and memory requirements for the corresponding precision combination test cases are recalculated. The recalculated computing power and memory requirements are compared with the resource quota of the AI ​​card computing power allocation instance until the comparison result meets the set resource constraints. During the parameter adjustment process, the precision combination and operator matching rules of the precision combination test cases remain unchanged to ensure that the core test objectives of the test cases do not deviate.

[0051] Step 3.5. Complete the adaptation marking and output of test cases: In this embodiment, all test cases that have gone through matching, screening, and parameter adjustment are given an adaptation mark and output by category. Precision combination test cases that cannot be adjusted to meet the resource constraints are marked as not adapted to the AI card computing power partitioning instance. These test cases are stored separately and excluded from subsequent test execution, which avoids test interruptions and runtime anomalies caused by resource requirements exceeding the instance's resource limit. Test cases that meet the resource constraints after parameter adjustment are marked as adapted to the corresponding AI card computing power partitioning instance. The adjusted precision combination test cases are then used in the subsequent AI card mixed precision mode testing, providing the set of adapted test cases for test execution.

[0052] In some specific implementations, the resource constraint adaptation process for precision combination test cases involves matching, filtering, and adjusting test cases based on the resource quotas of AI card computing power allocation instances. The collected instance resource quotas include: 8 AI Cores, 16GB of GPU memory, and 2MB of cache. Based on these quotas, 120 sets of precision combination test cases were matched and filtered. After the initial matching, 42 test cases matching the instance resource quotas were selected. The remaining 78 mismatched test cases entered the parameter adjustment process. During parameter adjustment, the initial batch size was set to 32, the adjustment step size to 4, and the adjustment range to 4-32. The initial model shard size was set to 12 layers, the adjustment step size to 2 layers, and the adjustment range to 2-12 layers. After each parameter adjustment, the computing power and memory requirements of the test cases are recalculated until the computing power requirement does not exceed 80% of the instance's available computing power and the memory requirement does not exceed 85% of the instance's available memory. This process ultimately completes the parameter adjustment and adaptation for 63 test cases. The remaining 15 test cases that cannot be adjusted to be compatible are marked as incompatible with the instance. This implementation process solves the problem of lacking quantifiable rules for test case parameter adjustment. By using fixed adjustment steps and resource thresholds, it achieves precise adaptation of test cases, avoids test interruptions due to resource overruns, and improves the stability and success rate of the testing process.
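The matching and step-wise adjustment can be sketched as a bounded search. The step sizes, ranges, and the 80% and 85% thresholds are taken from the text above; the order in which batch size and model shard size are reduced is not specified there, so the order used here (batch size first, then shard size) is an assumption, and `adapt_case` and `toy_demand` are hypothetical names with a made-up demand model.

```python
from typing import Callable, Optional, Tuple

# A demand model returns (computing power requirement, memory requirement)
# for a given batch size and model shard size.
Demand = Callable[[int, int], Tuple[float, float]]


def adapt_case(demand: Demand, compute_quota: float, memory_quota: float,
               batch: int = 32, batch_step: int = 4, batch_min: int = 4,
               shard: int = 12, shard_step: int = 2, shard_min: int = 2,
               compute_cap: float = 0.80, memory_cap: float = 0.85
               ) -> Optional[Tuple[int, int]]:
    """Return the first (batch size, shard size) that fits, or None if none fits.

    The precision combination and operator matching rules are not touched;
    only batch size and model shard size are varied.
    """
    for s in range(shard, shard_min - 1, -shard_step):
        for b in range(batch, batch_min - 1, -batch_step):
            compute, memory = demand(b, s)
            if (compute <= compute_cap * compute_quota
                    and memory <= memory_cap * memory_quota):
                return b, s
    return None  # mark the case as not adapted to this instance


def toy_demand(batch_size: int, shard_layers: int) -> Tuple[float, float]:
    """Made-up demand model: compute grows with batch, memory with both."""
    return batch_size * 9.0, batch_size * 0.25 + shard_layers * 0.5


adapted = adapt_case(toy_demand, compute_quota=300.0, memory_quota=16.0)
unadaptable = adapt_case(lambda b, s: (1e9, 1e9), compute_quota=300.0, memory_quota=16.0)
```

With the toy model, the initial setting (32, 12) needs 288 units of computing power against a cap of 240, so the batch size is stepped down to 24, where both caps are met. A case whose demand never fits returns `None`, which corresponds to being marked as incompatible with the instance.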

[0053] In some embodiments, a dynamic adaptation process can be set up for changes in the resource quota of AI card computing power allocation instances. When the resource quota of an instance changes, the test cases are automatically re-matched and the parameters are adjusted to ensure that the test cases and the resource status of the instance are always adapted.

[0054] Step 4. Perform tests and quantitative evaluation output: Step 4.1. Generate a Dynamic Load Fluctuation Sequence: In this embodiment, a dynamic load fluctuation sequence for testing is generated based on the feature data in the AI ​​workload library. The dynamic load fluctuation sequence is explained here as a time-series data sequence used to simulate the changes in computing load over time during the actual operation of the AI ​​card. It reflects the fluctuation patterns of computing load in AI tasks in real business scenarios. In this embodiment, based on the computing load occupancy curve in the AI ​​workload library, the variation patterns of computing load fluctuations are extracted, including the peak value, valley value, fluctuation period, and rate of change of computing load. Based on the extracted variation patterns, a dynamic load fluctuation sequence is generated. The generated dynamic load fluctuation sequence is consistent with the computing load change characteristics in real business scenarios and can be used to simulate the dynamic load environment in real business scenarios, ensuring the scenario authenticity of subsequent stability tests.
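One way to turn an extracted peak, valley, and fluctuation period into a time series is shown below. The sinusoidal shape is purely an assumption made for illustration, since the text does not prescribe a waveform, and the rate of change is not modeled separately here. The function name `load_sequence` is hypothetical; the 10 ms default interval and the 20%-90% range match the values reported later in this embodiment.

```python
import math
from typing import List


def load_sequence(valley: float, peak: float, period_s: float,
                  duration_s: float, interval_s: float = 0.010) -> List[float]:
    """Sample a sinusoid that oscillates between `valley` and `peak`.

    Values are fractions of the instance's available computing power.
    """
    if not 0.0 <= valley <= peak <= 1.0:
        raise ValueError("expected 0 <= valley <= peak <= 1")
    n_samples = int(round(duration_s / interval_s))
    mid = (peak + valley) / 2.0
    amplitude = (peak - valley) / 2.0
    return [
        mid + amplitude * math.sin(2.0 * math.pi * (i * interval_s) / period_s)
        for i in range(n_samples)
    ]


# Two seconds of load between 20% and 90%, one fluctuation period per second.
sequence = load_sequence(valley=0.20, peak=0.90, period_s=1.0, duration_s=2.0)
```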

[0055] Step 4.2. Inject the dynamic load fluctuation sequence into the target instance: In this embodiment, while the precision combination test cases execute, the generated dynamic load fluctuation sequence is injected into the target AI card computing power partitioning instance at a set time interval. The set time interval equals the sampling interval of the dynamic load fluctuation sequence, which keeps the injection timing accurate and keeps the actual computing power load of the instance consistent with the values set by the sequence. Load injection runs through the entire test case execution, from start to completion. It simulates the continuous load fluctuations present when AI tasks execute in real business scenarios and verifies the performance of the AI card mixed precision mode under dynamic load.
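The fixed-interval injection can be sketched as a loop that schedules each sample against the start time rather than sleeping a fixed amount after each one, so that small delays do not accumulate. This is an illustrative sketch only: `inject_load` is a hypothetical name, and `apply_load` stands in for whatever mechanism actually sets the load on the instance, which the text does not describe.

```python
import time
from typing import Callable, Sequence


def inject_load(sequence: Sequence[float], interval_s: float,
                apply_load: Callable[[float], None]) -> None:
    """Apply each load level at its scheduled time, start + i * interval."""
    start = time.monotonic()
    for i, level in enumerate(sequence):
        delay = start + i * interval_s - time.monotonic()
        if delay > 0:
            time.sleep(delay)  # wait until this sample's scheduled time
        apply_load(level)      # hypothetical hook that sets the instance load


# Example: record the injected levels instead of driving real hardware.
applied = []
inject_load([0.2, 0.5, 0.9], interval_s=0.001, apply_load=applied.append)
```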

[0056] Step 4.3. Monitor the testing process and collect operational metrics: In this embodiment, the test status is monitored in real time and corresponding operational metrics are collected throughout the entire test case execution process. Specifically, during test case execution, the precision switching status of the AI ​​card's mixed precision mode is monitored in real time. This precision switching status includes the triggering time of the precision switch, the precision type being switched, the time consumed during the switching process, and the deviation of the calculation results before and after the switch, thus comprehensively reflecting the precision switching execution status of the AI ​​card's mixed precision mode.

[0057] Simultaneously, operational metrics are collected during the testing process, covering the entire lifecycle of precision switching, including test case execution time, computational throughput, peak memory usage, computing power utilization, precision loss of calculation results, and number of precision switching anomalies. The collected operational metrics provide basic data support for subsequent quantitative evaluation.

[0058] Step 4.4. Perform parallel testing for the single-card multi-instance scenario: In this embodiment, for the scenario in which multiple computing power partitioning instances are deployed on a single card, parallel testing is performed: different AI card mixed precision modes and AI tasks are configured for different AI card computing power partitioning instances, and performance interference indicators between the instances are collected. During configuration, an independent test process is allocated to each AI card computing power partitioning instance, and each test process has its own test case and runtime environment. This keeps the testing of different instances independent of each other and realistically simulates the business scenario of multiple tenants sharing a single AI card.

[0059] All test processes are started and run synchronously according to the set time nodes, ensuring that the test processes of all instances are executed in parallel within the same time period, which can accurately capture resource competition and performance interference between instances. During the operation, performance interference indicators between instances with different AI card computing power allocation are continuously collected at the set sampling frequency. The performance interference indicators include computing power preemption rate and memory conflict rate. The computing power preemption rate is the proportion of computing power resources occupied by other instances during the instance operation, and the memory conflict rate is the frequency of memory resource access conflicts during the instance operation. The collected performance interference indicators are synchronously incorporated into the operation indicators of the test process for subsequent quantitative evaluation.

[0060] Step 4.5. Quantitative evaluation based on a multi-dimensional evaluation system: In this embodiment, the test results are quantitatively evaluated from the collected operational indicators using a set multi-dimensional evaluation system. The multi-dimensional evaluation system is an indicator system that quantifies the performance of the AI card's mixed precision mode along several dimensions; it includes performance indicators, precision indicators, resource indicators, and stability indicators.

[0061] The performance metrics are used to evaluate the computational execution efficiency of the AI ​​card's mixed precision mode, including computational throughput and task execution time; the precision metrics are used to evaluate the computational precision control capability of the AI ​​card's mixed precision mode, including the precision loss rate of computational results and precision switching deviation; the resource metrics are used to evaluate the resource utilization efficiency of the AI ​​card's mixed precision mode, including computing power utilization and video memory utilization; and the stability metrics are used to evaluate the operational stability of the AI ​​card's mixed precision mode under dynamic load and multi-instance parallel scenarios, including the number of precision switching anomalies and the degree of performance interference between instances.

[0062] The multi-dimensional evaluation system sets corresponding calculation rules and data sources for each indicator. The data sources are the operational indicators collected during the test. The numerical calculation of each indicator is completed according to the preset calculation rules. The calculation results of each indicator are used in the subsequent comprehensive score calculation process to ensure that the quantitative evaluation process covers all core dimensions of the test process. The entire quantitative evaluation process follows the set calculation order.

[0063] Step 4.6. Calculate the comprehensive score and output the configuration scheme: In this embodiment, the comprehensive score is calculated based on the quantitative evaluation results, and the corresponding AI card mixed precision mode configuration scheme is selected and output based on the comprehensive score. The weighted summation method used in this embodiment is explained here. The weighted summation method refers to multiplying the calculation result of each indicator item by the corresponding weight coefficient and then summing the results to obtain the comprehensive score. This method can realize the comprehensive quantitative evaluation of multi-dimensional indicators and is a commonly used multi-indicator comprehensive evaluation method in the prior art.

[0064] In this embodiment, based on the evaluation results of the multi-dimensional evaluation system, the weighted summation method is used to calculate the comprehensive score of the AI ​​card's mixed precision mode. Each indicator item of the multi-dimensional evaluation system is assigned a corresponding weight coefficient, and the sum of the weight coefficients is a set value. The allocation of weight coefficients can be adjusted according to the focus of the test to ensure that the comprehensive score can accurately reflect the performance of the AI ​​card's mixed precision mode in the corresponding test scenario.
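Written out, the weighted summation takes the following form. The symbols are introduced here for illustration and are not used in the original text: x_i is the calculated value of the i-th indicator item, w_i is its weight coefficient, n is the number of indicator items, and W is the set value to which the weight coefficients sum. The formula presumes that the indicator values have been brought to a common scale before weighting, which is an assumption.

```latex
S = \sum_{i=1}^{n} w_i \, x_i , \qquad \sum_{i=1}^{n} w_i = W
```

Raising the weight of one indicator while keeping W fixed necessarily lowers the combined weight of the others, which is how the score can be shifted toward the focus of a given test.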

[0065] The comprehensive score is calculated using the weighted summation method. Based on the ranking of the comprehensive scores, configuration parameters that meet the set requirements are selected. The selected configuration parameters are then summarized to form the AI card mixed precision mode configuration scheme for the corresponding AI card computing power partitioning instance. The configuration scheme includes the recommended precision combination, a suitable batch size, and a load intensity threshold. The output configuration scheme can be used directly to guide the business deployment of the AI card mixed precision mode and provides suitable configuration parameters for its actual application.
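The ranking and selection step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of this embodiment: the function names, the dictionary layout, the example indicator values, and the use of a top-k cutoff as the "set requirement" are all hypothetical, and indicator results are assumed to be normalized to [0, 1] with higher values being better.

```python
# Illustrative sketch only: names, data layout, and values are assumptions.

def composite_score(indicators, weights):
    """Weighted sum of indicator results; both dicts must share the same keys."""
    if indicators.keys() != weights.keys():
        raise ValueError("indicators and weights must cover the same items")
    return sum(weights[name] * indicators[name] for name in weights)


def select_configurations(candidates, weights, top_k=3):
    """Rank candidates by composite score, highest first, and keep the top_k."""
    scored = [(composite_score(c["indicators"], weights), c["config"])
              for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical weights and test results for three precision combinations.
weights = {"performance": 0.3, "precision": 0.3, "resource": 0.2, "stability": 0.2}
candidates = [
    {"config": {"precision_combination": "FP16+FP32", "batch_size": 32,
                "load_threshold": 0.8},
     "indicators": {"performance": 0.9, "precision": 0.7,
                    "resource": 0.8, "stability": 0.9}},
    {"config": {"precision_combination": "BF16+FP32", "batch_size": 64,
                "load_threshold": 0.7},
     "indicators": {"performance": 0.8, "precision": 0.9,
                    "resource": 0.7, "stability": 0.8}},
    {"config": {"precision_combination": "FP32", "batch_size": 16,
                "load_threshold": 0.9},
     "indicators": {"performance": 0.4, "precision": 1.0,
                    "resource": 0.5, "stability": 0.9}},
]

# Each entry pairs a composite score with the configuration parameters
# (precision combination, batch size, load intensity threshold).
scheme = select_configurations(candidates, weights, top_k=2)
```

Keeping the score next to each selected configuration lets the output scheme show why one precision combination was preferred over another.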

[0066] In some specific implementations of the dynamic stability testing and quantitative evaluation process for the AI card mixed precision mode, a dynamic load fluctuation sequence is generated based on the computing power utilization curve in the AI business load library. The sampling interval of the sequence is set to 10 ms, the sequence duration matches the runtime of the test case, and the load fluctuation range is set to 20%-90% of the instance's available computing power. During test case execution, the dynamic load fluctuation sequence is injected into the target instance at 10 ms intervals. The sampling frequency of the running indicators is set to 100 Hz, covering the entire test case execution. For the single-card multi-instance scenario, four computing power partitioning instances are tested in parallel: an independent test process is assigned to each instance, and all processes are started synchronously. During operation, the computing power preemption rate and memory conflict rate between instances are collected at a sampling interval of 50 ms. In the quantitative evaluation, weight coefficients of 0.3, 0.3, 0.2, and 0.2 are assigned to the performance, precision, resource, and stability indicators, respectively, so that the weight coefficients sum to 1. The comprehensive score is calculated by weighted summation, and the three sets of configuration parameters with the highest scores are selected and summarized to form the AI card mixed precision mode configuration scheme for the corresponding instance. This implementation addresses the problem that static testing cannot reproduce the dynamic scenarios of real business: quantified load fluctuation sequences and high-frequency sampling allow the dynamic stability of the AI card mixed precision mode to be verified accurately. At the same time, multi-dimensional evaluation with fixed weights supports the adaptability and rationality of the configuration scheme.
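The generation of the dynamic load fluctuation sequence can be sketched as follows. This is a hedged illustration rather than the method of this implementation: the function name, the representation of the utilization curve as (time, utilization) pairs, the linear interpolation, the looping of the curve when the test case outlasts the recording, and the use of clamping to enforce the 20%-90% range are all assumptions. Only the 10 ms interval and the 20%-90% range come from the text.

```python
from bisect import bisect_right


def build_load_sequence(curve, duration_s, interval_s=0.010, low=0.20, high=0.90):
    """Resample a utilization curve onto a fixed grid and clamp it to [low, high].

    curve: list of (time_s, utilization) pairs sorted by time, with utilization
           expressed as a fraction of the instance's available computing power.
    duration_s: runtime of the test case, which sets the sequence length.
    """
    times = [t for t, _ in curve]
    values = [u for _, u in curve]
    span = times[-1] - times[0]
    n_steps = round(duration_s / interval_s)

    sequence = []
    for k in range(n_steps):
        # Assumption: replay the recorded curve from its start if the test
        # case runs longer than the recording.
        offset = (k * interval_s) % span if span > 0 else 0.0
        t = times[0] + offset
        i = bisect_right(times, t) - 1
        if i >= len(times) - 1:
            u = values[-1]
        else:
            # Linear interpolation between the two neighbouring samples.
            frac = (t - times[i]) / (times[i + 1] - times[i])
            u = values[i] + frac * (values[i + 1] - values[i])
        # Keep the injected load within the configured fluctuation range.
        sequence.append(min(high, max(low, u)))
    return sequence


# Hypothetical triangular utilization curve recorded over one second.
example_curve = [(0.0, 0.0), (0.5, 1.0), (1.0, 0.0)]
example_sequence = build_load_sequence(example_curve, duration_s=1.0)
```

Each element of the returned list is the target load for one 10 ms injection step, so a one-second test case yields 100 values.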

[0067] In some embodiments, multiple weight coefficient allocation schemes can be set based on different business deployment requirements. The comprehensive score is calculated based on the different weight schemes, and multiple configuration schemes corresponding to different deployment requirements are output to improve the applicability of the configuration schemes.
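A short Python sketch shows how several weight coefficient allocation schemes can yield different configuration schemes from the same test results. The scheme names, weight values, indicator values, and function names below are hypothetical placeholders chosen for illustration, and indicator results are again assumed to be normalized to [0, 1].

```python
# Illustrative sketch only: scheme names, weights, and data are assumptions.

def score(indicators, weights):
    """Weighted sum of indicator results for one weight scheme."""
    return sum(weights[name] * indicators[name] for name in weights)


def best_config_per_scheme(candidates, weight_schemes):
    """Return the highest-scoring configuration for each weight scheme."""
    result = {}
    for scheme_name, weights in weight_schemes.items():
        best = max(candidates, key=lambda c: score(c["indicators"], weights))
        result[scheme_name] = best["config"]
    return result


weight_schemes = {
    "throughput_first": {"performance": 0.5, "precision": 0.2,
                         "resource": 0.15, "stability": 0.15},
    "precision_first": {"performance": 0.2, "precision": 0.5,
                        "resource": 0.15, "stability": 0.15},
}
candidates = [
    {"config": "FP16-heavy combination",
     "indicators": {"performance": 0.9, "precision": 0.5,
                    "resource": 0.6, "stability": 0.6}},
    {"config": "FP32-heavy combination",
     "indicators": {"performance": 0.5, "precision": 0.95,
                    "resource": 0.6, "stability": 0.6}},
]

per_scheme = best_config_per_scheme(candidates, weight_schemes)
```

With these placeholder numbers, the throughput-oriented weights favor the FP16-heavy combination and the precision-oriented weights favor the FP32-heavy one, which is the kind of divergence that multiple weight schemes are meant to expose.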

[0068] This embodiment achieves adaptive testing and configuration optimization of the AI card mixed precision mode through a closed-loop process, mitigating the scenario rigidity, insufficient coverage, and poor adaptability of existing testing methods. By constructing an AI business load library, this embodiment achieves standardized storage and retrieval of the operating characteristics of different AI tasks and provides a foundation for adaptive test case generation. This keeps test cases consistent with the characteristics of real business scenarios and narrows the gap between test scenarios and actual business scenarios. This embodiment replaces traditional manual enumeration with adaptive test case generation based on operator precision sensitivity, improving test case generation efficiency and expanding the coverage of test scenarios. Through the resource constraint adaptation process, this embodiment matches test cases to the resource quotas of different computing power partitioning instances, reducing the probability of test interruptions caused by insufficient resources and improving the stability and success rate of the testing process. Through stability testing under dynamic load and multi-dimensional quantitative evaluation, this embodiment comprehensively verifies the operational performance of the AI card mixed precision mode and outputs adapted configuration schemes, providing implementable parameter support for its actual business deployment.

[0069] The above description is merely a preferred embodiment of the present invention. It should be understood that the present invention is not limited to the forms disclosed herein and should not be construed as excluding other embodiments. It can be used in various other combinations, modifications, and environments, and can be altered within the scope of the concept described herein through the above teachings or related technologies or knowledge. Modifications and variations made by those skilled in the art that do not depart from the spirit and scope of the present invention should be within the protection scope of the appended claims.

Claims

1. An adaptive test optimization method for AI card mixed precision mode, characterized in that it includes the following steps:
S1. Construct an AI business load library: classify AI tasks, collect the load characteristics of the AI tasks in the AI card mixed precision mode, and construct business load templates based on the load characteristics, the business load templates including task type, operator proportion, computing power requirement, video memory requirement, and precision sensitivity; aggregate the business load templates to form the AI business load library.
S2. Generate precision combination test cases: based on the set grading rules, determine the sensitivity level of each operator to precision; based on the sensitivity level, match the corresponding precision type; and, in combination with the AI business load library, generate precision combination test cases, each of which includes a precision combination, operator matching rules, and an expected precision loss rate.
S3. Adjust the precision combination test cases: collect the resource quota of the AI card computing power partitioning instance, calculate the computing power and video memory requirements of the AI card in the AI card mixed precision mode, match and filter the precision combination test cases, adjust the parameters of the precision combination test cases until the resource constraints are met, and mark the precision combination test cases that do not meet the resource constraints and cannot be adjusted.
S4. Perform the AI card mixed precision mode test: generate a dynamic load fluctuation sequence based on the AI business load library, inject the dynamic load fluctuation sequence into the AI card computing power partitioning instance, run the adjusted precision combination test cases, collect running indicators during the test, complete the quantitative evaluation based on the running indicators, and output the AI card mixed precision mode configuration scheme.

2. The method according to claim 1, characterized in that step S1 includes the following sub-steps:
S1.1. Classify AI tasks into two categories: training tasks and inference tasks;
S1.2. Collect the load characteristics of each type of AI task in the mixed precision mode of the AI card. The collection process covers the entire process of the AI card running in the mixed precision mode. The load characteristics include computing power usage curve, memory fluctuation pattern, operator type distribution and precision sensitivity distribution;
S1.3. Standardize the collected load characteristics. Standardization includes unifying the format of the load characteristics and normalizing the numerical range. Construct a business load template based on the standardized load characteristics. The business load template includes task type, operator proportion, computing power requirement, video memory requirement and precision sensitivity;
S1.4. All business load templates are categorized and stored, and all business load templates are aggregated to form an AI business load library. The AI business load library is used for the subsequent generation of precision combination test cases.

3. The method according to claim 1, characterized in that step S2 includes the following sub-steps:
S2.1. Based on the set grading rules, perform pre-tests on the operators, obtain the deviation of the calculation results of the operators under different precision types, and determine the sensitivity level of the operators to precision;
S2.2. Match operators with different sensitivity levels to corresponding precision types, and form a one-to-one matching relationship between precision type and sensitivity level;
S2.3. Combine the business load templates in the AI business load library, extract the operator proportion and precision sensitivity in the business load templates, and generate multiple sets of precision combination test cases for each type of AI task;
S2.4. Configure the precision combination, operator matching rules and expected precision loss rate for each precision combination test case. The precision combination test cases are used in the subsequent resource constraint adaptation process.

4. The method according to claim 1, characterized in that step S3 includes the following sub-steps:
S3.1. Collect the resource quota of the AI card computing power partitioning instance through the AI card management tool. The resource quota includes the number of AI Cores, the capacity of video memory and the size of cache;
S3.2. Based on the operator matching rules and precision combinations in the precision combination test cases, calculate the computing power and video memory requirements of the AI card when running the corresponding precision combination test cases in the AI card mixed precision mode;
S3.3. Based on the computing power and video memory requirements, and combined with the resource quota of the AI card computing power partitioning instance, match and filter the precision combination test cases;
S3.4. For precision combination test cases that do not meet the resource constraints, adjust the parameters of the precision combination test cases until the resource constraints are met;
S3.5. For precision combination test cases that cannot be adjusted to meet the resource constraints, mark them as not adapted to the corresponding AI card computing power partitioning instance. The adjusted precision combination test cases are used in the subsequent AI card mixed precision mode testing process.

5. The method according to claim 1, characterized in that step S4 includes the following sub-steps:
S4.1. Based on the computing power occupancy curve in the AI business load library, extract the changing pattern of computing power fluctuations and generate a dynamic load fluctuation sequence;
S4.2. During the execution of the precision combination test cases, inject the dynamic load fluctuation sequence into the target AI card computing power partitioning instance at set time intervals;
S4.3. Monitor the precision switching status of the AI card's mixed precision mode in real time and collect the running indicators during the test process, the running indicators covering the entire lifecycle data of the precision switching process;
S4.4. Based on the established evaluation system and the collected running indicators, calculate the comprehensive score of the AI card's mixed precision mode;
S4.5. Based on the comprehensive score, output the configuration scheme of the AI card mixed precision mode for the corresponding AI card computing power partitioning instance. The configuration scheme is used to guide the business deployment of the AI card mixed precision mode.

6. The method according to claim 1, characterized in that, in step S2, corresponding precision types are matched for operators with different sensitivity levels. The precision types include FP32, BF16, and FP16. Based on the sensitivity levels divided by the set grading rules, a unique precision type is matched for each sensitivity level. The matching process follows the set mapping rules, which are consistent with the computational characteristics and precision requirements of the operators. The matched precision type forms a fixed binding relationship with the corresponding operator. The fixed binding relationship is synchronously written into the operator matching rules of the corresponding precision combination test case. The operator matching rules are used to constrain the operator precision allocation process during the execution of the precision combination test case, ensuring that the precision allocation process is consistent with the preset matching relationship.

7. The method according to claim 1, characterized in that, in step S3, the parameters of the precision combination test cases are adjusted. The parameters include batch size and model fragment size. The parameter adjustment process follows the set adjustment step size and adjustment range. After each parameter adjustment, the computing power requirements and memory requirements of the corresponding precision combination test cases are recalculated. The recalculated computing power requirements and memory requirements are compared with the resource quota of the AI card computing power partitioning instance until the comparison result meets the set resource constraints. During the parameter adjustment process, the precision combination and operator matching rules of the precision combination test cases remain unchanged to ensure that the core test objectives of the test cases do not deviate.

8. The method according to claim 1, characterized in that, in step S4, for the single-card multi-instance scenario, different AI card mixed precision modes and AI tasks are configured for different AI card computing power partitioning instances. Performance interference indicators between different AI card computing power partitioning instances are collected. During the configuration process, an independent test process is assigned to each AI card computing power partitioning instance. All test processes are started and run synchronously according to the set time nodes. During the operation, performance interference indicators between different AI card computing power partitioning instances are continuously collected according to the set sampling frequency. The performance interference indicators include computing power preemption rate and memory conflict rate. The collected performance interference indicators are synchronously incorporated into the running indicators of the test process for subsequent quantitative evaluation.

9. The method according to claim 1, characterized in that, in step S4, the quantitative evaluation is completed based on the established multi-dimensional evaluation system. The multi-dimensional evaluation system includes performance indicators, precision indicators, resource indicators, and stability indicators. The multi-dimensional evaluation system sets corresponding calculation rules and data sources for each indicator. The data sources are the running indicators collected during the test. The numerical calculation of each indicator is completed according to the preset calculation rules. The calculation results of each indicator are used together in the subsequent comprehensive score calculation process to ensure that the quantitative evaluation process covers all core dimensions of the test process. The entire quantitative evaluation process follows the set calculation order.

10. The method according to claim 9, characterized in that, in step S4, based on the evaluation results of the multi-dimensional evaluation system, a weighted summation method is used to calculate the comprehensive score of the AI card mixed precision mode. Based on the comprehensive score, a configuration scheme for the AI card mixed precision mode is output. The configuration scheme includes the recommended precision combination, a suitable batch size, and a load intensity threshold. A corresponding weight coefficient is assigned to each indicator item of the multi-dimensional evaluation system, and the sum of the weight coefficients is a set value. The comprehensive score is calculated using the weighted summation method. Based on the ranking of the comprehensive scores, configuration parameters that meet the set requirements are selected. The selected configuration parameters are summarized to form the configuration scheme for the AI card mixed precision mode of the corresponding AI card computing power partitioning instance.