A method and electronic device for adjusting operator configuration parameters

By determining the type and performance diagnostic indicators of the operator in the processor and adjusting the configuration parameters of the operator, the problem of complex and inefficient adjustment of operator configuration parameters in the prior art is solved, thereby improving the performance of the processor running the operator.

CN122240181A · Pending · Publication Date: 2026-06-19 · LENOVO (BEIJING) LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
LENOVO (BEIJING) LTD
Filing Date
2026-03-20
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

In the existing technology, the process of adjusting operator configuration parameters is complex and inefficient, and cannot effectively improve the performance of the processor running operators.

Method used

By obtaining processor performance evaluation and diagnostic metrics, the type of operator can be determined as either compute-intensive or memory-intensive. Based on the operator type and performance diagnostic metrics, the configuration parameters of the operator can be adjusted, including compute unit type, compute instruction type, block-level and thread-level parameter layout, pipeline stages, grid size, etc., to optimize resource usage and memory access strategies.

Benefits of technology

This enables more reasonable and efficient adjustment of operator configuration parameters under a given operating scenario, improving the performance of the processor running operators, reducing adjustment complexity, and increasing efficiency.


Figure CN122240181A_ABST

Abstract

This application discloses a method and electronic device for adjusting operator configuration parameters. The method includes: obtaining at least one performance evaluation index and at least one performance diagnostic index of a processor, wherein the performance evaluation index and the performance diagnostic index are data indicators generated by the processor running operators under a set operating scenario, and the performance diagnostic index is a data indicator that can affect the performance evaluation index; determining the operator type of the operator based on the performance evaluation index, wherein the operator type is either compute-intensive or memory-intensive; determining the parameter adjustment strategy of the operator based on the operator type and the performance diagnostic index; and adjusting the configuration parameters of the operator based on the parameter adjustment strategy.

Description

Technical Field

[0001] This application relates to the field of artificial intelligence technology, and in particular to a method and electronic device for adjusting operator configuration parameters.

Background Technology

[0002] An operator is a basic unit in the computational graph of a model, and the configuration parameters of an operator can affect the performance of the processor when it runs that operator. Therefore, to improve the performance of the processor when running operators, the operator parameters need to be configured appropriately.

Summary of the Invention

[0003] In one aspect, this application provides a method for adjusting operator configuration parameters, including:

[0004] Obtain at least one performance evaluation metric and at least one performance diagnostic metric for the processor, wherein the performance evaluation metric and the performance diagnostic metric are data metrics generated by the processor running operators under a set operating scenario, and the performance diagnostic metric is a data metric that can affect the performance evaluation metric.

[0005] Based on the performance evaluation metrics, determine the operator type of the operator, where the operator type is either compute-intensive or memory-intensive;

[0006] Based on the operator type and the performance diagnostic indicators, determine the parameter adjustment strategy for the operator;

[0007] Based on the parameter adjustment strategy, adjust the configuration parameters of the operator.

[0008] In one possible implementation, determining the parameter adjustment strategy for the operator based on the operator type and the performance diagnostic metrics includes:

[0009] When the operator type is compute-intensive, determine the first parameter set and the second parameter set to be adjusted in the configuration parameters of the operator;

[0010] Based on the performance diagnostic indicators, determine the parameter adjustment strategies corresponding to the first parameter set and the second parameter set;

[0011] or,

[0012] When the operator type is memory-intensive, determine the second parameter set and the third parameter set to be adjusted in the configuration parameters of the operator;

[0013] Based on the performance diagnostic indicators, determine the parameter adjustment strategies corresponding to the second parameter set and the third parameter set;

[0014] The first parameter set includes at least one parameter associated with the computational efficiency of the operator; the second parameter set includes at least one parameter associated with both the memory access efficiency and computational efficiency of the operator; and the third parameter set includes at least one parameter associated with the memory access efficiency of the operator.

[0015] In another possible implementation, determining the parameter adjustment strategy corresponding to the first parameter set and the second parameter set based on the performance diagnostic indicators includes:

[0016] Based on the performance diagnostic indicators, the parameter adjustment strategies for each parameter in the first parameter set are determined sequentially according to the order of the parameters in the first parameter set.

[0017] Based on the performance diagnostic indicators, determine the parameter adjustment strategy corresponding to the second parameter set;

[0018] The first parameter set includes, in sequence: computing unit type, computing instruction type, size and layout of sub-blocks at the block level, and size and layout of sub-blocks at the thread level.

[0019] The second parameter set includes at least one of the following: pipeline stage, grid size, and block size;

[0020] The third set of parameters includes: global memory access mode, shared memory access mode, vectorized memory access data block size, and secondary memory access method.
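The grouping of configuration parameters into three sets, and the choice of sets by operator type, can be illustrated with a minimal Python sketch. The identifier names below (such as `compute_unit_type` and `parameter_sets_to_adjust`) are hypothetical and chosen only for illustration; the application does not prescribe any particular data structure.

```python
from enum import Enum


class OperatorType(Enum):
    COMPUTE_INTENSIVE = "compute-intensive"
    MEMORY_INTENSIVE = "memory-intensive"


# First set: parameters tied to computational efficiency, kept in the
# order in which they are considered (hypothetical identifier names).
FIRST_SET = (
    "compute_unit_type",
    "compute_instruction_type",
    "block_level_tile_layout",
    "thread_level_tile_layout",
)

# Second set: parameters tied to both memory access and computation.
SECOND_SET = ("pipeline_stage", "grid_size", "block_size")

# Third set: parameters tied to memory access efficiency.
THIRD_SET = (
    "global_memory_access_mode",
    "shared_memory_access_mode",
    "vectorized_access_block_size",
    "secondary_memory_access_mode",
)


def parameter_sets_to_adjust(op_type):
    """Return the parameter sets considered for the given operator type."""
    if op_type is OperatorType.COMPUTE_INTENSIVE:
        return [FIRST_SET, SECOND_SET]
    return [SECOND_SET, THIRD_SET]


if __name__ == "__main__":
    for op_type in OperatorType:
        print(op_type.value, parameter_sets_to_adjust(op_type))
```

Keeping the first set as an ordered tuple preserves the sequence in which its parameters are examined.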

[0021] In yet another possible implementation, the running scenario is used to characterize the tensor shape of the input tensor of the operator;

[0022] The step of determining the parameter adjustment strategy for the operator based on the operator type and the performance diagnostic indicators includes:

[0023] Based on the operator type, the performance diagnostic index, and the tensor shape, the parameter adjustment strategy of the operator is determined.

[0024] In one possible implementation, the running scenario is further used to characterize the load state type of the processor when running the operator;

[0025] The method for adjusting operator configuration parameters further includes:

[0026] The adjusted configuration parameters are output as the target configuration parameters suited to the operator under the tensor shape and the load state type.

[0027] In yet another possible implementation, the method for adjusting operator configuration parameters also includes:

[0028] Determine the operator to be run in the model, the input tensor corresponding to the operator, and the tensor shape of the input tensor;

[0029] Determine the current load state type of the processor;

[0030] If no target configuration parameter exists corresponding to the operator, the tensor shape, and the load state type, determine the initial configuration parameter corresponding to the operator;

[0031] The acquisition of at least one performance evaluation metric and at least one performance diagnostic metric for the processor further includes:

[0032] The operator is run in the processor based on the input tensor and the initial configuration parameters;

[0033] Obtain at least one performance evaluation metric and at least one performance diagnostic metric generated by the processor running the operator.

[0034] In yet another possible implementation, the method for adjusting operator configuration parameters also includes:

[0035] If there are target configuration parameters corresponding to the operator, the tensor shape, and the load state type, the operator is run using the processor based on the target configuration parameters and the input tensor.
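The reuse of previously determined target configuration parameters can be sketched as a lookup keyed by operator, tensor shape, and load state type. This is a minimal sketch under assumptions: the class `TargetConfigStore`, the function `select_config`, and the string labels for load state are hypothetical names that do not appear in the application.

```python
from typing import Dict, Optional, Tuple

ScenarioKey = Tuple[str, Tuple[int, ...], str]


class TargetConfigStore:
    """In-memory store of target configuration parameters per scenario."""

    def __init__(self) -> None:
        self._store: Dict[ScenarioKey, dict] = {}

    def lookup(self, op_name: str, shape: Tuple[int, ...], load_state: str) -> Optional[dict]:
        return self._store.get((op_name, tuple(shape), load_state))

    def save(self, op_name: str, shape: Tuple[int, ...], load_state: str, config: dict) -> None:
        self._store[(op_name, tuple(shape), load_state)] = dict(config)


def select_config(store, op_name, shape, load_state, initial_config):
    """Return (config, needs_tuning).

    If target parameters exist for this scenario they are used directly;
    otherwise the initial parameters are returned and tuning is required.
    """
    target = store.lookup(op_name, shape, load_state)
    if target is not None:
        return target, False
    return dict(initial_config), True


if __name__ == "__main__":
    store = TargetConfigStore()
    initial = {"block_size": 128}
    print(select_config(store, "matmul", (1, 4096), "low", initial))
    store.save("matmul", (1, 4096), "low", {"block_size": 256})
    print(select_config(store, "matmul", (1, 4096), "low", initial))
```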

[0036] In another possible implementation, the method for adjusting operator configuration parameters further includes: determining the operation type corresponding to the operator;

[0037] The step of determining the parameter adjustment strategy for the operator based on the operator type and the performance diagnostic indicators includes:

[0038] Based on the operator type, the performance diagnostic indicators, and the operation type, the parameter adjustment strategy for the operator is determined.

[0039] In yet another possible implementation, the at least one performance evaluation metric includes: the processor’s actual computing bandwidth and actual memory access bandwidth;

[0040] The step of determining the parameter adjustment strategy for the operator based on the operator type and the performance diagnostic indicators includes:

[0041] If it is determined based on the actual computing bandwidth and actual memory access bandwidth that the target condition has not yet been met, then the parameter adjustment strategy for the operator is determined based on the operator type and the performance diagnostic indicators.

[0042] The target conditions include at least one of the following:

[0043] The actual computing bandwidth and the actual memory access bandwidth converge;

[0044] The ratio of the actual computing bandwidth to the processor's rated computing bandwidth is greater than a first threshold.

[0045] The ratio of the actual memory access bandwidth to the processor's rated memory access bandwidth is greater than the second threshold.
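A check of the target conditions might look like the following Python sketch. Two points are assumptions made for illustration and are not stated in the application: "converge" is read here as both bandwidths changing by no more than a relative tolerance between two successive measurements, and the threshold values of 0.8 are arbitrary placeholders. The function name `target_condition_met` is hypothetical.

```python
def target_condition_met(history, rated_compute, rated_memory,
                         first_threshold=0.8, second_threshold=0.8,
                         rel_tol=0.01):
    """Return True if at least one target condition holds.

    history: list of (actual_compute_bandwidth, actual_memory_bandwidth)
             tuples, one per measurement, most recent last.
    """
    compute, memory = history[-1]

    # Assumed reading of "converge": both values are stable across the
    # last two measurements, within a relative tolerance.
    converged = False
    if len(history) >= 2:
        prev_compute, prev_memory = history[-2]
        converged = (
            abs(compute - prev_compute) <= rel_tol * prev_compute
            and abs(memory - prev_memory) <= rel_tol * prev_memory
        )

    compute_ok = compute / rated_compute > first_threshold
    memory_ok = memory / rated_memory > second_threshold
    return converged or compute_ok or memory_ok


if __name__ == "__main__":
    print(target_condition_met([(50.0, 400.0)], 100.0, 1000.0))
    print(target_condition_met([(90.0, 400.0)], 100.0, 1000.0))
```

When the function returns False, the parameter adjustment strategy would be determined and another round of adjustment carried out.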

[0046] In another aspect, this application also provides an electronic device, including: a controller and a processor;

[0047] The processor is used to run operators;

[0048] The controller is configured to obtain at least one performance evaluation metric and at least one performance diagnostic metric of the processor, wherein the performance evaluation metric and the performance diagnostic metric are data metrics generated by the processor running the operator under a set operating scenario, and the performance diagnostic metric is a data metric that can affect the performance evaluation metric; based on the performance evaluation metric, determine the operator type of the operator, wherein the operator type is either compute-intensive or memory-intensive; based on the operator type and the performance diagnostic metric, determine the parameter adjustment strategy of the operator; and based on the parameter adjustment strategy, adjust the configuration parameters of the operator.

Attached Figure Description

[0049] The above and other features, advantages, and aspects of the embodiments of this disclosure will become more apparent when taken in conjunction with the accompanying drawings and the following detailed description. Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic, and the components and elements are not necessarily drawn to scale.

[0050] Figure 1 is a flowchart of a method for adjusting operator configuration parameters provided in this application;

[0051] Figure 2 is another flowchart of the method for adjusting operator configuration parameters provided in this application;

[0052] Figure 3 is another flowchart of the method for adjusting operator configuration parameters provided in this application;

[0053] Figure 4 is another flowchart of the method for adjusting operator configuration parameters provided in this application;

[0054] Figure 5 is another flowchart of the method for adjusting operator configuration parameters provided in this application;

[0055] Figure 6 is a schematic diagram of the implementation principle framework of the method for adjusting operator configuration parameters provided in this application;

[0056] Figure 7 is another flowchart of the method for adjusting operator configuration parameters provided in this application;

[0057] Figure 8 is a schematic diagram of the component architecture of the electronic device provided in this application.

Detailed Implementation

[0058] The solution proposed in this application can achieve a more reasonable adjustment of the configuration parameters of operators in the model, thereby improving the running performance of the processor running operators.

[0059] The embodiments of this application are described below with reference to the accompanying drawings. The terminology used in the implementation section of this application is only for explaining specific embodiments and is not intended to limit the application. Those skilled in the art will recognize that, with technological advancements and the emergence of new scenarios, the technical solutions provided in the embodiments of this application are also applicable to similar technical problems.

[0060] The terms "first," "second," etc., used in the specification, claims, and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such terms are interchangeable where appropriate; this is merely a way of distinguishing objects with the same attributes in the embodiments of this application. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion, so that a process, method, system, product, or apparatus that comprises a series of elements is not necessarily limited to those elements, but may include other elements not explicitly listed or inherent to those processes, methods, products, or apparatuses.

[0061] Figure 1 shows a flowchart of a method for adjusting operator configuration parameters provided in this application. The method of this embodiment can be applied to electronic devices such as laptops or desktop computers, or to a host, a server, or a node in a cloud platform, without limitation.

[0062] The method in this embodiment may include:

[0063] S101, obtain at least one performance evaluation index and at least one performance diagnostic index of the processor.

[0064] In this application, the processor can be any processor capable of running the model, and therefore of running its operators. For example, the processor can be a graphics processing unit (GPU) or a neural processing unit (NPU), etc., without any specific limitation.

[0065] An operator is a basic computational unit used to perform specific mathematical operations or specific functions. For example, an operator can be a function or a class. Operators are fundamental units in the computational graph that constitutes models such as neural networks; a model's computational graph typically includes multiple operators. In this application, the operator can be applied to any model, such as a large language model, a neural network model, or a multimodal model, without any specific limitations.

[0066] Performance evaluation metrics and performance diagnostic metrics are data metrics generated by the processor running operators under a defined operating scenario. The operating scenario characterizes the conditions that hold when the processor runs operators. For example, the operating scenario can characterize one or more of the following: the load state type of the processor when running operators, and the tensor shape of the input tensor; there are no restrictions on this.

[0067] Performance evaluation metrics are used to characterize the performance of the processor when running operators.

[0068] For example, the performance evaluation metrics obtained in this application may include at least one of the actual computation bandwidth and actual memory access bandwidth when the processor runs operators. Actual computation bandwidth, also known as actual computation rate or actual computation throughput, refers to the number of operations the processor can perform per second. Actual memory access bandwidth refers to the rate at which the processor (or processor core) accesses its memory.

[0069] Performance diagnostic metrics are data indicators that can influence performance evaluation metrics. Performance diagnostic metrics can characterize the performance bottlenecks that exist in the processor's operation.

[0070] For example, performance diagnostic metrics may include those affecting the performance of computing units within the processor, those affecting cache and storage, those related to bandwidth and access, and instruction-level metrics. There are no specific limitations. For ease of understanding, the following table illustrates some of the possible categories and names of performance diagnostic metrics:

[0071]

[0072] The type of computing unit within a processor varies depending on the type and architecture of the processor running the operator in the electronic device. For example, taking a GPU as an example, in one GPU architecture, the computing unit can be a Streaming Multiprocessor (SM), and correspondingly, the computing unit utilization can be the SM utilization.

[0073] The performance diagnostic indicators in the table can affect performance evaluation indicators such as processor computing bandwidth and memory access bandwidth, that is, they can affect the processor's operating performance. Therefore, based on the above performance diagnostic indicators, the performance bottlenecks of the processor when running operators can be reflected.

[0074] Of course, the table above only uses some performance diagnostic indicators of the processor as examples. In actual applications, there may be other possible performance diagnostic indicators.

[0075] Understandably, there are multiple ways to obtain performance diagnostic metrics, without any specific limitations. For example, when the processor starts executing operators, specific metric monitoring tools or programs can be invoked to monitor the processor's operation and obtain its performance diagnostic metrics. For instance, taking a GPU with a certain processor architecture as an example, performance diagnostic metrics can be collected by calling metric collection tools such as Compute Unified Device Architecture (CUDA) Profiling.

[0076] The performance evaluation indicators can be determined from some of the performance diagnostic indicators, or calculated from the processor's computational load, its memory access volume (the amount of data accessed in memory), and the total running time of the operator, etc., without any specific restrictions.
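As an illustration of the second option, the two performance evaluation indicators can be derived by dividing the operation count and the memory access volume by the operator's running time. The Python sketch below is an assumption-laden example: the function names are hypothetical, the 2·M·N·K operation count is the usual estimate for a matrix multiplication, and the byte count assumes each matrix is read or written exactly once, which real kernels may not achieve.

```python
def matmul_workload(m, n, k, bytes_per_element):
    """Estimate operations and bytes moved for C[m,n] = A[m,k] @ B[k,n].

    Assumes one multiply and one add per inner-product term, and that
    A, B and C each cross the memory interface exactly once.
    """
    total_ops = 2 * m * n * k
    total_bytes = (m * k + k * n + m * n) * bytes_per_element
    return total_ops, total_bytes


def evaluation_metrics(total_ops, total_bytes, elapsed_seconds):
    """Return (actual compute bandwidth, actual memory access bandwidth)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return total_ops / elapsed_seconds, total_bytes / elapsed_seconds


if __name__ == "__main__":
    ops, nbytes = matmul_workload(4096, 4096, 4096, bytes_per_element=2)
    compute_bw, memory_bw = evaluation_metrics(ops, nbytes, elapsed_seconds=0.002)
    print(f"{compute_bw:.3e} ops/s, {memory_bw:.3e} bytes/s")
```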

[0077] S102, Based on performance evaluation indicators, determine the operator type of the operator.

[0078] The operator type is either compute-intensive or memory-intensive. Compute-intensive operators require more computing resources from the processor, while memory-intensive operators require more memory access resources from the processor.

[0079] In this application, there are no restrictions on the specific method for determining the operator type by combining performance evaluation indicators.

[0080] To make it easier to understand, let's take one example:

[0081] Performance evaluation metrics can include the processor's actual computation bandwidth and actual memory access bandwidth. Based on this, the ratio of actual computation bandwidth to the processor's rated computation bandwidth yields the processor's computational utilization when running operators; similarly, the ratio of actual memory access bandwidth to the processor's rated memory access bandwidth yields memory access utilization. If computational utilization is higher than memory access utilization, the operator is considered compute-intensive; conversely, if computational utilization is lower than memory access utilization, the operator is considered memory-intensive.
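The utilization comparison in this example can be written as a short Python sketch. The function name `classify_operator` is hypothetical. The example above does not say how to treat equal utilizations; this sketch assumes, arbitrarily, that a tie is classified as memory-intensive.

```python
def classify_operator(actual_compute_bw, actual_memory_bw,
                      rated_compute_bw, rated_memory_bw):
    """Classify an operator by comparing its two utilization ratios."""
    compute_utilization = actual_compute_bw / rated_compute_bw
    memory_utilization = actual_memory_bw / rated_memory_bw
    if compute_utilization > memory_utilization:
        return "compute-intensive"
    # Assumption: ties fall through to memory-intensive.
    return "memory-intensive"


if __name__ == "__main__":
    # 60% compute utilization vs 20% memory utilization.
    print(classify_operator(60.0, 200.0, 100.0, 1000.0))
    # 10% compute utilization vs 70% memory utilization.
    print(classify_operator(10.0, 700.0, 100.0, 1000.0))
```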

[0082] S103, based on the operator type and performance diagnostic indicators, determine the parameter adjustment strategy for the operator.

[0083] The parameter adjustment strategy of the operator is the adjustment strategy for the configuration parameters of the operator. This parameter adjustment strategy may include the parameters that need to be adjusted in the configuration parameters of the operator and the specific adjustment scheme of the parameters (such as the parameter value to be adjusted or the magnitude of the parameter increase).

[0084] As can be understood from the previous introduction, processor performance diagnostic indicators can reflect the performance bottleneck of the processor when running operators, and the operator type of the operator can reflect the degree of demand of the operator for different types of resources. Therefore, combining operator type and performance diagnostic indicators can more reasonably and accurately determine the parameters that need to be adjusted and the parameter adjustment method to improve the performance of the processor running operators, that is, it can more reasonably determine the parameter adjustment strategy.

[0085] S104, adjust the configuration parameters of the operator based on the parameter adjustment strategy.

[0086] The configuration parameters of an operator are used to configure the variable parameters and scheduling strategy when the processor runs the operator. The configuration of an operator can affect the processor's resource usage and the specific execution strategy when running the operator, thereby affecting the processor's resource usage and performance.

[0087] Based on this, the configuration parameters of an operator can include parameters related to limiting the execution form of the processor executing the operator, resource allocation parameters related to resource allocation, memory access strategy parameters related to memory access strategies, and parameters for defining synchronization and concurrency forms, etc., without specific limitations. For example, the configuration parameters of an operator can include, but are not limited to, at least one of the following: computation unit type, computation instruction type, block and thread-related parameters, global memory access mode, shared memory access mode, vectorized memory access data block size, secondary memory access mode, pipeline stage, grid size, and block size.

[0088] The parameter adjustment strategy can adjust at least some of the configuration parameters of the operator so that the adjusted configuration parameters better suit the set operating scenario, which in turn improves the running performance of the processor in that scenario.

[0089] The research behind this application found that, since operators are the basic units for model execution on a processor, the performance of the processor in running operators directly determines the overall inference latency and throughput of the processor running the model. However, there are many possible configuration parameters for operators, and at present the only option is blind testing, that is, running the operator in the processor with one candidate configuration after another in order to find the configuration parameters that give the best performance. This is highly complex and inefficient.

[0090] Further research in this application revealed that the execution efficiency of an operator is related to various factors, including the processor architecture, hardware performance, and the input tensor of the operator. Therefore, the computational characteristics of the same operator will differ depending on the operating scenario, leading to a change in the operator type. Different operator types result in different bottlenecks affecting their efficiency. For example, in a large language model, during the Prefill stage, the operator primarily processes long sequences, making it computationally intensive, and its performance is mainly limited by computational power. However, in the Decode stage, the operator's computational task shifts to processing short sequences, making it memory-intensive, and the bottleneck typically lies in memory bandwidth and cache access. Based on this, performance bottlenecks can be identified by combining the operator type and the processor's performance diagnostic metrics for operator execution, allowing for reasonable adjustment of operator configuration parameters.

[0091] As can be seen from the above, this application can obtain at least one performance evaluation index and performance diagnostic index generated by the processor running operators under a specified operating scenario. Based on the performance evaluation index, the specific operator type (compute-intensive or memory-intensive) under the operating scenario can be determined. Furthermore, since both the operator type and the performance diagnostic index of the processor running the operator can reflect the performance bottleneck of the processor running the operator, the parameter adjustment strategy of the operator can be determined more reasonably and accurately based on the operator type and performance diagnostic index. This allows for more efficient and accurate adjustment of the operator's configuration parameters based on the parameter adjustment strategy, and thus more efficient and accurate determination of the suitable configuration parameters for the operator under the specified operating scenario.
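Steps S101 to S104 can be pictured as one round of a feedback loop that is repeated until a stopping condition holds. The Python sketch below is only an outline under assumptions: the callables `profile`, `classify`, `decide_strategy`, and `is_done` are hypothetical stand-ins for the device-specific parts, and the cap on the number of rounds is an added safeguard not described in the application.

```python
def tune_operator(config, profile, classify, decide_strategy, is_done,
                  max_rounds=10):
    """Repeat S101-S104 until is_done() holds or no change is proposed.

    profile(config)            -> (evaluation, diagnostics)   # S101
    classify(evaluation)       -> operator type               # S102
    decide_strategy(op_type, diagnostics, config)
                               -> dict of parameter updates   # S103
    """
    config = dict(config)
    for _ in range(max_rounds):
        evaluation, diagnostics = profile(config)            # S101
        if is_done(evaluation):
            break
        op_type = classify(evaluation)                       # S102
        strategy = decide_strategy(op_type, diagnostics, config)  # S103
        if not strategy:
            break
        config.update(strategy)                              # S104
    return config


if __name__ == "__main__":
    # Toy stand-ins: utilization jumps once block_size reaches 256.
    def toy_profile(cfg):
        util = 0.9 if cfg["block_size"] >= 256 else 0.3
        return {"util": util}, {}

    result = tune_operator(
        {"block_size": 64},
        profile=toy_profile,
        classify=lambda evaluation: "compute-intensive",
        decide_strategy=lambda t, d, cfg: {"block_size": cfg["block_size"] * 2},
        is_done=lambda evaluation: evaluation["util"] > 0.8,
    )
    print(result)
```

In contrast to blind testing, each round changes only the parameters that the strategy selects from the measured bottleneck.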

[0092] It is understandable that the operation type of an operator determines the specific operation performed by the processor. Therefore, when the operation type of an operator differs, the appropriate computational unit type and other parameters for the processor to use will also differ. Based on this, in order to improve the performance of the processor running operators, the operation type corresponding to the operator can be determined before determining the parameter adjustment strategy.

[0093] The operation type of the operator is used to characterize the type of operation performed by the operator. For example, the operation type of the operator can be matrix multiplication, addition, or other non-operational operation types, without any specific restrictions.

[0094] Accordingly, the parameter adjustment strategy for an operator can be determined based on the operator type, performance diagnostic indicators, and the type of operation of that operator.

[0095] In this application, there are multiple possibilities for the specific implementation of the parameter adjustment strategy, and no specific restrictions are imposed.

[0096] Specifically, because different operator types have different resource requirements, the parameters that need to be adjusted in the operator configuration parameters also differ by operator type. The set of parameters to be adjusted in the determined parameter adjustment strategy therefore differs as well. This is described below with reference to Figure 2, which shows another flowchart of the method for adjusting operator configuration parameters provided in this application. The method in this embodiment may include:

[0097] S201, obtain at least one performance evaluation index and at least one performance diagnostic index of the processor.

[0098] The performance evaluation index and the performance diagnostic index are data indicators generated by the processor running operators under the set operating scenario, and the performance diagnostic index is a data indicator that can affect the performance evaluation index.

[0099] S202, Based on the at least one performance evaluation index, determine the operator type of the operator.

[0100] The operator type is either compute-intensive or memory-intensive.

[0101] S203, when the operator type is compute-intensive, determine the first parameter set and the second parameter set to be adjusted in the operator's configuration parameters.

[0102] The first set of parameters includes at least one parameter related to the computational efficiency of the operator. For example, the type of computing unit used by the processor to run the operator, the type of computing instructions, and the size and layout of sub-tiles at the block level and thread level all affect the computational performance of the operator. Therefore, the first set of parameters may include at least some of the following: computing unit type, computing instruction type, size and layout of sub-tiles at the block level, and size and layout of sub-tiles at the thread level.

[0103] The second parameter set includes at least one parameter that is related to both the memory access efficiency and computational efficiency of the operator. Therefore, the parameters in the second parameter set can be parameters that may affect the runtime performance for operators of any type. For example, the second parameter set includes at least one of the following parameters: pipeline stage, grid size, and block size.

[0104] S204, Based on performance diagnostic indicators, determine the parameter adjustment strategies corresponding to the first parameter set and the second parameter set, and execute step S207.

[0105] For either the first parameter set or the second parameter set, the corresponding parameter adjustment strategy can be obtained by using the performance diagnostic indicators to identify which parameter in the set is currently the performance bottleneck, and then determining the target value for that parameter from its current value. The specific implementation is not restricted.

[0106] For example, for each parameter set, the value that each parameter should switch to can be pre-configured for each combination of current parameter value and performance bottleneck. Alternatively, performance bottleneck analysis programs or parameter tuning models can be used, combined with performance diagnostic indicators and the current parameter values of each parameter in the parameter set, to determine the parameter tuning strategy, without any specific restrictions.
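The pre-configured option can be pictured as a lookup table keyed by parameter, current value, and observed bottleneck. The Python sketch below is purely illustrative: the bottleneck labels, parameter names, and every table entry are invented placeholders, not values taken from the application.

```python
# Hypothetical pre-configured rules:
# (parameter, current value, bottleneck label) -> value to switch to.
PRECONFIGURED_RULES = {
    ("pipeline_stage", 2, "memory_latency_stall"): 3,
    ("pipeline_stage", 3, "memory_latency_stall"): 4,
    ("block_size", 128, "low_occupancy"): 256,
}


def next_value(parameter, current_value, bottleneck):
    """Look up the value to switch to; keep the current value if no rule matches."""
    return PRECONFIGURED_RULES.get((parameter, current_value, bottleneck),
                                   current_value)


def strategy_for_set(parameter_set, config, bottleneck):
    """Build a strategy (dict of changes) for one parameter set."""
    strategy = {}
    for name in parameter_set:
        target = next_value(name, config[name], bottleneck)
        if target != config[name]:
            strategy[name] = target
    return strategy


if __name__ == "__main__":
    config = {"pipeline_stage": 2, "grid_size": 64, "block_size": 128}
    print(strategy_for_set(("pipeline_stage", "grid_size", "block_size"),
                           config, "memory_latency_stall"))
```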

[0107] The parameter adjustment strategies for the first and second parameter sets can include the parameters that need to be adjusted from these two sets, as well as the target values to which the parameters should be adjusted. Other parameters in the operator's configuration parameters that do not belong to the first or second parameter sets do not need to be adjusted.

[0108] Of course, in this embodiment, the parameter adjustment strategy corresponding to the first parameter set and the second parameter set can also be determined based on the performance diagnostic indicators and the operation type of the operator.

[0109] For the case where the operator is computationally intensive, considering that the adjustment of some parameters in the first parameter set may be affected by other parameters, such as the computation instruction type being related to the configured computation unit type, in one possible scenario, the parameter adjustment strategy for each parameter in the first parameter set can be determined sequentially based on performance diagnostic indicators and in the order of their placement. Then, based on the performance diagnostic indicators, the parameter adjustment strategy for the second parameter set is determined. For example, the first parameter set may sequentially include: computation unit type, computation instruction type, size and layout of sub-blocks at the block level, and size and layout of sub-blocks at the thread level.

[0110] The compute unit type parameter determines the type of compute unit used by the processor to execute operators. The compute instruction type is the type of compute instruction used by the processor to execute operators. The available compute unit types and compute instruction types vary across different processor architectures.

[0111] To facilitate understanding of the parameter adjustment strategy that sequentially adjusts parameters such as the computation unit type and computation instruction type in the first parameter set, a GPU with a certain processor architecture is used as an example:

[0112] In this type of GPU architecture, the computing unit types can be divided into two categories: Tensor Cores and CUDA Cores. The computing instruction types can be divided into matrix multiply-accumulate (mma) instructions, warp-level matrix multiply-accumulate (wmma) instructions, warp-group matrix multiply-accumulate (wgmma) instructions, and so on.

[0113] If the current computation unit type in the operator's configuration parameters is CUDA Core, and the Tensor Core utilization in the performance diagnostic metrics is lower than the set threshold, and the operator is an operator that performs matrix multiplication or tensor computation, the computation unit type can be changed from CUDA Core to Tensor Core to accelerate computation.

[0114] For computation instruction types, if Tensor Core utilization is low but warp execution efficiency is high, either mma or wmma instructions can be selected. mma instructions are suitable for medium-sized operators whose matrix computations are primarily intra-warp; wmma instructions are suitable for small-sized operators with a regular warp-level matrix computation structure. If the performance diagnostic metrics show that the current number of active warps is a high percentage of the maximum supported by the computation unit (warp occupancy), Tensor Core utilization is still insufficient, and the GPU supports warp-group collaborative computation, then the computation instruction type can be set to wgmma instructions.
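
The two decisions above can be sketched as follows. This is an illustration under stated assumptions, not the method of this application: the function names, the string labels, the thresholds `util_threshold` and `high_threshold`, and the choice to check the wgmma case before the mma / wmma case are all hypothetical.

```python
def choose_compute_unit(current_unit, tensor_core_util, is_matmul_like,
                        util_threshold=0.3):
    """Switch CUDA Core -> Tensor Core for matrix/tensor operators
    whose Tensor Core utilization is below the (assumed) threshold."""
    if (current_unit == "cuda_core"
            and tensor_core_util < util_threshold
            and is_matmul_like):
        return "tensor_core"
    return current_unit


def choose_instruction(current_instr, tensor_core_util, warp_exec_efficiency,
                       warp_occupancy, supports_warp_group, operator_size,
                       util_threshold=0.3, high_threshold=0.8):
    """Pick mma, wmma or wgmma from the diagnostic metrics."""
    low_tensor_core_util = tensor_core_util < util_threshold
    # Assumed precedence: the warp-group case is checked first.
    if (low_tensor_core_util and warp_occupancy >= high_threshold
            and supports_warp_group):
        return "wgmma"
    if low_tensor_core_util and warp_exec_efficiency >= high_threshold:
        if operator_size == "medium":
            return "mma"
        if operator_size == "small":
            return "wmma"
    # No rule applies: keep the current instruction type.
    return current_instr
```

Because the instruction type depends on the configured compute unit, a caller would apply `choose_compute_unit` first and `choose_instruction` second, matching the sequential order of the first parameter set.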

[0115] Regarding the size and layout of tiles at the block level: If the computation request is much greater than the memory access request (i.e., high computation-to-memory ratio), the tile size at the block level can be increased to increase the amount of computation tasks within each block and improve the utilization of the computation unit; when the computation-to-memory ratio is low or the shared memory usage is high, resulting in a decrease in block concurrency, the parameter adjustment strategy can be to reduce the tile size at the block level to reduce the shared memory requirement of each block, thereby improving concurrency and overall throughput.

[0116] Regarding tile size and layout at the thread level: If high register usage leads to a decrease in SM occupancy (the ratio of the number of active warps to the maximum number of active warps supported by the SM), the size of the tile processed by each thread can be reduced to decrease single-thread register requirements and improve warp parallelism and GPU resource utilization. If warp occupancy is high, the computational load per thread can be increased to improve execution efficiency. In addition, when the computational load within a warp is uneven, thread computation tasks can be rebalanced so that thread load matches the hardware resources, thereby improving overall operator execution performance while maintaining computational density.
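
A minimal sketch of the block-level and thread-level tile rules described above. The thresholds, the halving and doubling steps, the minimum tile sides, and the function names are assumptions made for illustration; the application does not fix any of them.

```python
def adjust_block_tile(tile, compute_to_memory_ratio, shared_mem_fraction,
                      high_ratio=4.0, low_ratio=1.0, shared_mem_limit=0.8,
                      min_side=16):
    """Grow the block-level tile when compute dominates; shrink it when the
    compute-to-memory ratio is low or shared memory usage is high."""
    rows, cols = tile
    if compute_to_memory_ratio < low_ratio or shared_mem_fraction > shared_mem_limit:
        return (max(rows // 2, min_side), max(cols // 2, min_side))
    if compute_to_memory_ratio > high_ratio:
        return (rows * 2, cols * 2)
    return tile


def adjust_thread_tile(tile, register_fraction, sm_occupancy,
                       register_limit=0.8, low_occupancy=0.5,
                       high_occupancy=0.8):
    """Shrink the per-thread tile when register pressure lowers SM occupancy;
    grow it when occupancy is already high."""
    rows, cols = tile
    if register_fraction > register_limit and sm_occupancy < low_occupancy:
        return (max(rows // 2, 1), max(cols // 2, 1))
    if sm_occupancy >= high_occupancy:
        return (rows * 2, cols * 2)
    return tile
```

In this sketch the shrink condition is tested first, so a block whose shared memory usage is already high is never enlarged even if its compute-to-memory ratio is high.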

[0117] The parameter adjustment strategy for determining the pipeline stage, grid size, and block size in the second parameter set is similar.

[0118] For example, regarding pipeline stages: if performance diagnostic metrics indicate that memory access wait times are too long (e.g., wait times exceed the set duration), the parameter adjustment strategy can be to increase the number of pipeline stages to improve the overlap between computation and memory access; if too many pipeline stages cause shared memory usage to exceed the threshold, the number of pipeline stages can be reduced to maintain computational parallelism.

[0119] The grid size parameter adjustment strategy is mainly based on the GPU's SM utilization. If the SM utilization is low (e.g., below a set threshold), the grid size can be increased to provide more thread blocks and improve parallelism; when the input size is small or the number of blocks is too large, leading to increased scheduling overhead, the grid size should be decreased to ensure that the computation task for each block is large enough.

[0120] The block size parameter tuning strategy is primarily related to resource utilization. If performance diagnostics indicate a low number of active warps and insufficient register / shared memory usage, tuning strategies may include increasing the block size to improve thread parallelism. If high register or shared memory usage leads to decreased warp activity, tuning strategies may include decreasing the block size to free up resources.
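
The three rules for the second parameter set can be sketched together as one function. All dictionary keys, threshold names, step sizes, and the precedence between competing conditions are hypothetical. The bounds of 32 and 1024 threads reflect the warp size and the per-block thread limit commonly found on CUDA GPUs and are likewise assumptions here.

```python
def adjust_second_set(params, metrics, limits):
    """Return a copy of params with pipeline stages, grid size and
    block size adjusted according to the diagnostic metrics."""
    new = dict(params)

    # Pipeline stages: shared-memory pressure takes precedence (assumed).
    if metrics["shared_mem_fraction"] > limits["shared_mem_fraction"]:
        new["pipeline_stages"] = max(params["pipeline_stages"] - 1, 1)
    elif metrics["memory_wait_us"] > limits["memory_wait_us"]:
        new["pipeline_stages"] = params["pipeline_stages"] + 1

    # Grid size: shrink on high scheduling overhead, grow on low SM utilization.
    if metrics["scheduling_overhead_high"]:
        new["grid_size"] = max(params["grid_size"] // 2, 1)
    elif metrics["sm_utilization"] < limits["sm_utilization"]:
        new["grid_size"] = params["grid_size"] * 2

    # Block size: only act when few warps are active.
    pressure = max(metrics["register_fraction"], metrics["shared_mem_fraction"])
    if metrics["active_warp_fraction"] < limits["active_warp_fraction"]:
        if pressure > limits["resource_fraction"]:
            new["block_size"] = max(params["block_size"] // 2, 32)
        else:
            new["block_size"] = min(params["block_size"] * 2, 1024)
    return new
```

The function returns a new dictionary and leaves `params` untouched, so the caller can compare the configuration before and after adjustment.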

[0121] Of course, the above is just an example of several possible cases for determining the parameters in the first and second parameter sets under a GPU architecture. In practical applications, there may be other possible parameter adjustment strategies for determining each parameter, and there are no restrictions on this.

[0122] S205, based on the operator type being memory-intensive, determine the second parameter set and the third parameter set to be adjusted in the operator's configuration parameters.

[0123] The second parameter set includes at least one parameter that is related to both the memory access efficiency and computational efficiency of the operator, as described above.

[0124] The third parameter set includes at least one parameter related to the memory access efficiency of the operator. For example, the third parameter set may include at least some of the following: global memory access mode, shared memory access mode, vectorized memory access data block size, and secondary memory access mode. Adjusting these parameters changes the memory access mode and the size of the accessed data block when the processor runs the operator, thereby affecting memory access efficiency.

[0125] S206, based on performance diagnostic indicators, determine the parameter adjustment strategies corresponding to the second and third parameter sets.

[0126] The implementation principle of determining the parameter adjustment strategy corresponding to the second parameter set and the third parameter set is similar to that of the previous step S204. For details, please refer to the relevant description above, which will not be repeated here.

[0127] It is understood that the parameter adjustment strategy determined in step S206 may include the parameters that need to be adjusted in the second parameter set and the third parameter set, as well as the target values to which those parameters need to be adjusted.

[0128] To facilitate understanding of the implementation principle of the parameter adjustment strategy corresponding to the third parameter set, we will continue to use the GPU architecture mentioned earlier as an example to illustrate the possible scenarios for parameter adjustment strategies corresponding to global memory access mode, shared memory access mode, vectorized memory access data block size, and secondary memory access method in the third parameter set:

[0129] For global memory access patterns, specific adjustment strategies can be determined based on performance diagnostic metrics such as global memory access requests, memory coalescing rate, and memory bandwidth utilization. For example, if the performance diagnostic metrics indicate low memory coalescing efficiency or frequent global memory transactions, the parameter adjustment strategy could be to adjust the thread-data mapping so that consecutive threads access consecutive memory addresses. This changes the global memory access pattern and achieves coalesced memory access, thereby improving global memory bandwidth utilization and reducing memory access latency.
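
The effect of the thread-data mapping can be illustrated with a small host-side model that counts how many aligned memory segments one warp touches. The 128-byte segment size, the 32-thread warp, and both function names are assumptions of this sketch, not values taken from this application.

```python
def warp_addresses(stride_elems, elem_bytes=4, warp_size=32, base=0):
    """Byte address read by each thread when thread t reads
    element t * stride_elems of an array starting at base."""
    return [base + t * stride_elems * elem_bytes for t in range(warp_size)]


def count_transactions(addresses, segment_bytes=128):
    """Number of distinct aligned segments touched by the warp,
    used here as a simple proxy for global memory transactions."""
    return len({addr // segment_bytes for addr in addresses})
```

With a stride of one element, consecutive threads read consecutive addresses and the whole warp falls into a single segment; with a stride of 32 elements, every thread lands in a different segment, which is the uncoalesced pattern the strategy aims to remove.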

[0130] For shared memory access patterns, optimization strategies can be determined based on shared memory access conflict indicators. For example, if relevant performance diagnostic indicators show a high number of bank conflicts, parameter adjustment strategies can include using swizzle or padding techniques to remap or align shared memory addresses, allowing different threads to access different banks, eliminating access conflicts, and improving shared memory access efficiency and thread parallel execution performance. Swizzle and padding are two core techniques for eliminating shared memory bank conflicts. Their purpose is to rearrange the way data is stored or accessed, so that threads within the same warp can be evenly distributed across different banks when accessing shared memory, thereby avoiding access serialization and improving bandwidth utilization.
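
The padding technique can be illustrated with a simple model of bank assignment. The sketch assumes 32 banks, one 4-byte word per bank slot, and a warp in which thread t reads the first word of row t of a two-dimensional shared array; these values and the function name are illustrative assumptions.

```python
def max_bank_conflict(row_stride_words, num_banks=32, warp_size=32):
    """Largest number of threads that hit the same bank when
    thread t reads word index t * row_stride_words."""
    hits = [0] * num_banks
    for t in range(warp_size):
        word_index = t * row_stride_words
        hits[word_index % num_banks] += 1
    return max(hits)
```

For a 32-word row, `max_bank_conflict(32)` shows all threads mapping to one bank, whereas padding each row by one word, `max_bank_conflict(33)`, spreads the threads across all banks.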

[0131] For secondary memory access methods, performance diagnostic metrics such as the number of global memory requests can be used to determine parameter tuning strategies. If the number of global memory requests is large, parameter tuning strategies may include increasing the data block width for each memory access to reduce the number of memory access instructions, thereby reducing global memory access overhead.
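
The relationship between data block width and instruction count can be written down directly. This is only an arithmetic illustration; the function name and the unit (elements per access) are assumptions.

```python
def memory_instruction_count(num_elements, elements_per_access):
    """Number of memory access instructions needed to cover num_elements
    when each instruction moves elements_per_access elements."""
    # Ceiling division without floating point.
    return -(-num_elements // elements_per_access)
```

Widening each access from one element to four reduces the count for 1024 elements from 1024 to 256.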

[0132] S207, Based on the determined parameter adjustment strategy, adjust the configuration parameters of the operator.

[0133] For example, the parameter adjustment strategy is the parameter adjustment strategy corresponding to the first parameter set and the second parameter set. Based on the parameter adjustment strategy, the parameters in the operator's configuration parameters that belong to the first parameter set and the second parameter set are adjusted to obtain the adjusted configuration parameters corresponding to the operator.

[0134] Correspondingly, if the parameter adjustment strategy is the one corresponding to the second and third parameter sets, it is only necessary to adjust, based on that strategy, the parameters in the operator's configuration parameters that belong to the second and third parameter sets, so as to obtain the adjusted configuration parameters.
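
Step S207 can be sketched as applying a strategy only to the parameters that belong to the sets selected for the operator type. The parameter names inside each set, the type labels, and the function names below are hypothetical placeholders.

```python
# Hypothetical parameter names for the three parameter sets.
FIRST_SET = {"compute_unit", "compute_instruction", "block_tile", "thread_tile"}
SECOND_SET = {"pipeline_stages", "grid_size", "block_size"}
THIRD_SET = {"global_access_mode", "shared_access_mode", "vector_width"}


def adjustable_parameters(operator_type):
    """Parameter sets that may be adjusted for the given operator type."""
    if operator_type == "compute_intensive":
        return FIRST_SET | SECOND_SET
    if operator_type == "memory_intensive":
        return SECOND_SET | THIRD_SET
    raise ValueError(f"unknown operator type: {operator_type}")


def apply_strategy(config, strategy, operator_type):
    """Return a new configuration in which only parameters from the
    allowed sets are moved to their target values."""
    allowed = adjustable_parameters(operator_type)
    updated = dict(config)
    for name, target in strategy.items():
        if name in allowed:
            updated[name] = target
    return updated
```

Parameters outside the selected sets are left at their current values even if the strategy mentions them.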

[0135] In this embodiment, considering that the parameters that can affect the processor's performance when running the operator will also be different when the operator type is different, the set of parameters that need to be adjusted in the operator's configuration parameters will be determined first based on the operator type, and then the parameter adjustment strategy for the corresponding parameter set will be determined in combination with the performance diagnostic indicators. In this way, the parameter adjustment strategy corresponding to the parameters that need to be adjusted can be determined more specifically in combination with the operator type, so as to achieve more reasonable and accurate parameter adjustment.

[0136] It is understandable that the execution efficiency of an operator is related not only to the processor's hardware characteristics but also to the shape of the operator's input tensor. When the shape of the input tensor changes, the operator type may also change, and the suitable configuration parameters for the operator will also change. Therefore, in any of the above embodiments of this application, the defined operating scenario is at least used to characterize the shape of the operator's input tensor. For example, given the defined shape of the operator's input tensor, at least one performance evaluation index and at least one performance diagnostic index generated by the processor running the operator are obtained.

[0137] The input tensor of the operator is the array that needs to be input into the operator. For example, the input tensor may include the data to be processed (such as a sequence) and the weight matrix.

[0138] The tensor shape of an input tensor describes the dimensional information of the input tensor (i.e., the array). The tensor shape describes the size of the input tensor in each dimension, and determines the structure and storage method of the input tensor.

[0139] Based on this, this application can determine the parameter adjustment strategy of the operator based on the operator type, performance diagnostic indicators, and tensor shape.

[0140] For example, when determining the parameter adjustment strategy for the size of the vectorized memory access data block based on the operator type, the size is increased or decreased according to the performance diagnostic indicators, and it is also determined in conjunction with the tensor shape of the input tensor. The size of the vectorized memory access data block is thereby aligned with the tensor shape of the operator's input tensor, which ensures that each memory access covers a complete and continuous data sub-block and avoids memory overflow.
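
One way to align the vectorized access size with the tensor shape is to pick the widest candidate that evenly divides the innermost dimension. The sketch assumes a row-major layout in which the last dimension is contiguous; the candidate widths and the function name are hypothetical.

```python
def choose_vector_width(tensor_shape, candidates=(8, 4, 2, 1)):
    """Largest candidate width (in elements) that evenly divides the
    innermost dimension, so no access crosses the end of a row."""
    innermost = tensor_shape[-1]
    for width in sorted(candidates, reverse=True):
        if innermost % width == 0:
            return width
    # Fall back to scalar accesses if no candidate divides the dimension.
    return 1
```

A width chosen this way can then be capped by whatever the performance diagnostic indicators suggest.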

[0141] Of course, when determining the parameter adjustment strategy for different configuration parameters of the operator, the specific implementation of the parameter adjustment strategy will also be different when the configuration parameters to be adjusted are different, taking into account the operator type, performance diagnostic indicators and tensor shape. This is just an example of one case, and no restrictions are placed on the specific implementation.

[0142] Furthermore, in order to enable users to know the appropriate configuration parameters of the operator under the tensor shape and to enable the processor to reasonably configure the configuration parameters of the operator when running the operator in the corresponding operating scenario, this application can also output the adjusted configuration parameters as the target configuration parameters of the operator under the tensor shape.

[0143] The target configuration parameter can be output to a display unit or stored in a designated storage area (such as a parameter configuration library). For example, the adjusted configuration parameter can be stored as the target configuration parameter corresponding to the operator and the tensor shape. Alternatively, the adjusted configuration parameter can be stored as the target configuration parameter corresponding to the operator and the tensor shape while simultaneously outputting the target configuration parameter corresponding to the operator and the tensor shape to the display interface.

[0144] In another possible implementation, since the performance of the processor running the same operator varies with the processor's load state type, the operating scenario in which the processor runs the operator is also used in this application to characterize the load state type of the processor when running the operator. This is described below with reference to Figure 3. Figure 3 shows another flowchart of the method for adjusting operator configuration parameters provided in this application. The method in this embodiment may include:

[0145] S301, obtain at least one performance evaluation index and at least one performance diagnostic index of the processor.

[0146] The performance evaluation index and the performance diagnostic index are data indicators generated by the processor running the operator under the set operating scenario.

[0147] This runtime scenario is used to characterize the tensor shape of the operator's input tensor and the type of load state the processor is in.

[0148] The processor's load state type is used to characterize the processor's resource requirement type.

[0149] For example, processor load states can be categorized into compute-intensive load states and memory-intensive load states. A compute-intensive load state indicates that the processor's current demand for computing resources is higher than its demand for memory access resources. A memory-intensive load state indicates that the processor's current demand for memory access resources is higher than its demand for computing resources. For instance, when relatively many programs or tasks that require computation are running on the processor, the processor is in a compute-intensive load state; when relatively many programs or tasks that require memory access are running on the processor, the processor is in a memory-intensive load state.

[0150] S302, Based on this performance evaluation index, determine the operator type of the operator.

[0151] The operator type is either computationally intensive or memory-intensive.

[0152] S303, based on the operator type, the performance diagnostic indicators, and the tensor shape, determine the parameter adjustment strategy for the operator.

[0153] For example, based on performance diagnostic indicators, the parameters that affect the operator's performance and the parameter adjustment strategies under this operator type and tensor shape can be determined. For details, please refer to the relevant introduction above, which will not be repeated here.

[0154] Furthermore, the parameter adjustment strategy for operators can be determined by combining the operator type, performance diagnostic indicators, tensor shape, and operator operation type.

[0155] S304, Adjust the configuration parameters of the operator based on the parameter adjustment strategy.

[0156] S305, output the adjusted configuration parameters as the target configuration parameters suitable for the operator under this tensor shape and this load state type.

[0157] For example, the adjusted configuration parameters can be output to the display interface as the target configuration parameters suitable for the operator under the tensor shape and the load state type, and / or stored in the parameter configuration library.

[0158] Based on this, if the processor runs the operator again, it can query suitable target configuration parameters based on the load state type when the processor runs the operator and the tensor shape of the operator, and run the operator based on the target configuration parameters to ensure that the operator has high running performance.

[0159] Furthermore, when determining the target configuration parameters, this application considers not only the tensor shape corresponding to the operator but also the load state type of the processor running the operator. After the tensor shape and the processor's load state type are stored in association with the target configuration parameters, suitable target configuration parameters can be matched in the subsequent model inference stage by combining the tensor shape of the input tensor of the operator run by the processor with the current load state type of the processor. The matched target configuration parameters are therefore better suited to the processor's current operating scenario, which improves the performance of the processor running the operator.

[0160] In one possible scenario, to determine the suitable target configuration parameters for the operator under different operating scenarios, the operating scenarios corresponding to the processor and the operator can be configured separately. This allows for testing using the scheme described in this application to obtain the suitable target configuration parameters for each operator under different operating scenarios. For example, with the processor in an idle state, the scheme described in this application can be used to test the suitable target configuration parameters for the operator under different operating scenarios. In this case, at least one of the tensor shape and the processor load state type must be different for each operating scenario.

[0161] A specific implementation of this case is described below with reference to Figure 4. Figure 4 shows another flowchart of the method for adjusting operator configuration parameters provided in this application. The method in this embodiment may include:

[0162] S401, based on the operator to be configured and the set operating scenario, use the processor to run the operator.

[0163] In this embodiment, the running scenario is used to characterize the tensor shape of the operator's input tensor and the type of load state the processor is in.

[0164] The processor's load state type characterizes its resource requirements. During the testing phase, programs running on the processor can be configured based on the load state type of a defined runtime scenario to place the processor in that load state. For example, if the runtime scenario characterizes the processor as having a compute-intensive load, then at least one compute-intensive task or program (requiring relatively more compute resources) can be run on the processor to place it in a compute-intensive load state. If the runtime scenario characterizes the processor as having a memory-intensive load, then at least one memory-intensive task or program (requiring relatively more memory resources) can be run on the processor to place it in a memory-intensive load state.

[0165] In addition, in order to ensure that the processor runs the operator in accordance with the tensor shape of the input tensor in the set scenario, based on the tensor shape of the input tensor characterized by the set running scenario, this application can obtain a test input tensor with the tensor shape. Based on this test input tensor, the processor runs the operator, thus realizing the operation of the operator by the processor in the set running scenario.

[0166] Specifically, since the processor needs to run the operator based on the operator's configuration parameters, the initial configuration parameters of the operator can also be determined in step S401. Accordingly, the operator is run by the processor based on the initial configuration parameters. On this basis, the operator's configuration parameters can be adjusted subsequently based on the initial configuration parameters.

[0167] S402, obtain at least one performance evaluation metric and at least one performance diagnostic metric generated by the processor running the operator.

[0168] S403, Based on this performance evaluation index, determine the operator type of this operator.

[0169] The operator type is either computationally intensive or memory-intensive.

[0170] S404, based on the operator type being computationally intensive, determine the first set of parameters and the second set of parameters to be adjusted in the configuration parameters of the operator.

[0171] The first parameter set includes at least one parameter related to the computational efficiency of the operator; the second parameter set includes at least one parameter related to both the memory access efficiency and computational efficiency of the operator. For details regarding the first and second parameter sets, please refer to the relevant descriptions in the preceding embodiments, which will not be repeated here.

[0172] S405, based on at least one performance diagnostic metric, the shape of the tensor, and the operation type of the operator, determine the parameter adjustment strategy corresponding to the first parameter set and the second parameter set, and execute step S408.

[0173] S406, based on the operator type being memory-intensive, determine the second parameter set and the third parameter set to be adjusted in the operator's configuration parameters.

[0174] S407, based on at least one performance diagnostic metric, the shape of the tensor, and the operation type of the operator, determine the parameter adjustment strategies corresponding to the second parameter set and the third parameter set.

[0175] The third parameter set consists of at least one parameter related to the memory access efficiency of the operator, as detailed in the previous introduction.

[0176] S408, Based on a determined parameter adjustment strategy, adjust the configuration parameters of the operator.

[0177] S409, output the adjusted configuration parameters as the target configuration parameters suitable for the operator under the tensor shape and the load state type.

[0178] For example, the name of the operator, the tensor shape, and the load state type can be associated with the target configuration parameters and stored in the parameter configuration library. This way, when the operator is run again in the processor, after determining the tensor shape of the operator and the load state type of the processor, the appropriate target configuration parameters can be directly queried from the parameter configuration library, so as to reasonably configure the configuration parameters of the operator and improve the running performance of the processor when running the operator.
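
A minimal sketch of such a parameter configuration library, assuming an in-memory dictionary keyed by operator name, tensor shape, and load state type. The class name, method names, and key layout are hypothetical; a real library could equally be a file or a database.

```python
class ParameterConfigLibrary:
    """Stores target configuration parameters per
    (operator name, tensor shape, load state type)."""

    def __init__(self):
        self._entries = {}

    def store(self, operator_name, tensor_shape, load_state, target_config):
        # Tuples are hashable, so the shape is normalized to a tuple.
        key = (operator_name, tuple(tensor_shape), load_state)
        self._entries[key] = dict(target_config)

    def query(self, operator_name, tensor_shape, load_state):
        """Return the stored target configuration, or None if absent."""
        key = (operator_name, tuple(tensor_shape), load_state)
        return self._entries.get(key)
```

Storing a copy of the configuration keeps later changes to the caller's dictionary from silently altering the library entry.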

[0179] It is understandable that, for an operator, by running the operator in different operating scenarios using the scheme of this application, the suitable target configuration parameters of the operator in different operating scenarios can be determined, and the suitable target configuration parameters of the operator under different tensor shapes and processor load state types can be obtained.

[0180] In another possible scenario, during the processor's model execution (i.e., the model inference phase), if the processor reaches a certain operator and cannot find suitable target configuration parameters for that operator, then the processor's current load state type and the tensor shape of the operator's input tensor are determined as an operating scenario, and the scheme of this application is used to determine the suitable target configuration parameters for that operator in that operating scenario. This situation is described below with reference to Figure 5.

[0181] Figure 5 shows another flowchart of the method for adjusting operator configuration parameters provided in this application. The method in this embodiment may include:

[0182] S501, determine the operator to be run in the model, the input tensor corresponding to the operator, and the tensor shape of the input tensor.

[0183] The input tensor corresponding to the operator is the input tensor that needs to be processed based on the operator. As mentioned above, the input tensor may include input data and weight data related to the operator in the model.

[0184] S502, determine the current load state type of the processor.

[0185] The processor's load state type is either compute-intensive or memory-intensive.

[0186] During the model inference phase, the current computational utilization and memory utilization of the processor can be determined. If the computational utilization is higher than the memory utilization, it indicates that the task running on the processor has a higher demand for computing resources, and the processor is in a computationally intensive load state. Conversely, if the memory utilization is higher than the computational utilization, the processor is in a memory-intensive load state. The specific methods for determining computational utilization and memory utilization can be found in the previous introduction, and will not be repeated here.
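
The comparison described above can be sketched as a single function. The function name and labels are hypothetical, and the handling of equal utilizations (treated here as memory-intensive) is an assumption, since the text does not address that case.

```python
def classify_load_state(compute_utilization, memory_utilization):
    """Classify the processor's current load state from two utilizations
    expressed on the same scale (for example, fractions in [0, 1])."""
    if compute_utilization > memory_utilization:
        return "compute_intensive"
    # Assumption: ties are treated as memory-intensive.
    return "memory_intensive"
```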

[0187] S503, if there are target configuration parameters corresponding to the operator, the tensor shape and the load state type, the operator is run using the processor based on the target configuration parameters and the input tensor.

[0188] For example, this application can pre-test and store target configuration parameters corresponding to different operators under different tensor shapes and load state types. Based on this, it can query whether there are target configuration parameters corresponding to the operator (such as the name of the operator), the tensor shape of the input tensor corresponding to the operator, and the current load state type of the processor.

[0189] In this embodiment, the target configuration parameters can be matched by combining the operator, the tensor shape corresponding to the operator, and the current load state type of the processor. If the target configuration parameters are matched, it means that configuring the operator with the target configuration parameters under the input tensor and the current load state type of the processor can enable the processor to execute the operator with high efficiency. Therefore, this application can directly use the processor to run the operator to process the input tensor based on the target configuration parameters, thereby improving the processing efficiency of the operator.

[0190] It is understandable that, because the tensor shapes of operators' input tensors vary widely across operating scenarios, target configuration parameters that exactly match the tensor shape of the operator's current input tensor may not exist. When the processor's load state type is the same and the tensor shapes of the input tensors are similar, the suitable target configuration parameters of the operator are also likely to be similar. Therefore, to keep the operator running efficiently in such cases, determining the target configuration parameters corresponding to the operator, the tensor shape, and the load state type in this application may include at least one of the following:

[0191] If there are target configuration parameters in the parameter configuration library that match the operator, the tensor shape, and the load state type, then the matched target configuration parameters are determined as the target configuration parameters of the operator.

[0192] If there is no target configuration parameter in the parameter configuration library that matches the operator, the tensor shape and the load state type, determine the candidate configuration parameter whose tensor shape and the tensor shape of the operator have a similarity exceeding a set threshold from the candidate configuration parameter set corresponding to the operator and the load state type in the parameter configuration library, and determine the candidate configuration parameter as the target configuration parameter of the operator.

[0193] If there are no candidate configuration parameters in the candidate configuration parameter set whose tensor shape has a similarity to the tensor shape of the operator that exceeds a set threshold, then from the candidate configuration parameter set, a configuration parameter whose at least one dimension feature of the tensor shape matches the corresponding dimension feature in the tensor shape of the operator is determined, and that configuration parameter is determined as the target configuration parameter of the operator.

[0194] Of course, if the target configuration parameters of the operator cannot be determined through the above methods, then it is determined that there are no target configuration parameters corresponding to the operator, the tensor shape corresponding to the operator, and the current load state type of the processor.
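The lookup order in [0191] to [0194] can be sketched as follows. This is a minimal illustration only: the similarity measure (mean per-dimension min/max ratio), the default threshold, the dictionary layout of the library, and the names `shape_similarity` and `find_target_config` are assumptions that do not appear in this application, and picking the most similar candidate above the threshold is likewise one possible choice.

```python
from typing import Dict, Optional, Tuple

Shape = Tuple[int, ...]
Key = Tuple[str, Shape, str]  # (operator name, tensor shape, load state type)


def shape_similarity(a: Shape, b: Shape) -> float:
    """Hypothetical similarity in [0, 1]; assumes positive dimension sizes."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(min(x, y) / max(x, y) for x, y in zip(a, b)) / len(a)


def find_target_config(library: Dict[Key, dict], op: str, shape: Shape,
                       load_state: str, threshold: float = 0.9) -> Optional[dict]:
    # [0191] Exact match on operator, tensor shape, and load state type.
    exact = library.get((op, shape, load_state))
    if exact is not None:
        return exact

    # Candidate set: same operator and load state type, any tensor shape.
    candidates = [(s, cfg) for (o, s, l), cfg in library.items()
                  if o == op and l == load_state]

    # [0192] Candidate whose shape similarity exceeds the threshold
    # (here: the most similar one, which is an assumption).
    above = [(shape_similarity(s, shape), cfg) for s, cfg in candidates]
    above = [(score, cfg) for score, cfg in above if score > threshold]
    if above:
        return max(above, key=lambda item: item[0])[1]

    # [0193] Candidate sharing at least one dimension with the operator's shape.
    for s, cfg in candidates:
        if len(s) == len(shape) and any(x == y for x, y in zip(s, shape)):
            return cfg

    # [0194] No target configuration parameters exist for this scenario.
    return None
```

A `None` result corresponds to the case handled by step S504, where initial configuration parameters are determined instead.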

[0195] S504, if there are no target configuration parameters corresponding to the operator, the tensor shape, and the load state type, determine the initial configuration parameters corresponding to the operator.

[0196] The initial configuration parameter can be a configuration parameter randomly set for the operator.

[0197] S505, based on the input tensor and the initial configuration parameters, run the operator in the processor.

[0198] It is understandable that, since the tensor shape of the input tensor and the current load state of the processor are both fixed, the processor running the operator on the input tensor is in fact in the inference phase. The operator is run with the tensor shape of the input tensor actually obtained by the processor and the current load state of the processor serving as the running scenario.

[0199] S506, obtain at least one performance evaluation metric and at least one performance diagnostic metric generated by the processor running the operator.

[0200] S507, based on the performance evaluation index, determine the operator type of the operator.

[0201] The operator type is either compute-intensive or memory-intensive.

[0202] S508, based on the operator type and the performance diagnostic index, determine the parameter adjustment strategy for the operator.

[0203] S509, adjust the configuration parameters of the operator based on the parameter adjustment strategy.

[0204] For example, the configuration parameters of the operator can be determined based on the parameter adjustment strategy.

[0205] For example, based on the initial configuration parameters, the configuration parameters of the operator can be adjusted. In this case, the initial configuration parameters can be adjusted at least once based on the parameter adjustment strategy. Each time the configuration parameters of the operator are adjusted, the operator can be run in the processor based on the adjusted configuration parameters and the input tensor, and steps S506 to S509 can be executed until the set number of iterations is reached or the performance of the operator is determined to meet the requirements.

[0206] The above steps S506 to S507 can be found in the descriptions of other embodiments, and will not be repeated here.

[0207] It is understandable that in this embodiment, if the target configuration parameters corresponding to the operator, the tensor shape of the operator's input tensor, and the current load state type of the processor cannot be found during the processor's execution of model inference, then the tensor shape of the input tensor and the load state type of the processor will be taken as a running scenario. The configuration parameters of the operator will be adjusted by combining the performance evaluation indicators and performance diagnostic indicators of the processor running the operator in this running scenario, so as to obtain the suitable configuration parameters of the operator in this running scenario. This can provide suitable configuration parameters for running the operator in the same running scenario in the future.

[0208] It is understandable that after step S509, this embodiment can also output the adjusted configuration parameters as target configuration parameters suitable for the operator under the tensor shape and the current load state type. For example, the adjusted configuration parameters can be stored in the parameter configuration library as the target configuration parameters corresponding to the operator, the tensor shape, and the load state type to update the parameter configuration library.
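The "tune on a miss, then store" behavior of steps S504 to S509 and [0208] can be sketched as below. The function name `get_or_tune`, the dictionary used as the parameter configuration library, and the `tune` callback (standing in for steps S505 to S509) are hypothetical; for brevity the sketch only checks for an exact match.

```python
from typing import Callable, Dict, Tuple

Key = Tuple[str, Tuple[int, ...], str]  # (operator, tensor shape, load state type)


def get_or_tune(library: Dict[Key, dict], key: Key, initial_config: dict,
                tune: Callable[[dict], dict]) -> dict:
    """Return stored target parameters, tuning and storing them on a miss."""
    if key in library:
        # Target configuration parameters already exist for this scenario.
        return library[key]
    # S504: start from the initial configuration parameters.
    # S505-S509: run, measure, and adjust (delegated to the caller's callback).
    tuned = tune(dict(initial_config))
    # [0208]: store the adjusted parameters to update the library.
    library[key] = tuned
    return tuned
```

With this shape, a later run of the same operator in the same running scenario returns the stored parameters without tuning again.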

[0209] To facilitate understanding of the benefits of this application, the following explanation refers to Figure 6, which shows a block diagram of one implementation principle of the scheme in this application.

[0210] As can be seen from Figure 6, under the set tensor shape of the input tensor and the set processor load state type, the performance index acquisition module can obtain at least one performance diagnostic index generated by the processor running the operator and determine at least one performance evaluation index.

[0211] The operator tuning and parameter configuration module can determine the operator type using at least one performance evaluation metric. Based on the operator type and at least one performance diagnostic metric, it determines the operator's parameter adjustment strategy, adjusts the operator's configuration parameters according to the parameter adjustment strategy, and obtains the target configuration parameters. Furthermore, the determined target configuration parameters can be associated with the operator's identifier (such as operator name), tensor shape, and load state type and stored in the parameter configuration library.

[0212] When running operators in the model, the monitoring and parameter tuning module can query the parameter configuration library for target configuration parameters that match the current running scenario, based on the operator identifier of the operator being run by the processor, the tensor shape of the corresponding input tensor, and the current load state type of the processor. If no target configuration parameters exactly matching the current running scenario are found, the module can also query the candidate configuration parameter set corresponding to the operator identifier and the load state type for target configuration parameters whose tensor shape is similar to that of the operator, so that the operator is still configured appropriately. If target configuration parameters are found, the processor can run the operator based on them to ensure the operator's execution performance.

[0213] If no target configuration parameters matching or similar to the running scenario are found in the parameter configuration library, the processor can also run operators in the running scenario and determine the parameter adjustment strategy of the operators based on the performance evaluation indicators and performance diagnostic indicators generated by the running operators. The processor can then adjust the configuration parameters of the operators based on the parameter adjustment strategy to obtain suitable target configuration parameters for the running scenario and store them in the parameter configuration library. This will enable dynamic updates to the parameter configuration library and continuous improvement of the parameter configuration library.

[0214] It is understood that, in any of the above embodiments of this application, the purpose of adjusting the operator's configuration parameters is to ensure that the processor's performance in running the operator meets the target requirements, such as optimal performance. Therefore, if the performance evaluation metrics determine that the operator's performance already meets the target requirements, there is no need to adjust the operator's configuration parameters.

[0215] Therefore, in any of the above embodiments of this application, the at least one performance evaluation metric includes: the actual computing bandwidth and the actual memory access bandwidth of the processor. Accordingly, if it is determined based on the actual computing bandwidth and the actual memory access bandwidth that the target condition has not been met, then it is necessary to determine the parameter adjustment strategy of the operator based on the operator type and the performance diagnostic metric, so as to adjust the configuration parameters of the operator based on the parameter adjustment strategy.

[0216] The target condition for the actual computing bandwidth and the actual memory access bandwidth may include at least one of the following:

[0217] The actual computing bandwidth and the actual memory access bandwidth converge.

[0218] The ratio of the actual computing bandwidth to the processor's rated computing bandwidth is greater than a first threshold.

[0219] The ratio of the actual memory access bandwidth to the processor's rated memory access bandwidth is greater than the second threshold.

[0220] Here, convergence of the actual computing bandwidth and the actual memory access bandwidth means that, as the operator's configuration parameters are iteratively adjusted, the currently determined actual computing bandwidth and actual memory access bandwidth no longer change, or change by less than a set threshold, relative to the most recently determined values.
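The target condition of [0216] to [0220] can be sketched as a single check. The sketch assumes that meeting any one of the three conditions is sufficient, uses a relative tolerance for convergence, and picks illustrative default thresholds; the `Bandwidth` type, `target_met`, and all default values are hypothetical and not specified in this application.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Bandwidth:
    compute: float  # computing bandwidth, e.g. operations per second
    memory: float   # memory access bandwidth, e.g. bytes per second


def target_met(current: Bandwidth, previous: Optional[Bandwidth], rated: Bandwidth,
               first_threshold: float = 0.8, second_threshold: float = 0.8,
               convergence_tol: float = 0.01) -> bool:
    # [0217]/[0220]: both bandwidths changed by less than a set threshold
    # relative to the most recent measurement (relative tolerance assumed).
    converged = (
        previous is not None
        and abs(current.compute - previous.compute) <= convergence_tol * previous.compute
        and abs(current.memory - previous.memory) <= convergence_tol * previous.memory
    )
    # [0218]: actual / rated computing bandwidth exceeds the first threshold.
    compute_ok = current.compute / rated.compute > first_threshold
    # [0219]: actual / rated memory access bandwidth exceeds the second threshold.
    memory_ok = current.memory / rated.memory > second_threshold
    return converged or compute_ok or memory_ok
```

An implementation that requires only a subset of these conditions would drop the corresponding terms from the final expression.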

[0221] To facilitate understanding, the following describes one possible implementation, taking as an example the determination of suitable configuration parameters for an operator through testing in a specific operating scenario. Figure 7 shows another implementation flowchart of the method for adjusting operator configuration parameters provided in this application. The method in this embodiment may include:

[0222] S701, obtain the operator to be configured, the set initial configuration parameters, the input tensor corresponding to the operator, and the tensor shape of the input tensor.

[0223] S702, with the processor in a set load state type, use the initial configuration parameters as the configuration parameters of the operator, and run the operator using the processor based on the input tensor.

[0224] The set load state type can be either a compute-intensive load state or a memory-intensive load state. In this embodiment, the processor can be placed in the set load state type by controlling the programs or tasks running on the processor, as described above; this is not repeated here.

[0225] In this embodiment, the running scenario is the processor being in the set load state type, and the tensor shape is the tensor shape of the obtained input tensor.

[0226] S703, obtain at least one performance evaluation index and at least one performance diagnostic index generated by the processor running the operator.

[0227] Here, the at least one performance evaluation metric includes the processor's actual computing bandwidth and actual memory access bandwidth.

[0228] S704. Determine whether the target condition has been met based on the actual computing bandwidth and actual memory access bandwidth. If not, proceed to step S705; if yes, proceed to step S709.

[0229] The target condition includes at least one of the following:

[0230] The actual computing bandwidth and the actual memory access bandwidth converge.

[0231] The ratio of the actual computing bandwidth to the processor's rated computing bandwidth is greater than a first threshold.

[0232] The ratio of the actual memory access bandwidth to the processor's rated memory access bandwidth is greater than the second threshold.

[0233] Of course, each time the operator's configuration parameters are adjusted, the iteration count can be incremented by one, with the iteration count starting at 0. Furthermore, the target condition can also be that the iteration count reaches a set number.

[0234] S705, determine the operator type of the operator based on the processor's actual computing bandwidth and actual memory access bandwidth.

[0235] The operator type is either compute-intensive or memory-intensive.
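This passage does not state how the two bandwidths are turned into an operator type; the details are left to other embodiments. Purely as an assumed illustration, a roofline-style heuristic compares the achieved ratio of computing bandwidth to memory access bandwidth against the ratio of the processor's rated values. The function name and the use of rated bandwidths here are assumptions, not the method of this application.

```python
def classify_operator(actual_compute: float, actual_memory: float,
                      rated_compute: float, rated_memory: float) -> str:
    """Roofline-style heuristic (an assumption, not taken from the source)."""
    # Achieved operations per byte moved for this operator.
    arithmetic_intensity = actual_compute / actual_memory
    # Ratio at which the processor shifts from memory-bound to compute-bound.
    ridge_point = rated_compute / rated_memory
    if arithmetic_intensity >= ridge_point:
        return "compute-intensive"
    return "memory-intensive"
```

Under this heuristic an operator is labeled compute-intensive exactly when its computing bandwidth utilization is at least its memory access bandwidth utilization.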

[0236] S706, based on the operator type, the operation type of the operator, the tensor shape of the input tensor, and at least one performance diagnostic index, determine the parameter adjustment strategy of the operator.

[0237] For example, based on the operator type being compute-intensive, a first parameter set and a second parameter set to be adjusted in the operator's configuration parameters are determined. Based on the operator's operation type, the tensor shape, and at least one performance diagnostic metric, the parameter adjustment strategies corresponding to the first parameter set and the second parameter set are determined.

[0238] For example, based on the operator type being memory-intensive, the second and third parameter sets to be adjusted in the operator's configuration parameters are determined. Based on the operator's operation type, tensor shape, and at least one performance diagnostic metric, the parameter adjustment strategies corresponding to the second and third parameter sets are determined.

[0239] The specific implementation of this step can be found in the description of any of the previous embodiments, and will not be repeated here.
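The selection of parameter sets in [0237] and [0238] can be sketched as a simple mapping. The members of each set follow the lists given in claim 3; the identifier spellings and the function name `parameter_sets_to_adjust` are hypothetical, and how each parameter is then adjusted from the diagnostic metrics is outside this sketch.

```python
from typing import List

# First set: parameters tied to computational efficiency (order is significant).
FIRST_SET = ["compute_unit_type", "compute_instruction_type",
             "block_level_tile_layout", "thread_level_tile_layout"]
# Second set: parameters tied to both computational and memory access efficiency.
SECOND_SET = ["pipeline_stages", "grid_size", "block_size"]
# Third set: parameters tied to memory access efficiency.
THIRD_SET = ["global_memory_access_mode", "shared_memory_access_mode",
             "vectorized_access_block_size", "secondary_memory_access_method"]


def parameter_sets_to_adjust(operator_type: str) -> List[str]:
    """Return the names of the parameters to adjust for a given operator type."""
    if operator_type == "compute-intensive":
        return FIRST_SET + SECOND_SET
    if operator_type == "memory-intensive":
        return SECOND_SET + THIRD_SET
    raise ValueError(f"unknown operator type: {operator_type!r}")
```

Because the second set affects both kinds of efficiency, it is returned for either operator type.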

[0240] S707, adjust the configuration parameters of the operator based on the determined parameter adjustment strategy.

[0241] S708, run the operator using the processor based on the adjusted configuration parameters and the input tensor, and return to step S703.

[0242] S709, store the current configuration parameters of the operator as the target configuration parameters suitable for the operator under the tensor shape and the load state type.
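The control flow of steps S701 to S709, including the iteration limit from [0233], can be sketched as a loop. The three callbacks (`run_operator`, `target_met`, `adjust`), the metric dictionaries, and the default iteration limit are hypothetical placeholders for the processor run, the target-condition check, and the strategy-driven adjustment; none of these names come from this application.

```python
from typing import Callable, Dict, Tuple

Metrics = Dict[str, float]


def tune_operator(initial_config: dict,
                  run_operator: Callable[[dict], Tuple[Metrics, Metrics]],
                  target_met: Callable[[Metrics], bool],
                  adjust: Callable[[dict, Metrics, Metrics], dict],
                  max_iterations: int = 20) -> dict:
    """Iterate run -> measure -> adjust until the target condition holds."""
    config = dict(initial_config)  # S701/S702: start from the initial parameters
    iterations = 0                 # [0233]: iteration count starts at 0
    while True:
        # S702/S708 run the operator; S703 collects evaluation and diagnostic metrics.
        evaluation, diagnostics = run_operator(config)
        # S704: stop when the target condition is met or the iteration limit is hit.
        if target_met(evaluation) or iterations >= max_iterations:
            return config          # S709: caller stores this as the target parameters
        # S705-S707: classify the operator, choose a strategy, adjust the parameters.
        config = adjust(config, evaluation, diagnostics)
        iterations += 1
```

As a toy usage, a caller might pass an `adjust` callback that doubles a tile size until a utilization metric returned by `run_operator` exceeds a threshold.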

[0243] Furthermore, embodiments of this application also provide an electronic device. Figure 8 shows a schematic diagram of the composition structure of the electronic device, which includes at least a controller 801 and at least one processor 802.

[0244] The processor 801 is used to run the operators.

[0245] The controller 802 is configured to obtain at least one performance evaluation metric and at least one performance diagnostic metric of the processor, wherein the performance evaluation metric and the performance diagnostic metric are data metrics generated by the processor running the operator under a set operating scenario, and the performance diagnostic metric is a data metric that can affect the performance evaluation metric; based on the performance evaluation metric, determine the operator type of the operator, wherein the operator type is either compute-intensive or memory-intensive; based on the operator type and the performance diagnostic metric, determine the parameter adjustment strategy of the operator; and based on the parameter adjustment strategy, adjust the configuration parameters of the operator.

[0246] For example, the processor can be a GPU or an NPU. The controller can be a CPU or other controllers that can manage processors such as GPUs and NPUs.

[0247] In this application, the specific operations performed by the processor and controller can be found in the relevant descriptions of the preceding embodiments, and will not be repeated here.

[0248] Furthermore, the electronic device may also include a memory 803 and a display unit 804. The memory 803 is used to store the programs required for the processor to perform operations. The display unit 804 is used to output the adjusted configuration parameters as target configuration parameters corresponding to the operator and the operating scenario.

[0249] Of course, the electronic device may also have more or fewer components than those shown in Figure 8; no limitation is imposed in this regard.

[0250] This application also provides a computer program product, including computer-readable instructions, which, when executed on an electronic device, cause the electronic device to implement any of the methods for adjusting operator configuration parameters provided in this application.

[0251] This application also provides a computer-readable storage medium carrying one or more computer programs. When the one or more computer programs are executed by an electronic device, the electronic device can implement any of the methods for adjusting operator configuration parameters provided in this application.

[0252] It should also be noted that the device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. In addition, in the device embodiment drawings provided in this application, the connection relationship between modules indicates that they have a communication connection, which can be implemented as one or more communication buses or signal lines.

[0253] Through the above description of the embodiments, those skilled in the art can clearly understand that this application can be implemented by means of software plus necessary general-purpose hardware, or it can be implemented by special-purpose hardware including application-specific integrated circuits, special-purpose CPUs, special-purpose memory, special-purpose components, etc. Generally, any function performed by a computer program can be easily implemented by corresponding hardware, and the specific hardware structure used to implement the same function can also be diverse, such as analog circuits, digital circuits, or special-purpose circuits. However, for this application, software program implementation is more often the preferred implementation method. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product is stored in a readable storage medium, such as a computer floppy disk, USB flash drive, mobile hard disk, ROM, RAM, magnetic disk, or optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, training equipment, or network device, etc.) to execute the methods described in the various embodiments of this application.

[0254] In the above embodiments, implementation can be achieved, in whole or in part, through software, hardware, firmware, or any combination thereof. When implemented in software, it can be implemented, in whole or in part, as a computer program product.

[0255] The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of this application are generated. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center via wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a training device or data center, that integrates one or more available media. The available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., DVDs), or semiconductor media (e.g., solid-state drives (SSDs)).

Claims

1. A method for adjusting operator configuration parameters, comprising: obtaining at least one performance evaluation metric and at least one performance diagnostic metric of a processor, wherein the performance evaluation metric and the performance diagnostic metric are data metrics generated by the processor running an operator under a set operating scenario, and the performance diagnostic metric is a data metric that can affect the performance evaluation metric; determining an operator type of the operator based on the performance evaluation metric, wherein the operator type is either compute-intensive or memory-intensive; determining a parameter adjustment strategy of the operator based on the operator type and the performance diagnostic metric; and adjusting the configuration parameters of the operator based on the parameter adjustment strategy.

2. The method for adjusting operator configuration parameters according to claim 1, wherein determining the parameter adjustment strategy of the operator based on the operator type and the performance diagnostic metric comprises: based on the operator type being compute-intensive, determining a first parameter set and a second parameter set to be adjusted in the configuration parameters of the operator, and determining, based on the performance diagnostic metric, the parameter adjustment strategies corresponding to the first parameter set and the second parameter set; or, based on the operator type being memory-intensive, determining the second parameter set and a third parameter set to be adjusted in the configuration parameters of the operator, and determining, based on the performance diagnostic metric, the parameter adjustment strategies corresponding to the second parameter set and the third parameter set; wherein the first parameter set includes at least one parameter associated with the computational efficiency of the operator; the second parameter set includes at least one parameter associated with both the memory access efficiency and the computational efficiency of the operator; and the third parameter set includes at least one parameter associated with the memory access efficiency of the operator.

3. The method for adjusting operator configuration parameters according to claim 2, wherein determining the parameter adjustment strategies corresponding to the first parameter set and the second parameter set based on the performance diagnostic metric comprises: determining, based on the performance diagnostic metric, the parameter adjustment strategy for each parameter in the first parameter set sequentially, according to the order of the parameters in the first parameter set; and determining, based on the performance diagnostic metric, the parameter adjustment strategy corresponding to the second parameter set; wherein the first parameter set includes, in sequence: computing unit type, computing instruction type, size and layout of sub-blocks at the block level, and size and layout of sub-blocks at the thread level; the second parameter set includes at least one of the following: pipeline stages, grid size, and block size; and the third parameter set includes: global memory access mode, shared memory access mode, vectorized memory access data block size, and secondary memory access method.

4. The method for adjusting operator configuration parameters according to claim 1, wherein the running scenario is used to characterize the tensor shape of the input tensor of the operator; and determining the parameter adjustment strategy of the operator based on the operator type and the performance diagnostic metric comprises: determining the parameter adjustment strategy of the operator based on the operator type, the performance diagnostic metric, and the tensor shape.

5. The method for adjusting operator configuration parameters according to claim 4, wherein the running scenario is further used to characterize the load state type of the processor when running the operator; and the method further comprises: outputting the adjusted configuration parameters as the target configuration parameters suitable for the operator under the tensor shape and the load state type.

6. The method for adjusting operator configuration parameters according to claim 5, further comprising: determining the operator to be run in a model, the input tensor corresponding to the operator, and the tensor shape of the input tensor; determining the current load state type of the processor; and, if no target configuration parameters exist corresponding to the operator, the tensor shape, and the load state type, determining initial configuration parameters corresponding to the operator; wherein obtaining the at least one performance evaluation metric and the at least one performance diagnostic metric of the processor comprises: running the operator in the processor based on the input tensor and the initial configuration parameters; and obtaining at least one performance evaluation metric and at least one performance diagnostic metric generated by the processor running the operator.

7. The method for adjusting operator configuration parameters according to claim 6, further comprising: if target configuration parameters exist corresponding to the operator, the tensor shape, and the load state type, running the operator using the processor based on the target configuration parameters and the input tensor.

8. The method for adjusting operator configuration parameters according to claim 1, further comprising: determining the operation type corresponding to the operator; wherein determining the parameter adjustment strategy of the operator based on the operator type and the performance diagnostic metric comprises: determining the parameter adjustment strategy of the operator based on the operator type, the performance diagnostic metric, and the operation type.

9. The method for adjusting operator configuration parameters according to claim 1, wherein the at least one performance evaluation metric includes the processor's actual computing bandwidth and actual memory access bandwidth; and determining the parameter adjustment strategy of the operator based on the operator type and the performance diagnostic metric comprises: if it is determined, based on the actual computing bandwidth and the actual memory access bandwidth, that a target condition has not been met, determining the parameter adjustment strategy of the operator based on the operator type and the performance diagnostic metric; wherein the target condition includes at least one of the following: the actual computing bandwidth and the actual memory access bandwidth converge; the ratio of the actual computing bandwidth to the processor's rated computing bandwidth is greater than a first threshold; and the ratio of the actual memory access bandwidth to the processor's rated memory access bandwidth is greater than a second threshold.

10. An electronic device, comprising: a controller and a processor; wherein the processor is used to run operators; and the controller is configured to: obtain at least one performance evaluation index and at least one performance diagnostic index of the processor, wherein the performance evaluation index and the performance diagnostic index are data indicators generated by the processor running the operator under a set operating scenario, and the performance diagnostic index is a data indicator that can affect the performance evaluation index; determine the operator type of the operator based on the performance evaluation index, wherein the operator type is either compute-intensive or memory-intensive; determine the parameter adjustment strategy of the operator based on the operator type and the performance diagnostic index; and adjust the configuration parameters of the operator based on the parameter adjustment strategy.