Operator tuning method, apparatus, electronic device, and storage medium
By testing candidate operator configurations and selecting the best-performing one, the method addresses the frequent reconfiguration caused by dynamically changing input data sizes, thereby improving the efficiency and performance of neural network operations.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents (China)
- Current Assignee / Owner
- SHANGHAI CAMBRICON INFORMATION TECH CO LTD
- Filing Date
- 2021-12-07
- Publication Date
- 2026-06-30
AI Technical Summary
In neural network operations, the dynamically changing size of input data necessitates frequent reconfiguration of operators, and there is a lack of an effective configuration method to optimize computational performance.
The method obtains multiple operator configuration methods and tests the computational performance of each. It generates test data by sampling data sizes and selects the target configuration method with the highest computational performance, so that the operator configuration parameters are adapted to different data sizes.
It achieves efficient operator performance across different data sizes, makes better use of computing resources, and improves the efficiency of neural network operations.
Figure CN116244059B_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of artificial intelligence technology, specifically to an operator tuning method, apparatus, electronic device, and storage medium.
Background Technology
[0002] Currently, in neural network operations, users typically configure operators based on hardware resources such as memory and computing power. This allows for the rational allocation of computing resources and memory to the operator, thereby accelerating the computation process while fully utilizing hardware performance. Operator configuration includes, but is not limited to, determining the following characteristics: input data size and storage address, output data size and storage space, input data partitioning strategy and parallelism, and operator partitioning strategy and parallelism. For example, when the on-chip memory cannot meet the storage requirements of the operator, the operator can be divided into blocks. On-chip memory is then allocated to each block, and the input data for each block is acquired. Operations are then performed on each block separately to obtain the operator's result. In this case, the operator configuration process can include: operator partitioning strategy, input data partitioning strategy, and data storage address allocation.
[0003] However, the input data of some operators changes dynamically. For example, the input data size may differ between two separate invocations. Because of this, the same operator has to be configured repeatedly. Finding a way to configure the operator that performs well for all possible data sizes is therefore a problem that urgently needs to be solved.
Summary of the Invention
[0004] This application provides an operator tuning method, apparatus, electronic device, and storage medium to find a better operator configuration for operators with varying input data sizes.
[0005] In a first aspect, embodiments of this application provide an operator tuning method, including:
[0006] Obtain at least one operator configuration method corresponding to the operator, wherein each operator configuration method is determined by a set of configuration parameters;
[0007] By executing the operator obtained in each of the operator configuration methods, the first operational performance of the operator under each of the operator configuration methods is obtained;
[0008] Based on the first operational performance of the operator under each of the operator configuration methods, the target operator configuration method is determined among the at least one operator configuration methods.
[0009] In one embodiment of this application, determining the target operator configuration method among the at least one operator configuration methods based on the first operational performance of the operator under each of the operator configuration methods includes:
[0010] Based on the first operational performance of the operator under each operator configuration, k candidate operator configurations are selected from the at least one operator configuration, where k is a positive integer greater than 1 and k is less than the total number of operator configurations;
[0011] By executing the operator obtained in each of the candidate operator configuration methods, the second operational performance of the operator under each of the candidate operator configuration methods is obtained;
[0012] Based on the second operational performance of the operator under each of the candidate operator configuration methods, the target operator configuration method is determined, and the target operator configuration method is the candidate operator configuration method with the highest second operational performance among the k candidate operator configuration methods.
[0013] In one embodiment of this application, obtaining the first operational performance of the operator under each of the operator configuration methods by executing the operator obtained by each of the operator configuration methods includes:
[0014] Using the first test data as input data for the operator, the operator obtained under each operator configuration method is executed to obtain the first operational performance of the operator under each operator configuration method.
[0015] In one embodiment of this application, obtaining the second operational performance of the operator under each of the candidate operator configuration methods by executing the operator obtained by each of the candidate operator configuration methods includes:
[0016] For each candidate operator configuration method, multiple second test data are used as input data for the operator, and the operator is executed to obtain its second operational performance under that candidate operator configuration method, wherein the data size of each of the multiple second test data is within the data size range allowed by the operator.
[0017] In one embodiment of this application, the step of using multiple second test data as input data for the operator, executing the operator obtained by each of the candidate operator configuration methods, and obtaining the second operational performance of the operator under each of the candidate operator configuration methods includes:
[0018] Each of the second test data is used as the input data of the operator, and the operator obtained by each of the candidate operator configuration methods is executed, so as to obtain the third operational performance of the operator under each candidate operator configuration method when tested with each of the second test data;
[0019] Based on the multiple third operational performances, the second operational performance of the operator under each of the candidate operator configuration methods is obtained.
[0020] In one embodiment of this application, before obtaining the second operational performance of the operator under each of the candidate operator configuration methods, the method further includes:
[0021] A target sampling method is used to sample within the data size range allowed by the operator, so as to obtain a plurality of sampled data sizes;
[0022] The plurality of second test data are generated based on the plurality of sampled data sizes.
[0023] In one embodiment of this application, before sampling with the target sampling method within the data size range allowed by the operator (that is, up to its maximum allowed data size), the method further includes:
[0024] Select the target sampling method from a variety of preset sampling methods.
[0025] In one embodiment of this application, the method further includes:
[0026] Configure the operator using the target operator configuration method.
[0027] In one embodiment of this application, the first operational performance of the operator is the first operational time and/or the first throughput of the operator;
[0028] The second operational performance of the operator is the second operational time and/or the second throughput of the operator.
[0029] In a second aspect, embodiments of this application provide an operator tuning apparatus comprising an acquisition unit and a processing unit. The acquisition unit is configured to acquire at least one operator configuration method corresponding to an operator, wherein each operator configuration method is determined by a set of configuration parameters. The processing unit is configured to obtain a first operational performance of the operator under each operator configuration method by executing the operator obtained under each operator configuration method, and to determine a target operator configuration method among the at least one operator configuration method based on the first operational performance of the operator under each operator configuration method.
[0030] In some possible implementations, with respect to determining the target operator configuration method among the at least one operator configuration method based on the first operational performance of the operator under each operator configuration method, the processing unit is specifically configured to:
[0031] Based on the first operational performance of the operator under each operator configuration, k candidate operator configurations are selected from the at least one operator configuration, where k is a positive integer greater than 1 and k is less than the total number of operator configurations;
[0032] By executing the operator obtained in each of the candidate operator configuration methods, the second operational performance of the operator under each of the candidate operator configuration methods is obtained;
[0033] Based on the second operational performance of the operator under each of the candidate operator configuration methods, the target operator configuration method is determined, and the target operator configuration method is the candidate operator configuration method with the highest second operational performance among the k candidate operator configuration methods.
[0034] In some possible implementations, with respect to obtaining the first operational performance of the operator under each operator configuration method by executing the operator obtained under each operator configuration method, the processing unit is specifically configured to:
[0035] Using the first test data as input data for the operator, the operator obtained under each operator configuration method is executed to obtain the first operational performance of the operator under each operator configuration method.
[0036] In some possible implementations, with respect to obtaining the second operational performance of the operator under each candidate operator configuration method by executing the operator obtained under each candidate operator configuration method, the processing unit is specifically configured to:
[0037] For each candidate operator configuration method, multiple second test data are used as input data for the operator, and the operator is executed to obtain its second operational performance under that candidate operator configuration method, wherein the data size of each of the multiple second test data is within the data size range allowed by the operator.
[0038] In one embodiment of this application, with respect to obtaining the second operational performance of the operator under each candidate operator configuration method by using multiple second test data as input data for the operator and executing the operator obtained under each candidate operator configuration method, the processing unit is specifically configured to:
[0039] Each of the second test data is used as the input data of the operator, and the operator obtained by each of the candidate operator configuration methods is executed, so as to obtain the third operational performance of the operator under each candidate operator configuration method when tested with each of the second test data;
[0040] Based on the multiple third operational performances, the second operational performance of the operator under each of the candidate operator configuration methods is obtained.
[0041] In one embodiment of this application, before obtaining the second operational performance of the operator under each of the candidate operator configurations by executing the operator obtained by each of the candidate operator configurations, the processing unit is further configured to:
[0042] A target sampling method is used to sample within the data size range allowed by the operator, so as to obtain a plurality of sampled data sizes;
[0043] The plurality of second test data are generated based on the plurality of sampled data sizes.
[0044] In one embodiment of this application, before sampling with the target sampling method within the data size range allowed by the operator (that is, up to its maximum allowed data size), the processing unit is further configured to:
[0045] Select the target sampling method from a variety of preset sampling methods.
[0046] In one embodiment of this application, the processing unit is further configured to:
[0047] Configure the operator using the target operator configuration method.
[0048] In one embodiment of this application, the first operational performance of the operator is the first operational time and/or the first throughput of the operator;
[0049] The second operational performance of the operator is the second operational time and/or the second throughput of the operator.
[0050] In a third aspect, embodiments of this application provide an electronic device including a processor and a memory connected to the processor. The memory is configured to store a computer program, and the processor is configured to execute the computer program stored in the memory to cause the electronic device to perform the method described in the first aspect.
[0051] In a fourth aspect, embodiments of this application provide a computer-readable storage medium storing a computer program that causes a computer to perform the method described in the first aspect.
[0052] In a fifth aspect, embodiments of this application provide a computer program product including a non-transitory computer-readable storage medium storing a computer program, the computer program being operable to cause a computer to perform the method described in the first aspect.
Brief Description of the Drawings
[0053] To more clearly illustrate the technical solutions in the embodiments of this application, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0054] Figure 1 is a flowchart of an operator tuning method provided in an embodiment of this application;
[0055] Figure 2 is a flowchart of a method for selecting a target operator configuration method provided in an embodiment of this application;
[0056] Figure 3 is a flowchart of an operator tuning method provided in an embodiment of this application;
[0057] Figure 4 is a block diagram of the functional units of an operator tuning apparatus provided in an embodiment of this application;
[0058] Figure 5 is a schematic structural diagram of an electronic device provided in an embodiment of this application.
Detailed Description
[0059] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0060] The terms "first," "second," "third," and "fourth," etc., used in the specification, claims, and accompanying drawings of this application are used to distinguish different objects, not to describe a specific order. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or apparatus that includes a series of steps or units is not limited to the listed steps or units, but may optionally include steps or units not listed, or may optionally include other steps or units inherent to these processes, methods, products, or apparatuses.
[0061] In this document, the term "embodiment" means that a particular feature, result, or characteristic described in connection with an embodiment may be included in at least one embodiment of this application. The appearance of this phrase in various places throughout the specification does not necessarily refer to the same embodiment, nor is it a separate or alternative embodiment mutually exclusive with other embodiments. It will be explicitly and implicitly understood by those skilled in the art that the embodiments described herein can be combined with other embodiments.
[0062] First, it should be noted that an operator is a mapping from one function space to another. In the field of neural networks, operators include, but are not limited to, convolution operators, activation operators, and pooling operators. They may also include operators for the four basic arithmetic operations (addition, subtraction, multiplication, and division), and so on. These are only examples and do not limit the types of operators.
[0063] For computations involving large datasets and complex processes, such as machine learning, dedicated processors, such as AI processors, graphics processing units (GPUs), or other devices, are typically used to implement machine learning operations. These operations include neural network operations, k-means operations, and support vector machine operations. Generally, a general-purpose processor (such as a CPU) is connected to this dedicated processor to form a heterogeneous computer system. In heterogeneous computing scenarios, to facilitate the development of operators on dedicated processors, a machine learning library can be provided to accelerate various machine learning or deep learning algorithms on AI processors. This machine learning library provides a set of efficient, general-purpose, flexible, and scalable programming interfaces. Upper-level machine learning applications can directly use the programming interfaces of various programming frameworks (such as TensorFlow, Caffe, MXNet, etc.) or can be programmed directly using the interfaces provided by the machine learning library.
[0064] To maximize the utilization of hardware resources (including computing and storage resources) on a dedicated processor, it is typically necessary to configure the implementation of operators in machine learning libraries. Operator configuration determines the storage allocation and computational requirements of the operators. However, different configuration methods result in different computational performances for the configured operators. This application primarily addresses how to automatically adjust the configuration parameters of operators to ensure better computational performance during operation.
[0065] Referring to Figure 1, Figure 1 is a flowchart of an operator tuning method provided in an embodiment of this application. The method may include the following steps:
[0066] 101: Obtain at least one operator configuration method corresponding to the operator, wherein each operator configuration method is determined by a set of configuration parameters.
[0067] Each set of configuration parameters includes one or more configuration parameters of different types. For example, these parameters may represent the operator's partitioning strategy (i.e., the number of blocks into which the operator is divided), the operator's parallelism (i.e., the multi-stage pipelined processing of the operator), the operator's code writing rules, the partitioning strategy for the operator's input data, the allocation of storage space for the operator's output results, and so on. The code writing rules could include, for example, writing the operator's code sequentially with nested for loops or writing it with loop tiling. This application mainly uses the operator's partitioning strategy as an example. Each configuration parameter can be, for instance, a user-defined immediate value. Different sets differ in the values of some or all of their configuration parameters, so each set of configuration parameters corresponds to a different operator configuration method.
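The parameter sets described above can be viewed as combinations of candidate values for each parameter type. The Python sketch below assumes that the configuration space is the full Cartesian product of such values. The class `OperatorConfig`, its fields, and the helper `enumerate_configs` are hypothetical names used only for illustration. The block counts 5, 8, and 10 are taken from the example in [0068].

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class OperatorConfig:
    """One set of configuration parameters, i.e. one operator configuration method (hypothetical fields)."""
    num_blocks: int        # partitioning strategy: number of blocks the operator is split into
    pipeline_stages: int   # parallelism: number of pipeline stages
    loop_style: str        # code writing rule: "nested" for loops or "tiled" loops


def enumerate_configs(block_choices, stage_choices, loop_styles):
    """Return one OperatorConfig per combination of candidate parameter values."""
    return [
        OperatorConfig(blocks, stages, style)
        for blocks, stages, style in product(block_choices, stage_choices, loop_styles)
    ]


configs = enumerate_configs([5, 8, 10], [1, 2], ["nested", "tiled"])
print(len(configs))  # 12 configuration methods (3 * 2 * 2)
```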
[0068] For example, when the on-chip memory of the hardware cannot meet the storage requirements during the operation of an operator, the operator can be divided into blocks. For instance, the number of blocks for the operator can be configured based on the memory resources allocated to it, allowing for at least one operator configuration method to be generated based on different block numbers (i.e., different configuration parameters). For example, the number of blocks for an operator can be set to 5, 8, 10, etc. For a complete convolution operator, it can be split into multiple sub-operators based on the obtained operator configuration method. Similarly, the input data of the convolution operator is also split into multiple parts to obtain sub-input data corresponding to each sub-operator. The sub-input data of each sub-operator is then used as the input data for that sub-operator to implement the partial convolution operation process corresponding to that sub-operator.
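The following sketch illustrates the block-splitting idea. It uses a plain matrix product as a stand-in for the convolution operator, since [0075] treats the convolution input as an N*L matrix and the weights as an L*M matrix. The input rows are partitioned into `num_blocks` parts, and each part is processed by its own sub-operator. The partial outputs are then concatenated. All function names are hypothetical, and the code runs on the host CPU rather than on an AI processor.

```python
def matmul(a, b):
    """Plain matrix product of a (n x l) and b (l x m), both given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def split_rows(data, num_blocks):
    """Partition the rows of `data` into `num_blocks` contiguous, near-equal chunks."""
    base, extra = divmod(len(data), num_blocks)
    chunks, start = [], 0
    for i in range(num_blocks):
        size = base + (1 if i < extra else 0)
        chunks.append(data[start:start + size])
        start += size
    return chunks


def blocked_matmul(data, weights, num_blocks):
    """Run one sub-operator per block of input rows and concatenate the partial outputs."""
    result = []
    for sub_input in split_rows(data, num_blocks):
        # Each block is small enough to fit in its own on-chip buffer on real hardware.
        result.extend(matmul(sub_input, weights))
    return result


data = [[i + j for j in range(3)] for i in range(10)]   # 10 x 3 input data
weights = [[1, 0], [0, 1], [1, 1]]                       # 3 x 2 weights
assert blocked_matmul(data, weights, 4) == matmul(data, weights)
```

Splitting by rows keeps each sub-operator independent, so the blocked result is identical to the unsplit one. Only the memory footprint per block and the scheduling differ.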
[0069] Furthermore, the operator is configured according to each operator configuration method to obtain the corresponding operator under that operator configuration method. The configuration parameters in the operator configuration method can be used to characterize the operation logic during the operation of the operator (for example, the configuration parameter of the number of blocks can be used to characterize the splitting operation logic of the operator during the operation).
[0070] 102: By executing the operator obtained in each of the operator configuration methods, the first operational performance of the operator under each of the operator configuration methods is obtained.
[0071] For example, an artificial intelligence processor can be used to run the operator to obtain the computational performance of the operator in different operator configurations. Specifically, the first test data can be used as the input data of the operator. The executable program corresponding to the operator and the input data of the operator are transferred to the artificial intelligence processor, which can then run the operator to obtain its computational performance. In this application, each operator configuration can be considered as a different operator. By executing the operator obtained in each of the aforementioned operator configurations, the first computational performance of the operator under each of the aforementioned operator configurations is obtained. The data size of the first test data can be the maximum data size allowed by the operator. Of course, the data size of the first test data can also be other data sizes within the range allowed by the operator, which is not specifically limited here. For example, the range allowed by the operator in this embodiment of the disclosure can be [A, B], where A represents the minimum data size allowed by the operator, and B represents the maximum data size allowed by the operator. Then the data size of the first test data can be the maximum data size B, or it can be other values within the range [A, B].
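Step 102 can be sketched as timing each configured operator on the same first test data. This minimal Python version assumes that each configured operator is available as a host-side callable. It uses `time.perf_counter` and takes the median of a few repeats. On real hardware, the callable would instead launch the executable program on the AI processor and wait for it to finish. The names `measure` and `first_performance`, the repeat count, the maximum size `B`, and the toy operators are illustrative assumptions.

```python
import statistics
import time


def measure(run, test_data, repeats=5):
    """Return (median run time in seconds, throughput in elements per second).

    `run` executes the operator built from one configuration method on
    `test_data` (hypothetical interface; on hardware it would launch the
    kernel on the AI processor and wait for completion).
    """
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        run(test_data)
        times.append(time.perf_counter() - start)
    elapsed = statistics.median(times)
    throughput = len(test_data) / elapsed if elapsed > 0 else float("inf")
    return elapsed, throughput


def first_performance(operators, first_test_data):
    """Step 102: map each configuration to its first operational performance."""
    return {cfg: measure(run, first_test_data) for cfg, run in operators.items()}


# Toy stand-ins for operators configured with 5, 8 and 10 blocks.
operators = {n: (lambda data, n=n: [sum(data[i::n]) for i in range(n)]) for n in (5, 8, 10)}
B = 100_000                   # assumed maximum allowed data size
first_test_data = [2] * B     # first test data at the maximum size B
perf = first_performance(operators, first_test_data)
```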
[0072] The process of generating the first test data is described below.
[0073] First, it should be noted that when an AI processor invokes operators for computation, the allowed data size of the operators is within a certain range, and the maximum allowed data size varies depending on the type of input data for different operators. For example, for the convolution operator, whose input data is a two-dimensional matrix, the maximum allowed data size refers to the maximum size of the allowed two-dimensional matrix. For the activation operator, whose input data is a feature vector, the maximum allowed data size refers to the maximum dimension of the allowed feature vector.
[0074] Furthermore, for operator operations, the main factor affecting the operator's performance is the size of the data. The specific values within the data do not affect the operator's performance. Therefore, after determining the size of the first test data, the specific values of the data in the first test data can be determined through random selection or other methods. This application does not limit the method of setting the values.
[0075] For example, the input data of a convolution operator is a two-dimensional matrix, so the maximum two-dimensional matrix size allowed by the convolution operator can be used as the data size of the first test data. Specifically, the convolution operator requires two inputs: the input data (with a maximum allowed data size of N*L) and the weights (with a maximum allowed data size of L*M). Accordingly, the first test data also consists of two parts, data and weights, whose sizes can be set to N*L and L*M respectively. The first test data can then be generated with preset values, for example by setting every element of the data to 2 and every element of the weights to 1.
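The first test data of [0075] can be generated as follows. This sketch assumes row-major nested lists. It uses small illustrative sizes in place of the real maximum sizes, and `make_first_test_data` is a hypothetical helper. Every data value is 2 and every weight is 1, so each element of the N*M output equals 2 * 1 * L = 2L. This gives a cheap sanity check on the operator's output.

```python
def make_first_test_data(n_size, l_size, m_size, data_value=2, weight_value=1):
    """Build first test data: an N x L data matrix and an L x M weight matrix of constants.

    Only the sizes matter for timing, so constant values are sufficient.
    """
    data = [[data_value] * l_size for _ in range(n_size)]
    weights = [[weight_value] * m_size for _ in range(l_size)]
    return data, weights


N, L, M = 4, 3, 2   # illustrative sizes standing in for the maximum allowed sizes
data, weights = make_first_test_data(N, L, M)

# One output element: dot product of a data row with a weight column.
element = sum(d * w for d, w in zip(data[0], (row[0] for row in weights)))
assert element == 2 * 1 * L   # equals 2L, i.e. 6 for these sizes
```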
[0076] Furthermore, after generating the first test data, the first test data is used as the input data for the executable program of the operator obtained under each operator configuration mode, so as to obtain the first operation performance of the operator under each operator configuration mode.
[0077] Optionally, the first operational performance of the operator under each configuration method can be measured by the first operational time and/or the first throughput of the operator.
[0078] 103: Determine the target operator configuration method among the at least one operator configuration methods based on the first operational performance of the operator under each operator configuration method.
[0079] Optionally, this disclosure may sort the first operational performances of the operator obtained under the respective operator configuration methods in order to select a target operator configuration method. For example, the first operational performances may be sorted from highest to lowest, and the operator configuration method ranked first may be selected as the target operator configuration method.
[0080] As can be seen, in the embodiments of this application, at least one operator configuration method is first generated. The operator is then configured according to each operator configuration method, and the operator obtained under each method is tested to obtain its first operational performance. Finally, a target operator configuration method is selected from the at least one operator configuration method based on the corresponding first operational performances. In this way, an operator configuration method suited to all data sizes of the operator can be found. When the operator is configured later, the target operator configuration method can be used so that the configured operator performs well on input data of all sizes.
[0081] In one embodiment of this disclosure, as shown in Figure 2, the process of selecting the target operator configuration method described above may include:
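The sorting in [0079] comes down to ranking the measured metrics and taking the first entry. In this sketch, the sort direction depends on the metric: a shorter run time is better, while a higher throughput is better. The function name `select_target` and the sample timings are made up for illustration and are not measured results.

```python
def select_target(first_perf, higher_is_better=False):
    """Step 103: rank configurations by first operational performance, best first.

    first_perf maps configuration -> metric. For run time, lower is better;
    for throughput, pass higher_is_better=True.
    """
    ranking = sorted(first_perf, key=first_perf.get, reverse=higher_is_better)
    return ranking[0], ranking


# Hypothetical first operational times in seconds (not measured values).
first_times = {"blocks=5": 1.8e-3, "blocks=8": 1.2e-3, "blocks=10": 1.5e-3}
target, ranking = select_target(first_times)
print(target)   # blocks=8
print(ranking)  # ['blocks=8', 'blocks=10', 'blocks=5']
```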
[0082] Step 201: Based on the first operational performance of the operator under each operator configuration, select k candidate operator configurations from at least one operator configuration. Here, k is a positive integer greater than 1, and k is less than the total number of operator configurations.
[0083] For example, k candidate operator configuration methods can be selected from the at least one operator configuration method in descending order of first operational performance.
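Step 201 keeps only the k best configurations from the first round. This sketch assumes that performance is measured by run time, so a lower value is better. The selection uses `heapq.nsmallest`, and the code also enforces the constraint 1 < k < total number of configurations stated in the text. The helper name and the timings in the usage lines are hypothetical.

```python
import heapq


def select_candidates(first_times, k):
    """Step 201: keep the k configurations with the shortest first operational time."""
    if not 1 < k < len(first_times):
        raise ValueError("k must satisfy 1 < k < number of configurations")
    return heapq.nsmallest(k, first_times, key=first_times.get)


# Hypothetical first operational times in milliseconds for h = 5 configurations.
first_times = {"cfg1": 3.1, "cfg2": 1.4, "cfg3": 2.2, "cfg4": 4.0, "cfg5": 1.9}
candidates = select_candidates(first_times, k=3)
print(candidates)  # ['cfg2', 'cfg5', 'cfg3']
```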
[0084] Step 202: By executing the operators obtained under each candidate operator configuration method, obtain the second operational performance of the operators under each candidate operator configuration method.
[0085] Similarly, the second operational performance of the operator under each candidate operator configuration is the second operational time and/or the second throughput of the operator.
[0086] Step 203: Determine the target operator configuration based on the second operational performance of the operator under each of the candidate operator configurations.
[0087] Optionally, the second operational performance of the operator under each candidate operator configuration can be measured by the operator's second operational time and/or second throughput.
[0088] The target operator configuration method is the candidate operator configuration method with the highest second operational performance among the k candidate operator configuration methods. For example, when computational performance is measured by computation time, the target operator configuration method is the candidate with the shortest second operational time among the k candidate operator configuration methods.
[0089] Optionally, embodiments of this disclosure can determine the second operational performance of the operator corresponding to each candidate operator configuration by evaluating the operational performance of the operator obtained under different scales of input data. The different scales of input data can be determined by sampling within the range of the input data scale allowed by the operator.
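The text does not specify which sampling methods are preset. The sketch below therefore assumes three common choices: uniform random sizes, evenly spaced sizes, and geometrically spaced sizes. All of them are restricted to the allowed range [A, B] from [0071]. The dictionary `PRESET_SAMPLERS` and every function name are hypothetical. Choosing a key corresponds to selecting the target sampling method, and each sampled size is then turned into one second test data set.

```python
import random


def sample_uniform(a, b, count, rng):
    """Uniformly random integer sizes in [a, b], sorted."""
    return sorted(rng.randint(a, b) for _ in range(count))


def sample_linear(a, b, count, rng):
    """Evenly spaced integer sizes from a to b inclusive (rng unused)."""
    if count == 1:
        return [b]
    return [a + round(i * (b - a) / (count - 1)) for i in range(count)]


def sample_geometric(a, b, count, rng):
    """Geometrically spaced sizes, denser at small sizes (rng unused; needs a >= 1)."""
    if a < 1:
        raise ValueError("geometric sampling needs a >= 1")
    if count == 1:
        return [b]
    ratio = (b / a) ** (1 / (count - 1))
    return [min(b, max(a, round(a * ratio ** i))) for i in range(count)]


PRESET_SAMPLERS = {
    "uniform": sample_uniform,
    "linear": sample_linear,
    "geometric": sample_geometric,
}


def sample_sizes(a, b, count, method, seed=0):
    """Sample `count` data sizes in the allowed range [a, b] with the target sampling method."""
    rng = random.Random(seed)
    return PRESET_SAMPLERS[method](a, b, count, rng)


A, B = 16, 4096  # assumed allowed data size range
sizes = sample_sizes(A, B, 5, "geometric")          # [16, 64, 256, 1024, 4096]
second_test_data = [[1] * size for size in sizes]   # one second test data set per sampled size
```

Geometric spacing puts more samples at small sizes, where per-call overheads tend to matter most. Uniform or linear sampling weights all sizes equally. Which preset suits an operator depends on how its real input sizes are distributed.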
[0090] Specifically, for each candidate operator configuration, multiple second test data sets are used as input data for that operator. The third operational performance of the operator under each candidate operator configuration is obtained when testing the operator using each second test data set. The data size of each second test data set is less than or equal to the maximum allowed data size of the operator. Optionally, each second test data set corresponds to one third operational performance. Specifically, for a single second test data set, it is used as input data for the operator obtained under that candidate operator configuration. The executable program corresponding to the operator and the input data of the operator are then transferred to an AI processor. The AI processor can run the operator to obtain the third operational performance of the operator when testing it using the second test data. Optionally, for each candidate operator configuration, the third operational performance obtained by testing the operator using each second test data set can be measured by the operator's third operational time and/or third throughput.
[0091] Furthermore, the second operational performance of the operator under this candidate operator configuration method is determined based on the multiple third operational performances. For example, this disclosure can average the multiple third operational performances of the operator under this candidate operator configuration method to obtain its second operational performance.
[0092] For example, for a convolution operator, different data and/or operator splitting strategies can be used, and each splitting strategy can correspond to a different operator configuration. Assuming that the convolution operator in this embodiment corresponds to a total of h operator configurations, k candidate operator configurations can be selected from the h operator configurations by comparing their first operational performances.
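According to [0090] and [0091], each second test data set yields one third operational performance, and averaging them gives the second operational performance. The host-side sketch below assumes run time as the metric and the arithmetic mean as the aggregate; the text offers averaging as one example. The function names and the toy operator (`sum`) are hypothetical.

```python
import statistics
import time


def third_performances(run, second_test_data):
    """Run the operator once per second test data set; each run gives one third operational time."""
    times = []
    for test_data in second_test_data:
        start = time.perf_counter()
        run(test_data)
        times.append(time.perf_counter() - start)
    return times


def second_performance(run, second_test_data):
    """Second operational performance: arithmetic mean of the third operational times."""
    return statistics.fmean(third_performances(run, second_test_data))


# Toy operator and second test data of three sampled sizes.
second_test_data = [[1] * size for size in (64, 512, 4096)]
t2 = second_performance(sum, second_test_data)
```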
[0093] For the j-th candidate operator configuration method among the k candidate operator configuration methods (where j is a positive integer greater than or equal to 1 and less than or equal to k), this disclosure can perform the following operations:
[0094] First, multiple second test data are used as input data for the convolution operator corresponding to the j-th candidate operator configuration method;
[0095] Second, the processor runs the convolution operator corresponding to the j-th candidate operator configuration method on each input data set to obtain its third operational performance. There can be multiple third operational performance metrics, one for each second test data set.
[0096] Finally, the second operational performance of the convolution operator corresponding to the j-th candidate operator configuration is determined based on multiple third operational performance metrics. For example, the average of multiple third operational performance metrics can be used as the second operational performance of the convolution operator corresponding to the j-th candidate operator configuration.
[0097] Furthermore, embodiments of this disclosure can determine the k second operational performance metrics of the operator under the k candidate operator configuration methods and determine the target operator configuration method from them. Specifically, the k second operational performance metrics can be sorted, and the candidate operator configuration method with the highest second operational performance is determined as the target operator configuration method. The second operational performance can be evaluated using parameters such as operator runtime and/or throughput.
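The per-candidate loop of the preceding steps and the final selection can be sketched together. `measure_runtime` is a hypothetical callback standing in for compiling and running the operator on the processor; performance is assumed to be runtime, so "highest performance" means "lowest mean runtime".

```python
from statistics import fmean
from typing import Callable, Dict, Sequence, Tuple

Shape = Tuple[int, ...]


def pick_target(candidates: Sequence[str],
                test_shapes: Sequence[Shape],
                measure_runtime: Callable[[str, Shape], float],
                ) -> Tuple[str, Dict[str, float]]:
    """Stage 2: score each of the k candidates by its mean runtime over all
    second test data, then return the best (lowest mean runtime) candidate."""
    second: Dict[str, float] = {}
    for cfg in candidates:  # the j-th candidate configuration, 1 <= j <= k
        third = [measure_runtime(cfg, shape) for shape in test_shapes]
        second[cfg] = fmean(third)  # second performance = mean of third performances
    target = min(second, key=second.__getitem__)  # shortest mean runtime wins
    return target, second
```

Taking the minimum is equivalent to sorting the k scores and keeping the first; the returned dictionary keeps all scores in case a full ranking is wanted.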
[0098] It should be noted that the multiple second test data are generated before the operator obtained by each candidate operator configuration method is executed. For example, sampling is first performed within the data size range allowed by the operator using a target sampling method to obtain multiple sampled data sizes; multiple second test data are then generated from the sampled data sizes, with the data sizes of the second test data corresponding one-to-one to the sampled data sizes. For example, the data size range allowed by the operator in this embodiment can be [A, B], where A is the minimum data size and B the maximum data size allowed by the operator.
[0099] Optionally, the target sampling method is selected from multiple preset sampling methods, which include, but are not limited to, average (uniform) sampling and dynamic sampling. Average sampling uses a constant sampling interval, such as a pre-specified interval; dynamic sampling uses a changing sampling interval, such as one that grows exponentially.
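The two preset sampling methods can be sketched over a one-dimensional size range [A, B]. Always appending B so the maximum allowed size is tested is an added assumption, as are the function names.

```python
from typing import List


def uniform_sizes(a: int, b: int, step: int) -> List[int]:
    """Average (uniform) sampling over [a, b] with a constant interval `step`."""
    if not (0 < a <= b) or step <= 0:
        raise ValueError("require 0 < a <= b and step > 0")
    sizes = list(range(a, b + 1, step))
    if sizes[-1] != b:
        sizes.append(b)  # assumption: always test the maximum allowed size
    return sizes


def exponential_sizes(a: int, b: int, factor: int = 2) -> List[int]:
    """Dynamic sampling over [a, b]: the interval grows by `factor` each step."""
    if not (0 < a <= b) or factor < 2:
        raise ValueError("require 0 < a <= b and factor >= 2")
    sizes, n = [], a
    while n < b:
        sizes.append(n)
        n *= factor
    sizes.append(b)  # assumption: always test the maximum allowed size
    return sizes
```

Exponential sampling places more test points at small sizes, where fixed per-call overheads tend to matter most, while uniform sampling spreads them evenly.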
[0100] Optionally, one of the preset sampling methods can be randomly selected as the target sampling method; alternatively, a user-defined target data size can be obtained, where the target data size falls within the range of data sizes allowed by the operator. The target data size is the data size range that needs to be tested in detail. For example, if the maximum allowed input data size of the operator is N*M, the user-defined target data size may be N1*M1, where N1 is less than N and M1 is less than M. Although the operator accepts a range of data sizes, in practice its input data often falls within a narrower size range, and the target data size designates that range.
[0101] Optionally, a data size setting function can be provided on a visual interface, allowing users to set the target data size according to their actual needs. The AI processor then selects the target sampling method from the preset sampling methods based on the target data size, so that the sampled data sizes obtained with this method fall within the target data size.
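The description does not state how the target data size drives the choice of sampling method. One hypothetical criterion is sketched below: run every preset sampler over the allowed range and pick the one whose samples land most often inside the user's target range. The criterion, the names, and the `(min, max)` tuple representation are all assumptions.

```python
from typing import Callable, Dict, List, Tuple

Sampler = Callable[[int, int], List[int]]  # (min size, max size) -> sampled sizes


def choose_sampler(presets: Dict[str, Sampler],
                   allowed: Tuple[int, int],
                   target: Tuple[int, int]) -> str:
    """Pick the preset whose samples over the allowed range fall inside the
    user-defined target range most often (a hypothetical selection rule)."""
    lo, hi = target

    def hits(name: str) -> int:
        # Count sampled sizes that land inside the target range.
        return sum(lo <= s <= hi for s in presets[name](*allowed))

    return max(presets, key=hits)  # ties go to the first preset listed
```

For an allowed range of [1, 100] and a target range of [1, 16], an exponential sampler beats a coarse uniform one because most of its points are small.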
[0102] It should be noted that if the input data of the operator is multidimensional, when sampling the data of the operator's allowed size, it is necessary to sample each dimension separately, and then cross-combine the sampling results of each dimension to obtain multiple sampled data sizes. For example, if the input data is a two-dimensional matrix, the length and width of the two-dimensional matrix can be sampled separately, and the sampling results of the length and width can be cross-combined to obtain the multiple sampled data sizes. The sampling interval for each dimension can be the same or different; this application does not impose specific limitations.
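Cross-combining per-dimension sampling results is a Cartesian product. A minimal sketch (the function name is hypothetical), using the standard-library `itertools.product`:

```python
from itertools import product
from typing import List, Sequence, Tuple


def cross_combine(per_dim: Sequence[Sequence[int]]) -> List[Tuple[int, ...]]:
    """Combine the sampled sizes of each dimension into full data shapes.

    per_dim[i] holds the sampled sizes for dimension i; each dimension may
    use its own sampling interval.
    """
    return list(product(*per_dim))
```

For a two-dimensional input with lengths `[2, 4]` and widths `[8, 16, 32]`, this yields 2 × 3 = 6 shapes. Because the count is the product of the per-dimension counts, keeping each dimension's sample list short keeps the second test stage affordable.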
[0103] Similarly, after the data size of each second test data set is determined, the values of its data are determined, thereby generating the multiple second test data.
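The description does not say how the data values are chosen. The sketch below assumes reproducible pseudo-random values in [-1, 1], stored as one flat buffer per sampled shape; the function name and value range are illustrative only.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Shape = Tuple[int, ...]


def make_test_data(shapes: Sequence[Shape], seed: int = 0) -> Dict[Shape, List[float]]:
    """Create one second test data set per sampled shape (one-to-one).

    Values are pseudo-random in [-1, 1]; a fixed seed keeps runs reproducible
    so that every candidate configuration is tested on identical inputs.
    """
    rng = random.Random(seed)
    return {shape: [rng.uniform(-1.0, 1.0) for _ in range(math.prod(shape))]
            for shape in shapes}
```

Feeding every candidate the same inputs makes their measured performances directly comparable.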
[0104] As can be seen, in the embodiments of this application, at least one operator configuration method is first obtained. The operator obtained under each operator configuration method is then executed to obtain its first operational performance, and the k candidate operator configuration methods with the highest first operational performance are selected. Next, multiple second test data of different data sizes are used to test the executable program of the operator under each candidate operator configuration method, and the target operator configuration method with the best operational performance is selected. This target operator configuration method is therefore the one that performs best across the multiple second test data. Once it is selected, it can be used to configure the operator whenever the operator is subsequently called. As a result, the operator achieves good operational performance regardless of the size of its input data, which yields a configuration that suits all data sizes when the input data size of the operator varies.
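The two-stage flow summarized above can be sketched end to end. `run` is a hypothetical callback returning the operator's runtime for a given configuration and data size; using the maximum allowed size as the first test data follows one option in the description.

```python
from statistics import fmean
from typing import Callable, Sequence

Runner = Callable[[str, int], float]  # (configuration, data size) -> runtime in seconds


def tune(configs: Sequence[str], k: int, max_size: int,
         test_sizes: Sequence[int], run: Runner) -> str:
    """Two-stage tuning sketch: coarse filter at one size, then mean over many sizes."""
    # Stage 1: first performance measured on first test data of the maximum size.
    first = {cfg: run(cfg, max_size) for cfg in configs}
    candidates = sorted(configs, key=first.__getitem__)[:k]
    # Stage 2: second performance = mean runtime over all second test data sizes.
    second = {cfg: fmean(run(cfg, n) for n in test_sizes) for cfg in candidates}
    return min(second, key=second.__getitem__)  # lowest mean runtime wins
```

With a toy cost model where configuration "B" costs `0.4*n + 50` and "A" costs `1.0*n`, "B" wins at n = 100 (90 vs 100), but "A" wins on average over sizes 1, 10, and 100. This illustrates why the second stage tests several sizes instead of trusting the single first measurement.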
[0105] In one embodiment of this application, when the operator is subsequently invoked for computation, the target operator configuration method is used to configure the operator.
[0106] For example, after the target operator configuration method with the highest operational performance is determined, it can be used to configure the operator. When the artificial intelligence processor then calls the configured operator to perform the operation, it obtains the operator's operation result.
[0107] Refer to Figure 3, which is a flowchart illustrating another operator tuning method provided in an embodiment of this application. Content in this embodiment that is the same as in the embodiment illustrated in Figure 1 will not be described again here.
[0108] For example, as shown in Figure 3, operators are first configured based on the Task Module model to obtain operators corresponding to different configuration methods. The operator configuration includes, but is not limited to, the operator's operation logic (operator name, operation method, etc.) and its configuration parameters. Optionally, the configuration parameters can be defined within the operator's operation logic. For example, the intermediate codes Stmt1, Stmt2, Stmt3, ... shown in Figure 3 can be understood as the code files of the operators obtained by the at least one operator configuration method of this application, where each operator configuration method includes one set of configuration parameters.
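Since each operator configuration method is one set of configuration parameters, the candidate space can be enumerated as a grid of parameter values. The field names and the on-chip feasibility check below are hypothetical stand-ins for the kinds of parameters the description mentions (parallelism, data splitting, output storage allocation); they are not a real compiler API.

```python
from dataclasses import dataclass
from itertools import product
from typing import List


@dataclass(frozen=True)
class OperatorConfig:
    """One operator configuration method = one set of configuration parameters.

    All field names are hypothetical illustrations.
    """
    parallelism: int
    split_size: int
    output_buffer_bytes: int


def enumerate_configs(parallelisms: List[int],
                      split_sizes: List[int],
                      output_buffers: List[int],
                      on_chip_bytes: int) -> List[OperatorConfig]:
    """Build every parameter combination, dropping output buffers that would
    not fit in on-chip memory (a simplified, hypothetical feasibility check)."""
    return [OperatorConfig(p, s, o)
            for p, s, o in product(parallelisms, split_sizes, output_buffers)
            if o <= on_chip_bytes]
```

Each resulting `OperatorConfig` would correspond to one code file (Stmt1, Stmt2, ...) that the compiler turns into an executable program.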
[0109] Next, the compiler compiles the code file of each operator corresponding to the at least one operator configuration method to obtain the operator's executable program under each configuration method; the executable program contains binary instructions that the hardware processor can execute.
[0110] Finally, the processor (such as an AI processor) calls and executes the executable program of the operator under each operator configuration method to obtain the operator's running result under each method. The running result can include the operator's computation result and the performance parameters collected during execution. The performance parameters can be fed back to the compiler, which adjusts the operator configuration method accordingly; in this way, the compiler automatically tunes and optimizes the operator's performance.
[0111] Specifically, the optimization module includes a sampling module (Random Mutation) and a performance evaluation module (LearnedCost Code). After the processor runs the operator, it can obtain the operator's computational performance under different operator configurations. Specifically, the processor uses the first test data as the operator's input data, calls and executes the executable program obtained under each operator configuration, obtains the first computational performance of the operator under each operator configuration, and then passes the first computational performance of the operator under each operator configuration to the performance evaluation module. The performance evaluation module then selects k candidate operator configurations from at least one operator configuration. Here, k is a positive integer greater than 1, and k is less than the total number of operator configurations.
[0112] Then, the sampling module determines multiple second test data of different sizes, and passes these second test data and the k candidate operator configurations selected by the performance evaluation module to the processor. The processor tests the operator obtained for each candidate operator configuration based on the multiple second test data, obtaining the second operational performance of the operator under each candidate operator configuration. Specifically, for each candidate operator configuration, the processor uses the multiple second test data as input data for that operator, then calls and executes the executable program for that operator, obtaining the third operational performance of the operator under that candidate operator configuration when testing with each second test data, thus obtaining multiple third operational performances of the operator under each candidate operator configuration. Further, the tuning module can average the multiple third operational performances of the operator under each candidate operator configuration to obtain the second operational performance of the operator under each candidate operator configuration.
[0113] Finally, the performance evaluation module determines the target operator configuration from the k candidate operator configurations. This target operator configuration is then passed to the task model for configuring the operator based on it. The compiler can then compile the operator obtained from this target configuration to produce the corresponding executable program. This allows the processor to directly use the executable module to complete the operator's computation process when the operator is subsequently invoked.
[0114] Refer to Figure 4, which is a functional unit block diagram of an operator tuning device provided in this application. The operator tuning device 400 may include an acquisition unit 401 and a processing unit 402, wherein:
[0115] The acquisition unit 401 is used to acquire at least one operator configuration method corresponding to the operator, wherein each operator configuration method is determined by a set of configuration parameters;
[0116] The processing unit 402 is configured to obtain the first operational performance of the operator under each of the operator configuration methods by executing the operator obtained under each of the operator configuration methods; and determine the target operator configuration method among the at least one operator configuration methods based on the first operational performance of the operator under each of the operator configuration methods.
[0117] In one possible implementation, the above-described operator tuning device can be applied to a compiler, such as a neural network compiler.
[0118] In some possible implementations, in terms of determining the target operator configuration method among the at least one operator configuration method based on the first operational performance of the operator under each operator configuration method, the processing unit 402 is specifically configured to:
[0119] Based on the first operational performance of the operator under each operator configuration, k candidate operator configurations are selected from the at least one operator configuration, where k is a positive integer greater than 1 and k is less than the total number of operator configurations;
[0120] By executing the operator obtained in each of the candidate operator configuration methods, the second operational performance of the operator under each of the candidate operator configuration methods is obtained;
[0121] Based on the second operational performance of the operator under each of the candidate operator configuration methods, the target operator configuration method is determined, and the target operator configuration method is the candidate operator configuration method with the highest second operational performance among the k candidate operator configuration methods.
[0122] In some possible implementations, in order to obtain the first operational performance of the operator under each of the operator configurations by executing the operator obtained by each of the operator configurations, the processing unit 402 is specifically used for:
[0123] Using the first test data as input data for the operator, the operator obtained by each of the operator configuration methods is executed to obtain the first operational performance of the operator under each of the operator configuration methods. The data size of the first test data can be the maximum data size allowed by the operator. In other embodiments of this disclosure, the data size of the first test data can also be other data sizes within the range of data sizes allowed by the operator.
[0124] In some possible implementations, in terms of obtaining the second operational performance of the operator under each of the candidate operator configurations by executing the operator obtained by each of the candidate operator configurations, the processing unit 402 is specifically used for:
[0125] For each candidate operator configuration method, multiple second test data are used as input data for the operator, and the operator is executed to obtain its second operational performance under that candidate operator configuration method, wherein the data sizes of the multiple second test data are within the range of data sizes allowed by the operator.
[0126] In one embodiment of this application, in order to obtain the second operational performance of the operator under each candidate operator configuration by using multiple second test data as input data for the operator, and executing the operator obtained by each candidate operator configuration, the processing unit 402 is specifically used for:
[0127] Each of the second test data is used as the input data of the operator, and the operator obtained by each of the candidate operator configuration methods is executed. The third operation performance of the operator under each of the candidate operator configuration methods is obtained when the operator is tested using each of the second test data.
[0128] Based on the multiple third operational performances, the second operational performance of the operator under each of the candidate operator configuration methods is obtained.
[0129] In some possible implementations, before obtaining the second operational performance of the operator under each of the candidate operator configurations by executing the operator obtained by each of the candidate operator configurations, the processing unit 402 is further configured to:
[0130] A target sampling method is used to sample within the range of data sizes allowed by the operator to obtain the multiple sampled data sizes;
[0131] The plurality of second test data are generated based on the plurality of sampled data sizes.
[0132] In one embodiment of this application, before sampling within the maximum data size allowed by the operator using the target sampling method, the processing unit 402 is further configured to: select the target sampling method from multiple preset sampling methods.
[0133] In some possible implementations, the processing unit 402 is further configured to:
[0134] Configure the operator using the target operator configuration method.
[0135] In some possible implementations, the first operational performance of the operator is the first operational time and/or the first throughput of the operator;
[0136] The second operational performance of the operator is the second operational time and/or the second throughput of the operator.
[0137] Refer to Figure 5, which is a schematic structural diagram of an electronic device provided in an embodiment of this application. As shown in Figure 5, the electronic device 500 includes a transceiver 501, a processor 502, and a memory 503, which are connected via a bus. The memory 503 stores computer programs and data and can transfer the data stored in it to the processor 502.
[0138] Processor 502 is used to read the computer program in memory 503 and perform the following operations:
[0139] Obtain at least one operator configuration method corresponding to the operator, wherein each operator configuration method is determined by a set of configuration parameters;
[0140] By executing the operator obtained in each of the operator configuration methods, the first operational performance of the operator under each of the operator configuration methods is obtained;
[0141] Based on the first operational performance of the operator under each of the operator configuration methods, the target operator configuration method is determined among the at least one operator configuration methods.
[0142] The processor 502 can implement the functions of the processing unit 402 of the operator tuning device 400, so its specific functions will not be described further.
[0143] This application also provides a computer-readable storage medium storing a computer program that is executed by a processor to implement some or all of the steps of any of the operator tuning methods described in the above method embodiments. For details, please refer to the descriptions of the above method embodiments, which will not be repeated here.
[0144] This application also provides a computer program product, which includes a non-transitory computer-readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps of any of the operator tuning methods described in the above method embodiments. For details, please refer to the descriptions of the above method embodiments, which will not be repeated here.
[0145] It should be noted that, for the sake of simplicity, the foregoing method embodiments are all described as a series of actions. However, those skilled in the art should understand that this application is not limited to the described order of actions, as some steps may be performed in other orders or simultaneously according to this application. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are all optional embodiments, and the actions and modules involved are not necessarily essential to this application.
[0146] In the above embodiments, the descriptions of each embodiment have different focuses. For parts not described in detail in a certain embodiment, please refer to the relevant descriptions in other embodiments.
[0147] In the several embodiments provided in this application, it should be understood that the disclosed apparatus can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of units is only a logical functional division, and in actual implementation there may be other ways of dividing them. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through some interfaces; the indirect coupling or communication connection between devices or units may be electrical or take other forms.
[0148] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.
[0149] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software program module.
[0150] If the integrated unit is implemented as a software program module and sold or used as an independent product, it can be stored in a computer-readable memory. Based on this understanding, the technical solution of this application in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a memory and includes several instructions that cause a computer device (which may be a personal computer, server, network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned memory includes various media capable of storing program code, such as USB flash drives, read-only memory (ROM), random access memory (RAM), portable hard drives, magnetic disks, or optical disks.
[0151] Those skilled in the art will understand that all or part of the steps in the various methods of the above embodiments can be implemented by a program instructing related hardware. The program can be stored in a computer-readable storage medium, which may include: flash drive, read-only memory (ROM), random access memory (RAM), disk or optical disk, etc.
[0152] The embodiments of this application have been described in detail above. Specific examples have been used to illustrate the principles and implementation methods of this application. The description of the above embodiments is only for the purpose of helping to understand the method and core ideas of this application. At the same time, for those skilled in the art, there will be changes in the specific implementation methods and application scope based on the ideas of this application. Therefore, the content of this specification should not be construed as a limitation of this application.
Claims
1. An operator tuning method, the operator running on a dedicated processor, characterized in that the method comprises: obtaining at least one operator configuration method corresponding to the operator, wherein each operator configuration method is determined by a set of configuration parameters; the set of configuration parameters includes one or more configuration parameters, the one or more configuration parameters being the parallelism of the operator, the code writing rules of the operator, and the allocation of storage space for the output result of the operator; and the data used during the operation of the operator is stored in on-chip memory; obtaining the first operational performance of the operator under each of the operator configuration methods by executing the operator obtained under each of the operator configuration methods; and determining, based on the first operational performance of the operator under each of the operator configuration methods, the target operator configuration method among the at least one operator configuration method; wherein determining the target operator configuration method among the at least one operator configuration method based on the first operational performance of the operator under each of the operator configuration methods includes: selecting, based on the first operational performance of the operator under each operator configuration method, k candidate operator configuration methods from the at least one operator configuration method, where k is a positive integer greater than 1 and k is less than or equal to the total number of operator configuration methods; obtaining the second operational performance of the operator under each of the candidate operator configuration methods by executing the operator obtained under each of the candidate operator configuration methods; and determining the target operator configuration method based on the second operational performance of the operator under each of the candidate operator configuration methods, the target operator configuration method being the candidate operator configuration method with the highest second operational performance among the k candidate operator configuration methods; wherein obtaining the second operational performance of the operator under each candidate operator configuration method by executing the operator obtained under each candidate operator configuration method includes: for each candidate operator configuration method, using multiple second test data as input data for the operator and obtaining the second operational performance of the operator under that candidate operator configuration method by executing the operator, the data sizes of the multiple second test data being within the data size range allowed by the operator; the first operational performance of the operator includes the first operational time and/or the first throughput of the operator; and the second operational performance of the operator includes the second operational time and/or the second throughput of the operator.
2. The method according to claim 1, characterized in that obtaining the first operational performance of the operator under each of the operator configuration methods by executing the operator obtained under each of the operator configuration methods includes: using first test data as the input data for the operator; and obtaining the first operational performance of the operator under each of the operator configuration methods by executing the operator obtained under each of the operator configuration methods.
3. The method according to claim 1, characterized in that using multiple second test data as input data for the operator and obtaining the second operational performance of the operator under the candidate operator configuration method by executing the operator includes: using each of the second test data as the input data of the operator and executing the operator obtained under each of the candidate operator configuration methods to obtain the third operational performance of the operator under each candidate operator configuration method when tested with each of the second test data; and obtaining, based on the multiple third operational performances, the second operational performance of the operator under each of the candidate operator configuration methods.
4. The method according to claim 1 or 3, characterized in that the method further includes: sampling, using a target sampling method, within the data size range allowed by the operator to obtain multiple sampled data sizes; and generating the multiple second test data based on the multiple sampled data sizes.
5. The method according to claim 4, characterized in that, before sampling within the maximum data size allowed by the operator using the target sampling method, the method further includes: selecting the target sampling method from multiple preset sampling methods.
6. The method according to claim 1, characterized in that the method further includes: configuring the operator using the target operator configuration method.
7. An electronic device, characterized in that it comprises: a processor and a memory, the processor being connected to the memory, the memory being configured to store a computer program, and the processor being configured to execute the computer program stored in the memory to cause the electronic device to perform the method according to any one of claims 1-6.
8. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program that, when executed by a processor, implements the method according to any one of claims 1-6.