Parameter adjustment method and apparatus, and electronic device and computer program product
By dynamically adjusting the microarchitecture and on-chip system parameters during chip execution, and combining machine learning and a non-persistent virtual file system, the problem of insufficient hardware parameter optimization in existing technologies is solved, achieving dynamic improvement of chip performance and enhancement of business performance.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- HUAWEI TECH CO LTD
- Filing Date
- 2025-08-25
- Publication Date
- 2026-06-25
AI Technical Summary
Existing technologies cannot effectively optimize dynamic parameters when adjusting the hardware parameters in a chip, resulting in limited performance improvements and an inability to adapt to the dynamic tuning needs of customers' real-world business operations.
By reading target parameters during chip execution and dynamically adjusting microarchitecture and on-chip system parameters based on business performance indicators, machine learning algorithms are used to find and test local optima, and a non-persistent virtual file system is combined to accelerate parameter adjustment.
It enables dynamic improvement of chip performance, enhances business performance indicators, reduces computational load, and improves the efficiency and stability of parameter adjustment.
Description
Methods, apparatuses, electronic devices and computer program products for adjusting parameters
[0001] This application claims priority to Chinese Patent Application No. 202411907472.5, filed on December 20, 2024, entitled "Method, Apparatus, Electronic Device and Computer Program Product for Adjusting Parameters", the entire contents of which are incorporated herein by reference.
Technical Field
[0002] The embodiments of this disclosure generally relate to the field of integrated circuit technology, and more specifically to methods, apparatus, electronic devices, and computer program products for adjusting parameters.
Background Technology
[0003] Chips typically contain multiple hardware registers. Each register stores various hardware parameters specific to the microarchitecture and system-on-chip (SoC). In some integrated circuits, the number of hardware parameters can be quite large. In practical applications, these hardware parameters, based on their physical meaning, influence target performance metrics with different weighting factors in different business scenarios.
Summary of the Invention
[0004] Embodiments of this disclosure provide a method for adjusting parameters, an apparatus for adjusting parameters, an electronic device, and a computer program product.
[0005] According to a first aspect of this disclosure, a method for adjusting parameters is provided. The method includes reading at least one target parameter of the chip during the chip's execution of a service. The at least one target parameter includes one or more of the following: microarchitecture parameters or on-chip system parameters. The method further includes obtaining at least one target service performance indicator. The method also includes adjusting the at least one target parameter based on the at least one target service performance indicator during the chip's execution of the service. Thus, during the chip's execution of a service, the chip's microarchitecture parameters and / or on-chip system parameters can be dynamically adjusted according to the target service performance indicator, so that the adjusted microarchitecture parameters and / or on-chip system parameters contribute to improving the target service performance indicator. Therefore, the performance of the service can improve as the chip operates.
[0006] In some embodiments of this disclosure, during the process of adjusting the at least one target parameter based on the at least one target service performance indicator, one or more sets of local optimal solutions for the at least one target parameter are determined based on the at least one target service performance indicator. Then, the at least one target parameter is adjusted based on the one or more sets of local optimal solutions. When the number of the at least one target parameter is relatively large, finding local optimal solutions for the at least one target parameter rather than global optimal solutions can lead to faster convergence, reducing computational load and achieving a balance between computational load and service performance.
[0007] In some embodiments of this disclosure, during the process of adjusting the at least one target parameter based on the one or more sets of local optima, the service is tested multiple times using each set of local optima. For each set, a designated value of the at least one target service performance indicator is determined from the multiple tests. Then, based on the designated values obtained for the respective sets, a target local optimum is selected from the one or more sets of local optima. The target local optimum is then used as the value of the at least one target parameter.
[0008] In some embodiments of this disclosure, the designated value of the at least one target service performance indicator is the average value of the at least one target service performance indicator across multiple tests. The local optimum corresponding to the highest average value among one or more sets of local optima is determined as the target local optimum. In other embodiments of this disclosure, the designated value of the at least one target service performance indicator is the variance of the values of the at least one target service performance indicator across multiple tests. The local optimum corresponding to the lowest variance among one or more sets of local optima is determined as the target local optimum.
[0009] In these embodiments, by conducting multiple tests on the service, the target parameter values with the best test performance can be selected. Furthermore, because the selected target parameter values have undergone multiple tests, they deliver more stable service performance.
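The two selection rules described above (highest average, lowest variance) can be sketched as follows. This is a minimal illustration only: the function name `select_target_optimum`, the candidate names, and the measured values are hypothetical, and the sketch assumes a single indicator for which a larger value is better.

```python
from statistics import mean, pvariance

def select_target_optimum(results, criterion="mean"):
    """Pick one candidate set of local optima from repeated test results.

    results:   dict mapping a candidate name to the list of indicator values
               measured over repeated tests of the service (hypothetical data).
    criterion: "mean" picks the candidate with the highest average value;
               "variance" picks the candidate with the lowest variance.
    """
    if criterion == "mean":
        return max(results, key=lambda name: mean(results[name]))
    if criterion == "variance":
        return min(results, key=lambda name: pvariance(results[name]))
    raise ValueError(f"unknown criterion: {criterion}")

# Hypothetical indicator values from three tests per candidate.
tests = {
    "optimum_a": [980.0, 1020.0, 1000.0],
    "optimum_b": [1100.0, 900.0, 1060.0],
}
best_by_mean = select_target_optimum(tests, "mean")
best_by_variance = select_target_optimum(tests, "variance")
```

With this sample data the two rules disagree: `optimum_b` has the higher average, while `optimum_a` is the more stable candidate, which is why the choice of designated value matters.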
[0010] In some embodiments of this disclosure, during the process of determining one or more local optimal solutions for one or more target parameters based on one or more target business performance indicators, a machine learning algorithm is used to perform N rounds of optimization training on the one or more target parameters, with the one or more target business performance indicators as the regression target. In each round of optimization training, a set of solutions for the one or more target parameters is determined. Then, one or more local optimal solutions are determined from the N sets of solutions determined during the N rounds of optimization training. Here, N is greater than or equal to one. In this way, local optimal solutions for dynamic parameters can be obtained through machine learning algorithms without prior knowledge of the physical meaning of the parameters.
[0011] In some embodiments of this disclosure, the method further includes determining whether the service is a new service being executed for the first time. If the service is a new service, each round of optimization training begins after the service has been executed for a first time period. During the M rounds of optimization training, microarchitecture level 1 metrics whose degree of correlation with the at least one target service performance indicator is higher than a correlation threshold are identified.
[0012] In some further embodiments of this disclosure, the method further includes, if the service is not a new service, initiating the first round of optimization training after the service has been executed for a first time period. In subsequent optimization training, the determined microarchitecture level 1 metric is used instead of the at least one target service performance metric as the regression target.
[0013] In these embodiments, the target business performance metrics can only be determined after each business run ends, so the interval between two consecutive values of the target business performance metrics is relatively long, and each round of optimization training takes correspondingly long. In contrast, microarchitecture level 1 metrics can be determined before a single business run ends, so the interval between two consecutive values of the microarchitecture level 1 metrics is relatively short, and training does not need to wait for the business run to end. By using microarchitecture level 1 metrics that are highly correlated with the target business performance metrics as the regression target, the training time for each round of optimization is significantly reduced.
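One way to identify highly correlated microarchitecture metrics is sketched below. The disclosure does not say how the correlation degree is computed; this sketch assumes the absolute Pearson correlation coefficient, and the metric names, sample values, and threshold are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sample lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / (spread_x * spread_y)

def pick_proxy_metrics(metric_samples, target_samples, threshold):
    """Return the names of metrics whose |correlation| with the target exceeds the threshold."""
    return [
        name
        for name, values in metric_samples.items()
        if abs(pearson(values, target_samples)) > threshold
    ]

# Hypothetical samples: one value per completed service run.
target = [100.0, 120.0, 140.0, 160.0]
metrics = {
    "metric_a": [1.0, 1.2, 1.4, 1.6],  # moves together with the target
    "metric_b": [5.0, 3.0, 6.0, 4.0],  # unrelated to the target
}
proxies = pick_proxy_metrics(metrics, target, threshold=0.8)
```

Metrics returned by `pick_proxy_metrics` are the candidates that could replace the target indicator as the regression target in later rounds.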
[0014] In some embodiments of this disclosure, the at least one target parameter is read from and written to registers via a non-persistent virtual file system (sysfs). This simplifies reading and writing the at least one target parameter and thereby speeds up the tuning process.
[0015] According to a second aspect of this disclosure, an apparatus for adjusting parameters is provided. The apparatus includes a reading module, an acquisition module, and an adjustment module. The reading module is configured to read at least one target parameter of the chip during the chip's execution of a service. The at least one target parameter includes one or more of the following: microarchitecture parameters or on-chip system parameters. The acquisition module is configured to acquire at least one target service performance indicator of the service. The adjustment module is configured to adjust the at least one target parameter according to the at least one target service performance indicator during the chip's execution of a service. Thus, during the chip's execution of a service, the chip's microarchitecture parameters and / or on-chip system parameters can be dynamically adjusted according to the target service performance indicator of the service, so that the adjusted microarchitecture parameters and / or on-chip system parameters can contribute to improving the target service performance indicator. Therefore, the performance of the service can improve as the chip operates.
[0016] In a third aspect of this disclosure, an electronic device is provided. The electronic device includes at least one processor and a memory. The memory is coupled to the at least one processor and has instructions stored thereon. When executed by the at least one processor, the instructions cause the electronic device to perform the method according to the first aspect of this disclosure.
[0017] In a fourth aspect of this disclosure, a computer-readable storage medium is provided on which a computer program is stored. When executed by a processor, the computer program implements the method according to the first aspect of this disclosure.
[0018] In a fifth aspect of this disclosure, a computer program product is provided, comprising computer-executable instructions. When executed by a processor, the instructions implement some or all of the steps of the method according to the first aspect of this disclosure.
[0019] It is understood that the apparatus for adjusting parameters in the second aspect, the electronic device in the third aspect, the computer-readable storage medium in the fourth aspect, and the computer program product in the fifth aspect provided above are all used to execute the method provided in the first aspect. Therefore, the explanations and descriptions regarding the first aspect also apply to the second, third, fourth, and fifth aspects. Likewise, for the beneficial effects achievable by the second, third, fourth, and fifth aspects, reference may be made to the beneficial effects of the corresponding method, which are not repeated here.
Attached Figure Description
[0020] The above and other objects, features and advantages of this disclosure will become more apparent from the accompanying drawings, in which like reference numerals generally denote like parts.
[0021] Figure 1 illustrates a schematic diagram of an example environment in which the apparatus and / or methods of embodiments of the present disclosure may be implemented.
[0022] Figure 2 shows an exemplary flowchart of a method for adjusting parameters according to an embodiment of the present disclosure.
[0023] Figure 3 illustrates an exemplary schematic diagram of a method for adjusting parameters according to an embodiment of the present disclosure.
[0024] Figure 4 illustrates an exemplary schematic diagram of a single process for adjusting parameters according to an embodiment of the present disclosure.
[0025] Figure 5 illustrates an exemplary schematic diagram of several processes for adjusting parameters according to embodiments of the present disclosure.
[0026] Figure 6A shows an exemplary relationship between microarchitecture level 1 metrics and target business performance metrics.
[0027] Figure 6B shows an exemplary timing diagram of the target business performance metrics.
[0028] Figure 6C shows an exemplary timing diagram of the microarchitecture level 1 metrics.
[0029] Figure 7 shows a schematic diagram of an apparatus for adjusting parameters according to an embodiment of the present disclosure.
[0030] In the various accompanying figures, the same or corresponding reference numerals indicate the same or corresponding parts. The elements in the accompanying figures are schematic and not drawn to scale.
Detailed Implementation
[0031] Embodiments of this disclosure will now be described in more detail with reference to the accompanying drawings. While some embodiments of this disclosure are shown in the drawings, it should be understood that this disclosure can be implemented in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided for a more thorough and complete understanding of this disclosure. It should be understood that the accompanying drawings and embodiments of this disclosure are for illustrative purposes only and are not intended to limit the scope of protection of this disclosure.
[0032] In the description of embodiments of this disclosure, the term "comprising" and similar terms should be understood as open-ended inclusion, i.e., "including but not limited to". The term "based on" should be understood as "at least partially based on". The term "one embodiment" or "the embodiment" should be understood as "at least one embodiment". The terms "first", "second", etc., may refer to different or the same objects. Other explicit and implicit definitions may also be included below.
[0033] Unless otherwise defined, all terms used herein (including technical and scientific terms) shall have the same meaning as commonly understood by one of ordinary skill in the art to which this subject matter pertains. It will be further understood that terms such as those defined in commonly used dictionaries shall be interpreted as having the meaning consistent with their meaning in the context of the specification and in the relevant art, and shall not be interpreted in an idealized or overly formal form unless otherwise explicitly defined herein. As used herein, the statement of “connecting” or “coupling” two or more parts together shall mean that these parts are directly joined together or joined through at least one intermediate component.
[0034] As mentioned above, chips typically contain multiple hardware registers. Each register stores various hardware parameters specific to the microarchitecture (UARCH) and system-on-chip (SOC). An SOC is also known as a system-on-a-chip. The microarchitecture refers to the hardware design architecture of the cores in a central processing unit (CPU). Hardware parameters for the microarchitecture refer to parameters specific to the cores. SOC hardware parameters refer to all hardware parameters other than the microarchitecture parameters. SOC hardware parameters define the relationships between cores, such as transmission bandwidth and bit width. These hardware parameters, based on their physical meaning, influence target service performance metrics with different weighting factors in different business scenarios, such as CPU-intensive, memory-intensive, and I / O-intensive scenarios. Target service performance metrics refer to the performance metrics that the business is concerned with or aims to achieve.
[0035] One method for adjusting parameters is to display the Basic Input / Output System (BIOS) interface on the screen, which presents a subset of adjustable hardware parameters for the user to adjust. These adjustments are ultimately saved as specific policies within the BIOS configuration. These parameters are fixed during chip execution and can only be configured upon system restart; therefore, they can be called static parameters. Users can determine relatively fixed strategies for static parameters based on common and / or typical workloads. Subsequent iterations of the BIOS version can ensure that the specific static parameter strategy delivers performance gains on workloads with distinct characteristics.
[0036] This parameter adjustment method does not cover the full range of microarchitecture and SOC parameter tuning. First, it only adjusts static parameters and does not consider dynamic parameters (parameters that can be configured while the system is running). In other words, it only optimizes a subset of the microarchitecture and SOC parameters, so its overall tuning performance has room for improvement. Second, the parameter adjustment strategy is fixed and cannot be further tuned dynamically to meet the actual business needs of customers. Therefore, this method cannot efficiently obtain better parameter values and use these better values to design next-generation chips.
[0037] Therefore, embodiments of this disclosure propose a method for adjusting parameters. The method includes reading at least one target parameter of the chip during the chip's execution of a service. The at least one target parameter includes one or more of the following: microarchitecture parameters or on-chip system parameters. The method also includes obtaining at least one target service performance indicator. Furthermore, the method includes adjusting the at least one target parameter based on the at least one target service performance indicator during the chip's execution of the service. Thus, during the chip's execution of a service, the chip's microarchitecture parameters and / or on-chip system parameters can be dynamically adjusted according to the target service performance indicator, so that the adjusted microarchitecture parameters and / or on-chip system parameters can contribute to improving the target service performance indicator. Therefore, the performance of the service can improve as the chip operates.
[0038] The embodiments of this disclosure will now be described in further detail with reference to the accompanying drawings. Figure 1 shows a schematic diagram of an example environment 100 in which the apparatus and / or methods of the embodiments of this disclosure may be implemented. The apparatus and / or methods of the embodiments of this disclosure may be executed by a server or other computing device. The server or other computing device may include a chip. In example environment 100, hardware platform 110 may include a processor. The processor may include multiple registers. Kernel driver 120 may perform read or write operations on microarchitecture parameters and / or SOC parameters in the registers. Kernel driver 120 may support multiple hardware platforms and can be extended to further hardware platforms. In other words, the apparatus and / or methods of the embodiments of this disclosure are adaptable to various hardware platforms. The business workload 140 runs on operating system 130. The embodiments of this disclosure may support multiple operating systems 130, such as openEuler, CentOS, Ubuntu, etc. A tuner 150 is used to implement the method of adjusting parameters according to the embodiments of this disclosure. In implementing this method, tuner 150 can start workload 140 (cause workload 140 to run), and can read and write microarchitecture parameters and / or SOC parameters.
[0039] Figure 2 shows an exemplary flowchart of a method 200 for adjusting parameters according to an embodiment of the present disclosure. This method 200 can be performed by a parameter-adjusting device. Here, the parameter-adjusting device can be a server or other computing device. A chip may be provided in the server or other computing device. The method 200 is described below using a computing device as an example of the executing entity.
[0040] At block 202, during the execution of business operations by the chip, the computing device reads at least one target parameter of the chip. This at least one target parameter includes one or more of the following: microarchitecture parameters or on-chip system parameters (which may be alternatively referred to as "SOC parameters" in this context). In some embodiments of this disclosure, the at least one target parameter is a dynamic parameter that can be configured during on-chip system runtime.
[0041] At block 204, the computing device acquires at least one target service performance indicator. In some embodiments of this disclosure, the at least one target service performance indicator may be input by a user. The user may indicate the optimization direction of the at least one target service performance indicator, for example, whether a larger or a smaller value is better.
[0042] At block 206, during the chip's execution of the service, the computing device adjusts the at least one target parameter based on the at least one target service performance indicator. In some embodiments of this disclosure, when the computing device determines that the value of the at least one target service performance indicator has reached an optimization target, it takes the value of the target parameter at which the optimization target was reached as the optimized value of the target parameter.
[0043] It should be noted that the operation at block 204 can be performed after the operation at block 202, before the operation at block 202, or in parallel with the operation at block 202.
[0044] Method 200 can dynamically adjust the chip's microarchitecture parameters and / or SOC parameters based on the target service performance indicators during the chip's service execution process, so that the adjusted microarchitecture parameters and / or SOC parameters can help improve the target service performance indicators. Therefore, the performance of the service can improve as the chip operates.
[0045] Figure 3 illustrates an exemplary schematic diagram of a method for adjusting parameters according to an embodiment of the present disclosure. As described above, the parameters to be adjusted in embodiments of the present disclosure include one or more of microarchitecture parameters or SOC parameters. The computing device can determine dynamic parameters 301 that can be configured during SOC operation. These dynamic parameters 301 can be mapped one-to-one with corresponding address bits in the logical address of register 302 of the chip via a bitmap. In one example, parameter 1 is 5 bits long, and the variable name of parameter 1 can be associated with bits 53 to 57 in the logical address of register 3. When parameter 1 is read, the current value of parameter 1 can be obtained from bits 53 to 57 in the logical address of register 3. When parameter 1 is written, the latest value of parameter 1 can be written to bits 53 to 57 in the logical address of register 3. Similarly, parameter 2 is 2 bits long, and the variable name of parameter 2 can be associated with bits 59 to 60 in the logical address of register 3. When parameter 2 is read, the current value of parameter 2 can be obtained from bits 59 to 60 in the logical address of register 3. When parameter 2 is written, its latest value can be written to bits 59 to 60 of the logical address of register 3. Parameter 3 is 4 bits long, and its variable name can be associated with bits 10 to 13 of the logical address of register 1. When parameter 3 is read, its current value can be obtained from bits 10 to 13 of the logical address of register 1. When parameter 3 is written, its latest value can be written to bits 10 to 13 of the logical address of register 1.
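The bit-range mapping in this example can be sketched with ordinary shift-and-mask arithmetic. The sketch assumes 64-bit registers with bit 0 as the least significant bit and models the registers as plain integers in a dictionary; the names `BITMAP`, `read_param`, and `write_param` are hypothetical and are not taken from the disclosure.

```python
# Hypothetical bitmap: parameter name -> (register number, lowest bit, highest bit).
BITMAP = {
    "parameter_1": (3, 53, 57),  # 5 bits
    "parameter_2": (3, 59, 60),  # 2 bits
    "parameter_3": (1, 10, 13),  # 4 bits
}

def read_param(registers, name):
    """Extract the current value of a parameter from its register bits."""
    reg, low, high = BITMAP[name]
    mask = (1 << (high - low + 1)) - 1
    return (registers[reg] >> low) & mask

def write_param(registers, name, value):
    """Write a new value into the parameter's bits, leaving all other bits unchanged."""
    reg, low, high = BITMAP[name]
    mask = (1 << (high - low + 1)) - 1
    if not 0 <= value <= mask:
        raise ValueError(f"{name} accepts values from 0 to {mask}")
    registers[reg] = (registers[reg] & ~(mask << low)) | (value << low)

# Register 3 starts with only bit 58 set; that bit is not mapped to any parameter here.
registers = {1: 0, 3: 1 << 58}
write_param(registers, "parameter_1", 0b10110)
write_param(registers, "parameter_2", 0b11)
```

Because each write clears and sets only the bits inside the parameter's own range, the unmapped bit 58 between parameter 1 and parameter 2 keeps its value.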
[0046] In Figure 3, parameters marked NA represent static parameters that cannot be configured during SOC runtime. In one example, bit 58 of the logical address of register 3 is associated with a static parameter marked NA. Thus, no mapping is established in the bitmap between this static parameter marked NA and bit 58 of the logical address of register 3. In one example, to save computational resources, the computing device can also use the physical meaning of dynamic parameters 301 to remove from them any parameters that are completely irrelevant to the target service performance indicators of the currently running service (marking the irrelevant parameters as NA). Suppose bits 0 to 7 of the logical address of register 2 are associated with a dynamic parameter, but this dynamic parameter is completely irrelevant to the target service performance indicators of the currently running service; then this dynamic parameter is marked NA. Thus, no mapping is established in the bitmap between this dynamic parameter marked NA and bits 0 to 7 of the logical address of register 2. In one example, all parameters associated with a single register (e.g., register 4) may be marked NA. Thus, no mapping is established in the bitmap between this register (e.g., register 4) and any parameter.
[0047] Registers associated with microarchitecture parameters in register 302 can be combined and associated with microarchitecture parameter space 305. Registers associated with SOC parameters in register 302 can be combined and associated with SOC parameter space 306. During parameter tuning, microarchitecture parameters in microarchitecture parameter space 305 and SOC parameters in SOC parameter space 306 can be adjusted separately or together and written to the corresponding registers in hardware platform 110. Each time service workload 140 finishes running on operating system 130, the value of the target service performance indicator can be obtained. The value of the target service performance indicator represents the performance 307 that the service is concerned with. Target service performance indicators include, for example, the number of input / output operations per second (IOPS), requests per second (RPS), floating point operations per second (FLOPS), integer arithmetic performance, instruction-level parallelism (ILP), memory latency, cache hit ratio, cache coherence, power consumption, performance per watt, mean time to failure (MTTF), and error detection and correction capabilities. The embodiments disclosed herein are applicable to various target service performance indicators, and are not limited to those listed above.
[0048] In some embodiments of this disclosure, the at least one target parameter can be read from and written to registers via a non-persistent virtual file system (sysfs). This simplifies reading and writing the at least one target parameter and thereby speeds up the tuning process.
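From user space, a sysfs attribute is read and written like a small text file. The sketch below illustrates that access pattern only. The disclosure does not give the attribute paths exposed by the kernel driver, so a temporary directory stands in for sysfs, and the attribute name `parameter_1` and the helpers `read_attr` and `write_attr` are hypothetical.

```python
import tempfile
from pathlib import Path

def read_attr(path):
    """Read an attribute file and parse its content as an integer (decimal or 0x-prefixed hex)."""
    return int(Path(path).read_text().strip(), 0)

def write_attr(path, value):
    """Write an integer value to an attribute file as decimal text."""
    Path(path).write_text(f"{value}\n")

# A temporary directory stands in for the driver's sysfs directory (assumption).
with tempfile.TemporaryDirectory() as fake_sysfs:
    attr = Path(fake_sysfs) / "parameter_1"
    attr.write_text("0x05\n")  # pretend the driver reports the current value
    before = read_attr(attr)
    write_attr(attr, 22)       # the tuner writes a new value
    after = read_attr(attr)
```

On a real system the write would be handled by the driver, which would update the corresponding register bits rather than store the text.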
[0049] In some embodiments of this disclosure, machine learning algorithms can be used to fine-tune the at least one target parameter using the at least one target business performance indicator as a regression target. Figure 4 shows an exemplary schematic diagram of a single training process according to an embodiment of this disclosure. At block 401, the training process is initialized. During initialization, the computing device can receive user input regarding the service startup method. If the service itself does not have a stable load, the computing device can receive user input regarding the startup method of the business performance indicator benchmark tool. If the service itself has a stable load, it is not necessary to start the business performance indicator benchmark tool. The computing device can receive user input regarding the method for obtaining the target business performance indicator. The regression target can be a defined evaluation function derived from the at least one target business performance indicator. A simple example of this evaluation function is a single-parameter function f(x) = x. It should be noted that f(x) = x is only a simple example, and the evaluation function can also be a more complex function with multiple parameters. Other parameters of the training process can use default parameters. In one example, the user-inputted service startup method, the startup method of the business performance indicator benchmark tool, and the method for obtaining the target business performance indicator can be obtained in the form of a configuration file. The configuration file can have a fixed template, which makes it easy for users to provide input to the computing device that the computing device can parse.
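A fixed configuration template of the kind described above could look like the following sketch. The disclosure does not specify the file format or field names, so the JSON layout, the keys `service_start`, `benchmark_start`, `metric_source`, and `larger_is_better`, and the script paths are all hypothetical.

```python
import json

# Keys every configuration must provide (hypothetical template).
REQUIRED_KEYS = ("service_start", "metric_source")

def parse_tuning_config(text):
    """Parse a JSON configuration and fill in defaults for optional fields."""
    config = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"missing configuration keys: {missing}")
    # The benchmark tool is only needed when the service has no stable load.
    config.setdefault("benchmark_start", None)
    # Optimization direction of the target indicator.
    config.setdefault("larger_is_better", True)
    return config

example = """
{
  "service_start": "./start_service.sh",
  "benchmark_start": "./start_benchmark.sh",
  "metric_source": "./read_requests_per_second.sh"
}
"""
config = parse_tuning_config(example)
```

Validating the required keys up front lets the tool reject an incomplete template before any training round starts.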
[0050] At block 403, the computing device loads the driver. The driver can be loaded, for example, via the `insmod` command. `insmod` is a tool used in Linux systems to load kernel modules. After the driver is loaded, the at least one target parameter can be read from and written to registers via sysfs. All parameter reads and writes on a single register are handled uniformly and associated via a bitmap (see Figure 3). At block 405, the computing device reads the default parameter values and keeps them as a backup for use in the next training process.
[0051] At block 407, the computing device starts the service and tests the service performance baseline based on the default configuration. When the service has a stable load, the computing device uses the load provided by the service to perform service tests. When the service does not have a stable load, the computing device starts a service performance indicator stress testing tool. The service performance indicator stress testing tool can provide a stable load. This allows the computing device to use the load provided by the service performance indicator stress testing tool to perform service tests. Because the tests run under load, the results of the service tests (the target service performance indicators) are meaningful as a reference. The performance baseline can serve as the minimum reference value for the at least one target service performance indicator. When the number of target service performance indicators is greater than or equal to two, a weight can be set for each target service performance indicator, and the weighted sum of the target service performance indicators can be determined as the performance baseline. In some embodiments of this disclosure, the weight of each target service performance indicator can be defined in an evaluation function. Assume the evaluation function is
$$f(x, y, z) = a \times x + b \times y + c \times z \tag{1}$$
In this evaluation function, $x$, $y$, and $z$ represent the first, second, and third target service performance indicators, and $a$, $b$, and $c$ represent their respective weights. Substituting the minimum reference values $x_{\mathrm{ref}}$, $y_{\mathrm{ref}}$, and $z_{\mathrm{ref}}$ of $x$, $y$, and $z$ into equation (1) gives the performance baseline $f_{\mathrm{ref}} = a \times x_{\mathrm{ref}} + b \times y_{\mathrm{ref}} + c \times z_{\mathrm{ref}}$.
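As a small worked illustration of equation (1), the weighted sum and the resulting baseline can be computed as below. The weights and reference values are hypothetical numbers chosen only for the example.

```python
def evaluate(values, weights):
    """Weighted sum of indicator values, following the form of equation (1)."""
    if len(values) != len(weights):
        raise ValueError("each indicator needs exactly one weight")
    return sum(weight * value for weight, value in zip(weights, values))

weights = (0.5, 0.3, 0.2)          # a, b, c (hypothetical)
reference = (1000.0, 200.0, 50.0)  # x_ref, y_ref, z_ref (hypothetical)
f_ref = evaluate(reference, weights)

# A later measurement is compared against the baseline with the same function.
measured = (1100.0, 210.0, 50.0)
improved = evaluate(measured, weights) > f_ref
```

Using one function for both the baseline and later measurements keeps the comparison consistent when the weights change.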
[0052] At box 409, the computing device can randomly set parameter values for the microarchitecture parameter space and the SOC parameter space within K rounds, and collect the values (or their weighted sum) of the target business performance metrics obtained in each round. K is greater than or equal to one. K can be, for example, half the total number of rounds, or other suitable values. The total number of rounds can be set according to the business scenario. K can also be set according to the business scenario. In one example, assume the microarchitecture parameter space includes two microarchitecture parameters d and e, and the SOC parameter space includes two SOC parameters f and g. In the first round, d can be set to the first value d1, e to the second value e1, f to the third value f1, and g to the fourth value g1. Then, the values of the various target business performance metrics under this configuration are collected. In the next round, one or more values of d, e, f, and g can be randomly adjusted. For example, d can be set to the fifth value d2, e to the second value e1, f to the third value f1, and g to the fourth value g1. Alternatively, d can be set to the fifth value d2, e to the second value e1, f to the sixth value f2, and g to the fourth value g1. Alternatively, d can be set to the fifth value d2, e to the seventh value e2, f to the sixth value f2, and g to the eighth value g2. In this example, the number of parameters adjusted in d, e, f, and g, as well as the adjusted parameter values, are random.
[0053] It should be noted that, for ease of description, the above example uses a microarchitecture parameter space containing two microarchitecture parameters d and e and an SOC parameter space containing two SOC parameters f and g. However, those skilled in the art should understand that the microarchitecture parameter space and the SOC parameter space can each contain far more than two parameters.
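The K rounds of random exploration described for box 409 can be sketched as follows. This is only a sketch under stated assumptions: the function random_exploration, the measure callback (a stand-in for running a service test and returning the weighted indicator value), and the candidate values in the example space are all hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Config = Dict[str, int]


def random_exploration(
    space: Dict[str, Sequence[int]],
    measure: Callable[[Config], float],
    k_rounds: int,
    seed: int = 0,
) -> List[Tuple[Config, float]]:
    """Run K rounds, randomly changing a random number of parameters each round."""
    rng = random.Random(seed)
    names = sorted(space)
    # Round 1 starts from the first candidate value of every parameter.
    config = {name: space[name][0] for name in names}
    history: List[Tuple[Config, float]] = []
    for round_index in range(k_rounds):
        if round_index > 0:
            # Both the number of adjusted parameters and their new values are random.
            # A parameter may randomly receive the value it already had.
            for name in rng.sample(names, rng.randint(1, len(names))):
                config[name] = rng.choice(space[name])
        history.append((dict(config), measure(config)))
    return history


# Hypothetical space: microarchitecture parameters d, e and SOC parameters f, g.
space = {"d": [1, 2], "e": [1, 2], "f": [1, 2], "g": [1, 2]}
history = random_exploration(space, lambda c: float(sum(c.values())), k_rounds=4)
for config, score in history:
    print(config, score)
```

The recorded history of configurations and scores is what a later training stage would consume.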
[0054] At box 411, the computing device can iteratively train on the historical parameter values from the previous K rounds using a machine learning algorithm. In some embodiments of this disclosure, during this process, the computing device uses a machine learning algorithm, with the at least one target business performance indicator as the regression target and the workload during business operation as the training data, to perform N rounds of optimization training on the at least one target parameter. This machine learning algorithm can be implemented using a traditional machine learning model. The at least one target parameter serves as the parameter to be trained in the machine learning model. The loss function of the machine learning model is determined based on the at least one target business performance indicator. In each round of optimization training, the computing device determines a set of solutions for the at least one target parameter with the goal of minimizing the loss function. Then, the computing device determines one or more sets of local optima from the N sets of solutions determined during the N rounds of optimization training. Here, N is greater than or equal to one. When the optimization direction of the at least one target business performance indicator is to maximize it, the one or more sets of local optima are solutions that maximize the at least one target business performance indicator. When the optimization direction is to minimize it, the one or more sets of local optima are solutions that minimize the at least one target business performance indicator. A local optimum is a solution that is optimal within a small region of the solution space, but it is not necessarily the global optimum. When the number of the at least one target parameter is large, searching for local optima of the at least one target parameter rather than the global optimum can lead to faster convergence, reduce computational cost, and strike a balance between computational complexity and business performance.
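This disclosure does not state how the one or more sets of local optima are picked from the N sets of solutions. As one hypothetical reading, the sketch below ranks the recorded solutions according to the optimization direction and keeps the best top_k of them. The function pick_candidates, the parameter names, and the scores are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Solution = Tuple[Dict[str, int], float]  # (parameter values, indicator value)


def pick_candidates(
    solutions: List[Solution], top_k: int, maximize: bool = True
) -> List[Solution]:
    """Keep the top_k solutions, ranked by the optimization direction."""
    ranked = sorted(solutions, key=lambda s: s[1], reverse=maximize)
    return ranked[:top_k]


# Hypothetical solutions from N = 4 rounds of optimization training.
solutions = [
    ({"d": 1, "e": 2}, 910.0),
    ({"d": 2, "e": 2}, 1040.0),
    ({"d": 2, "e": 1}, 990.0),
    ({"d": 1, "e": 1}, 870.0),
]

# Throughput-like indicator: larger is better.
print(pick_candidates(solutions, top_k=2, maximize=True))
# Latency-like indicator: smaller is better.
print(pick_candidates(solutions, top_k=2, maximize=False))
```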
[0055] At box 413, the computing device can retest the performance of several sets of target parameters (local optima) that have historically achieved the best performance. Historical best performance refers to the optimal weighted sum of at least one target service performance indicator. When the tuning direction of at least one target service performance indicator is to maximize it, historical best performance refers to the maximum weighted sum of the at least one target service performance indicator. When the tuning direction of at least one target service performance indicator is to minimize it, historical best performance refers to the minimum weighted sum of the at least one target service performance indicator. During the performance retest, each set of local optima is used as parameters for a machine learning model. During service operation, the retested values of each target service performance indicator, or the retested value of the weighted sum of the at least one target service performance indicator, are obtained from the output of the machine learning model. Since the at least one target service performance indicator is related to the service workload, and the service workload varies, the service performance evaluation (the retested value of each target service performance indicator, or the retested value of the weighted sum of the at least one target service performance indicator) obtained for each set of target parameters during the performance retest may vary (fluctuate).
[0056] Then, at box 415, the computing device selects the set of target parameters (optimal values) that yields the best retest performance results and sets this set of target parameters as the adjusted target parameters. In some embodiments of this disclosure, for each set of local optimal solutions, a specified value for the at least one target service performance indicator obtained in multiple performance retests is determined. Then, based on the specified values obtained for one or more sets of local optimal solutions, a target local optimal solution is determined from one or more sets of local optimal solutions. The target local optimal solution is the optimal value among the one or more sets of local optimal solutions. The computing device may determine the target local optimal solution as the value of the at least one target parameter.
[0057] In some embodiments of this disclosure, each set of target parameters may be tested (retested) multiple times. The specified value of the at least one target service performance indicator may be the average value of the at least one target service performance indicator across the multiple tests. In that case, the local optimum with the highest average value among the one or more sets of local optima is determined as the target local optimum. This maximizes the average value of the target service performance indicator, which is beneficial for achieving better service performance. Alternatively, the specified value may be the variance of the value of the at least one target service performance indicator across the multiple tests. In that case, the local optimum with the lowest variance among the one or more sets of local optima is determined as the target local optimum. This minimizes the fluctuation of the target service performance indicator value, which is beneficial for achieving stable service performance. Alternatively, the computing device may sort the sets of local optima by the average value of the service performance evaluations obtained across the multiple tests and select the S sets of local optima with the highest average values, where S is greater than or equal to 2. Then, the computing device determines the local optimum with the lowest variance among the S sets of local optima as the target local optimum. Alternatively, the computing device can sort the sets of local optima by the variance of the service performance evaluations obtained across the multiple tests and select the s sets of local optima with the lowest variances, where s is greater than or equal to 2. Then, the computing device determines the local optimum with the highest average value among the s sets of local optima as the target local optimum.
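The four selection strategies above can be sketched as follows, assuming the optimization direction is to maximize the indicator. The function select_target, the strategy names, and the retest values are hypothetical, and the population variance is used as one possible definition of variance.

```python
from statistics import mean, pvariance
from typing import Dict, Sequence


def select_target(
    retests: Dict[str, Sequence[float]], strategy: str = "mean", shortlist: int = 2
) -> str:
    """Return the name of the target local optimum under the chosen strategy."""
    avg = {name: mean(vals) for name, vals in retests.items()}
    var = {name: pvariance(vals) for name, vals in retests.items()}
    names = list(retests)
    if strategy == "mean":  # highest average value
        return max(names, key=lambda n: avg[n])
    if strategy == "variance":  # lowest variance
        return min(names, key=lambda n: var[n])
    if strategy == "mean_then_variance":  # top S by average, then lowest variance
        top = sorted(names, key=lambda n: avg[n], reverse=True)[:shortlist]
        return min(top, key=lambda n: var[n])
    if strategy == "variance_then_mean":  # s most stable, then highest average
        stable = sorted(names, key=lambda n: var[n])[:shortlist]
        return max(stable, key=lambda n: avg[n])
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical retest results for three sets of local optima.
retests = {
    "A": [100.0, 120.0, 80.0],  # best average, largest fluctuation
    "B": [98.0, 99.0, 97.0],    # slightly lower average, small fluctuation
    "C": [90.0, 90.0, 90.0],    # lowest average, no fluctuation
}
for strategy in ("mean", "variance", "mean_then_variance", "variance_then_mean"):
    print(strategy, select_target(retests, strategy))
```

With these numbers, the two combined strategies both settle on set B, which trades a small loss in average value for much lower fluctuation.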
[0058] At box 417, the computing device performs environment cleanup, such as restoring target parameters to their default values for use in the next business operation. At box 419, the computing device uninstalls the driver, ending the current training process.
[0059] In some business scenarios, performance metrics need to stabilize after a period of operation (warm-up). The warm-up time can even be comparable to the training time. This results in very long training cycles. When the business requires many rounds of training for the regression target to stabilize, the overall training time becomes extremely long. In one example, suppose we need to optimize over 200 target parameters. After 500 rounds of optimization training, the regression target stabilizes. Each test round lasts 20 minutes (including 10 minutes of warm-up and 10 minutes of load testing), so the total training time is 500 × 20 = 10,000 minutes.
[0060] Therefore, embodiments of this disclosure propose a regression acceleration scheme. This scheme analyzes the correlation between the target business performance metric and the microarchitecture level-one metrics, and selects substitute metrics that are strongly positively or strongly negatively correlated with changes in the target business performance metric, so that regression on the substitute metrics approximates regression on the target business performance metric. Since the target business performance metric can only be determined after each business cycle ends, the time interval between obtaining two consecutive values of the target business performance metric is relatively long. This results in a long training time per round of optimization. In contrast, a microarchitecture level-one metric can be determined before the end of a single business cycle, so the time interval between obtaining two consecutive values of the microarchitecture level-one metric is relatively short. A training round therefore does not need to wait until the end of the business cycle to stop, which can significantly reduce the training time per round.
[0061] Figure 5 illustrates an exemplary schematic diagram of implementing the regression acceleration scheme described above. At box 503, the computing device determines whether the service is a new service being executed for the first time (i.e., a new scenario). If the service is new ("Yes" at box 503), the computing device performs optimization training at box 505 according to the process shown in Figure 4. This process includes several rounds of random testing, iterative optimization, and performance retesting. For each round of optimization training, the training begins only after the service has been executed for the first time period (warm-up). During the optimization training, the computing device performs service testing to obtain the test values of the target service performance metrics for that round. Unlike the process shown in Figure 4, at box 505, the computing device also performs microarchitecture monitoring during each round of optimization training to obtain the values of multiple microarchitecture level-one metrics.
[0062] At box 507, during M rounds of tuning and training, the computing device identifies micro-architecture level-one metrics whose correlation with the at least one target business performance metric is higher than a correlation threshold. In short, it identifies micro-architecture level-one metrics that are strongly positively or strongly negatively correlated with the at least one target business performance metric. M is greater than or equal to one. In one example, the computing device may sample the values of multiple micro-architecture level-one metrics multiple times to obtain multiple sampled values for each micro-architecture level-one metric. Then, the computing device may calculate the average of the multiple sampled values for each micro-architecture level-one metric and use the calculated average as the value of the corresponding micro-architecture level-one metric. For example, the computing device may sample the values of multiple micro-architecture level-one metrics every 10 seconds and average the 6 sampled values for each micro-architecture level-one metric. This yields a value for each micro-architecture level-one metric every minute. Next, the computing device may determine the correlation between the value of each micro-architecture level-one metric and the at least one target business performance metric over a period of time. It should be noted that although box 507 is shown after box 505, the operations at box 507 may also be performed in parallel with the operations at box 505.
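The sampling and correlation step at box 507 can be sketched as follows. This disclosure does not name a correlation measure, so the Pearson correlation coefficient used here is an assumption, as are the function names, the metric names, the 0.9 threshold, and the sample data.

```python
from math import sqrt
from statistics import mean
from typing import Dict, List, Sequence


def per_minute_values(samples: Sequence[float], per_window: int = 6) -> List[float]:
    """Average each full window of samples (6 samples at 10 s each = 1 minute)."""
    last_start = len(samples) - per_window
    return [mean(samples[i:i + per_window]) for i in range(0, last_start + 1, per_window)]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 for a constant series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def strongly_correlated(
    metrics: Dict[str, Sequence[float]], target: Sequence[float], threshold: float = 0.9
) -> List[str]:
    """Names of metrics whose absolute correlation with the target exceeds the threshold."""
    return [n for n, vals in metrics.items() if abs(pearson(vals, target)) > threshold]


# Twelve 10-second samples collapse into two per-minute values.
print(per_minute_values([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]))

# Hypothetical per-minute metric values paired with a target indicator (e.g. IOPS).
iops = [10.0, 20.0, 30.0, 40.0]
metrics = {
    "ipc": [1.0, 2.0, 3.0, 4.0],          # strongly positive
    "stall_ratio": [4.0, 3.0, 2.0, 1.0],  # strongly negative
    "unrelated": [1.0, -1.0, -1.0, 1.0],  # uncorrelated
}
print(strongly_correlated(metrics, iops))
```

Taking the absolute value of the coefficient lets both strongly positive and strongly negative metrics pass the threshold.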
[0063] Figure 6A illustrates an exemplary relationship between microarchitecture level 1 metric 601 and target business performance metric 602. In Figure 6A, the horizontal axis represents time values, and the vertical axis represents magnitude values. Microarchitecture level 1 metric 601 and target business performance metric 602 obtained at the same time are paired and marked at a single time point. Along the horizontal axis, the time points are arranged in ascending order according to the magnitude of the microarchitecture level 1 metric 601. Thus, a strong positive correlation between microarchitecture level 1 metric 601 and target business performance metric 602 can be easily observed in Figure 6A. The larger the microarchitecture level 1 metric 601, the larger the target business performance metric 602. In one example, the target business performance metric 602 is, for example, IOPS, while the microarchitecture level 1 metric 601 is, for example, instructions per cycle (IPC).
[0064] After the computing device discovers one or more microarchitecture level 1 metrics that are strongly positively or strongly negatively correlated with the at least one target business performance metric, at box 509, the computing device selects one or more of these microarchitecture level 1 metrics for use in the same business scenario (old scenario) in the future.
[0065] If the business is not a new business (marked "No" at box 503), the computing device determines the current business scenario as an old scenario. At box 513, for the first round of optimization training, this round of optimization training begins after the business is executed for the first time period (warm-up). In subsequent optimization training, the computing device uses the determined microarchitecture level 1 metric to replace the at least one target business performance metric as the regression target. This way, except for the longer duration of the first round of optimization training, subsequent rounds of optimization training no longer require warm-up, thus significantly reducing their duration. It should be noted that during the first round of optimization training, either the microarchitecture level 1 metric can be used to replace the at least one target business performance metric as the regression target, or the at least one target business performance metric can be used as the regression target. To facilitate model training, the microarchitecture level 1 metric can be used as the regression target during the first round of optimization training.
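The effect of skipping warm-up can be estimated with the numbers from the example in paragraph [0059]. The sketch below assumes that the measurement portion of every round stays at 10 minutes, which this disclosure does not state; because a microarchitecture level 1 metric can be read before a business cycle ends, actual rounds may be shorter still. The function name is hypothetical.

```python
def total_training_minutes(
    rounds: int, warmup_min: int, load_min: int, warmup_rounds: int
) -> int:
    """Total time when only the first `warmup_rounds` rounds include a warm-up."""
    if not 0 <= warmup_rounds <= rounds:
        raise ValueError("warmup_rounds must lie between 0 and rounds")
    return rounds * load_min + warmup_rounds * warmup_min


# Every round warms up: 500 x (10 + 10) minutes, as in paragraph [0059].
baseline = total_training_minutes(500, 10, 10, warmup_rounds=500)
# Only the first round warms up (assumed 10-minute measurement per round).
accelerated = total_training_minutes(500, 10, 10, warmup_rounds=1)
print(baseline, accelerated)
```

Under these assumptions the total drops from 10,000 minutes to 5,010 minutes.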
[0066] Figure 6B shows an exemplary time series diagram of the target service performance metric. As can be seen from Figure 6B, after time T1, except for spikes occurring at times T2, T3, T4, and T5, the target service performance metric (e.g., IOPS) tends to stabilize. The spikes occurring at times T2, T3, T4, and T5 can be filtered out by filtering outliers. Figure 6C shows an exemplary time series diagram of the microarchitecture level 1 metric. As can be seen from Figure 6C, after time t1, except for spikes occurring at times t2, t3, t4, and t5, the microarchitecture level 1 metric (e.g., IPC) tends to stabilize. The spikes occurring at times t2, t3, t4, and t5 can be filtered out by filtering outliers. Based on the test results in Figures 6B and 6C, it can be seen that when the microarchitecture level 1 metric 601 is strongly correlated with the target service performance metric 602, the microarchitecture level 1 metric 601 can be used as a regression target instead of the target service performance metric 602. Moreover, the tuning time based on the microarchitecture level 1 metric 601 is significantly shorter than the tuning time based on the target service performance metric 602.
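This disclosure does not specify how the outlier spikes are filtered. One common choice, shown here purely as an assumption, is to drop values that lie more than k median absolute deviations from the median. The function filter_spikes, the factor k, and the sample series are hypothetical.

```python
from statistics import median
from typing import List, Sequence


def filter_spikes(values: Sequence[float], k: float = 5.0) -> List[float]:
    """Drop values farther than k median absolute deviations from the median."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # No measurable spread: nothing can be classified reliably, keep everything.
        return list(values)
    return [v for v in values if abs(v - med) <= k * mad]


# Hypothetical stabilized series with one spike (500.0).
series = [100.0, 101.0, 99.0, 100.0, 500.0, 102.0, 98.0, 100.0]
print(filter_spikes(series))
```

A median-based rule is used in this sketch because, unlike a mean-based rule, a single large spike barely shifts the reference point.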
[0067] As can be seen from the above embodiments, since the target business performance indicators can only be determined after each business operation ends, the time interval between obtaining the values of two consecutive target business performance indicators is relatively long. This results in a long training cycle for each round of optimization. In contrast, microarchitecture level-one indicators can be determined before the end of a single business operation, thus the time interval between obtaining the values of two consecutive microarchitecture level-one indicators is relatively short. Training does not need to wait until the end of the business operation to stop. This results in a shorter training cycle for each round of optimization. By utilizing microarchitecture level-one indicators that are highly correlated with the target business performance indicators, the training time for each round of optimization is significantly reduced.
[0068] Figure 7 illustrates a schematic diagram of an apparatus 700 for adjusting parameters according to an embodiment of the present disclosure. The apparatus 700 is, for example, arranged in the tuner 150 of Figure 1. The apparatus 700 may include multiple modules for performing corresponding steps in the method 200 discussed in Figure 2. As shown in Figure 7, the apparatus 700 includes a reading module 702, an acquisition module 704, and an adjustment module 706. The reading module 702 is configured to read at least one target parameter of the chip during the chip's execution of a service. The at least one target parameter includes one or more of the following: microarchitecture parameters or on-chip system parameters. The acquisition module 704 is configured to acquire at least one target service performance indicator of the service. The adjustment module 706 is configured to adjust the at least one target parameter according to the at least one target service performance indicator during the chip's execution of a service. Thus, during the chip's execution of a service, the chip's microarchitecture parameters and / or on-chip system parameters can be dynamically adjusted according to the target service performance indicator of the service, so that the adjusted microarchitecture parameters and / or on-chip system parameters can contribute to improving the target service performance indicator. Therefore, the performance of this service can improve as the chip operates.
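The division of apparatus 700 into three modules can be pictured with the following sketch. It is not the implementation of this disclosure: the class ParameterTuner, the write_params callback, the parameter name prefetch_depth, and the adjustment rule are all hypothetical and serve only to show how the modules could cooperate in one adjustment step.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Params = Dict[str, int]
Indicators = Dict[str, float]


@dataclass
class ParameterTuner:
    read_params: Callable[[], Params]               # role of reading module 702
    acquire_indicators: Callable[[], Indicators]    # role of acquisition module 704
    adjust: Callable[[Params, Indicators], Params]  # role of adjustment module 706
    write_params: Callable[[Params], None]          # hypothetical write-back hook

    def step(self) -> Params:
        """Read parameters, acquire indicators, adjust, and write back once."""
        current = self.read_params()
        indicators = self.acquire_indicators()
        updated = self.adjust(current, indicators)
        self.write_params(updated)
        return updated


# Stand-in for chip state; a real system would access hardware registers.
state: Params = {"prefetch_depth": 1}


def raise_depth_when_slow(params: Params, indicators: Indicators) -> Params:
    """Hypothetical rule: increase the parameter while IOPS is below 1000."""
    if indicators["iops"] < 1000.0:
        return {**params, "prefetch_depth": params["prefetch_depth"] + 1}
    return params


tuner = ParameterTuner(
    read_params=lambda: dict(state),
    acquire_indicators=lambda: {"iops": 900.0},
    adjust=raise_depth_when_slow,
    write_params=state.update,
)
print(tuner.step())
```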
[0069] In some embodiments of this disclosure, the adjustment module 706 includes a first determining module and a sub-adjustment module. The first determining module is configured to determine one or more sets of local optimal solutions for the at least one target parameter based on the at least one target service performance indicator. The sub-adjustment module is configured to adjust the at least one target parameter based on one or more sets of local optimal solutions. When the number of the at least one target parameter is relatively large, finding local optimal solutions for the at least one target parameter rather than global optimal solutions can lead to faster convergence, reducing computational load and achieving a balance between computational load and service performance.
[0070] In some embodiments of this disclosure, the sub-adjustment module includes a testing module, a second determining module, a third determining module, and a fourth determining module. The testing module is configured to perform multiple tests on the service using each of one or more sets of local optima. The second determining module is configured to determine a specified value for the at least one target service performance indicator obtained in the multiple tests for each set of local optima. The third determining module is configured to determine a target local optimum from one or more sets of local optima based on the specified value obtained for the set of local optima. The fourth determining module is configured to determine the target local optimum as the value of the at least one target parameter. In these embodiments, by performing multiple tests on the service, the target parameter with the best test performance can be selected. Furthermore, because the selected target parameter has undergone multiple tests, it can achieve more stable service performance.
[0071] In some embodiments of this disclosure, the first determining module includes a first training module and a fifth determining module. The first training module is configured to use a machine learning algorithm to perform N rounds of optimization training on the at least one target parameter, with the at least one target business performance indicator as the regression target. During each round of optimization training, a set of solutions for the at least one target parameter is determined. The fifth determining module is configured to determine one or more sets of local optimal solutions from the N sets of solutions determined during the N rounds of optimization training. Here, N is greater than or equal to one. Thus, even without prior knowledge of the physical meaning of the parameters, the machine learning algorithm can obtain local optimal solutions for the dynamic parameters.
[0072] In some embodiments of this disclosure, the apparatus 700 further includes a sixth determining module, a second training module, and a seventh determining module. The sixth determining module is configured to determine whether the service is a new service being executed for the first time. The second training module is configured to, if the service is a new service, begin each round of optimization training after the service has been executed for a first time period. The seventh determining module is configured to, during M rounds of optimization training, determine microarchitecture level-one metrics whose correlation with the at least one target service performance metric is higher than a correlation threshold.
[0073] In some further embodiments of this disclosure, the apparatus 700 also includes a third training module. The third training module is configured to, for the first round of optimization training, begin the first round of optimization training after the service has been executed for a first time period, if the service is not a new service. The third training module is also configured to, in subsequent optimization training, replace the at least one target service performance metric with the determined microarchitecture level-one metric as the regression target.
[0074] In these embodiments, because the target service performance metrics can only be determined after each service session ends, the time interval between obtaining the values of two consecutive target service performance metrics is relatively long. This results in a longer training round for single-round optimization. In contrast, microarchitecture level 1 metrics can be determined before the end of a single service session, thus the time interval between obtaining the values of two consecutive microarchitecture level 1 metrics is relatively short. Single-round training does not need to wait until the service session ends to stop. This shortens the training round for single-round optimization. By utilizing microarchitecture level 1 metrics that are highly correlated with the target service performance metrics, the training time for single-round optimization is significantly reduced. Furthermore, since microarchitecture level 1 metrics are used as regression targets, it is possible to clearly identify which microarchitecture level 1 metrics have a greater impact on service performance under various types of workloads. This allows engineers to focus more on setting these microarchitecture level 1 metrics during subsequent chip design, which is beneficial for chip iterative upgrades.
[0075] In summary, the parameter adjustment method and apparatus according to embodiments of this disclosure can dynamically adjust the chip's microarchitecture parameters and / or on-chip system parameters based on the target performance indicators of the service during chip execution, so that the adjusted microarchitecture parameters and / or on-chip system parameters can help improve the target service performance indicators. Therefore, the performance of the service can improve as the chip operates. The parameter adjustment method and apparatus according to embodiments of this disclosure have no limitations on the service scenarios to be optimized, and can effectively reduce the optimization threshold and manpower investment.
[0076] This disclosure can be a method, apparatus, system, and / or computer program product. A computer program product may include a computer-readable storage medium having computer-readable program instructions loaded thereon for performing various aspects of this disclosure.
[0077] A computer-readable storage medium can be a tangible device capable of holding and storing instructions for use by an instruction execution device. A computer-readable storage medium can be, for example—but not limited to—an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination thereof. More specific examples (a non-exhaustive list) of computer-readable storage media include: random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), and any suitable combination thereof. The computer-readable storage medium as used herein is not to be construed as a transient signal itself, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., light pulses through fiber optic cables), or electrical signals transmitted through wires.
[0078] The computer-readable program instructions described herein can be downloaded from computer-readable storage media to various computing / processing devices, or downloaded via a network, such as the Internet, local area network, wide area network, and / or wireless network, to an external computer or external storage device. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and / or edge servers. A network adapter card or network interface in each computing / processing device receives the computer-readable program instructions from the network and forwards them to the computer-readable storage media in the respective computing / processing device.
[0079] Computer program instructions used to perform the operations of this disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, status setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk, C++, etc., and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user's computer via any type of network—including a local area network (LAN) or a wide area network (WAN)—or may be connected to an external computer (e.g., via the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as programmable logic circuitry, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), is personalized by utilizing the status information of the computer-readable program instructions to implement various aspects of this disclosure.
[0080] Various aspects of this disclosure are described herein with reference to flowchart illustrations and / or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this disclosure. It should be understood that each block of the flowchart illustrations and / or block diagrams, and combinations of blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer-readable program instructions.
[0081] These computer-readable program instructions can be provided to a processing unit of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine such that, when executed by the processing unit of the computer or other programmable data processing apparatus, they create means for implementing the functions / actions specified in one or more blocks of the flowchart and / or block diagram. These computer-readable program instructions can also be stored in a computer-readable storage medium that causes a computer, programmable data processing apparatus, and / or other device to operate in a particular manner. Thus, the computer-readable medium storing the instructions comprises an article of manufacture that includes instructions for implementing aspects of the functions / actions specified in one or more blocks of the flowchart and / or block diagram.
[0082] Computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process, thereby causing the instructions executed on the computer, other programmable data processing apparatus, or other device to perform the functions / actions specified in one or more boxes of a flowchart and / or block diagram.
[0083] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of an instruction containing one or more executable instructions for implementing a specified logical function. In some alternative implementations, the functions marked in the blocks may occur in a different order than those shown in the drawings. For example, two consecutive blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, may be implemented using a dedicated hardware-based system that performs the specified function or action, or using a combination of dedicated hardware and computer instructions.
[0084] Unless otherwise expressly indicated by the context, the singular form of words used herein and in the appended claims includes the plural form, and vice versa. Thus, when referring to the singular, the plural form of the corresponding term is generally included. Where the term “example” is used herein, particularly when it follows a set of terms, the “example” is merely exemplary and illustrative and should not be considered exclusive or exhaustive.
[0085] Further aspects and areas of applicability will become apparent from the description provided herein. It should be understood that the various aspects of this application may be implemented individually or in combination with at least one other aspect. It should also be understood that the descriptions and specific embodiments herein are for illustrative purposes only and are not intended to limit the scope of this application.
[0086] Various embodiments of this disclosure have been described above. The foregoing description is exemplary rather than exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those skilled in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or their technical improvement over technologies available in the market, or to enable others skilled in the art to understand the embodiments disclosed herein.
Claims
1. A method for adjusting parameters, comprising: reading at least one target parameter of a chip while the chip is executing a service, wherein the at least one target parameter includes one or more of the following: microarchitecture parameters or on-chip system parameters; obtaining at least one target service performance indicator of the service; and adjusting, while the chip is executing the service, the at least one target parameter based on the at least one target service performance indicator.
2. The method according to claim 1, wherein adjusting the at least one target parameter based on the at least one target service performance indicator comprises: determining, based on the at least one target service performance indicator, one or more sets of local optimal solutions for the at least one target parameter; and adjusting the at least one target parameter based on the one or more sets of local optimal solutions.
3. The method according to claim 2, wherein adjusting the at least one target parameter based on the one or more sets of local optimal solutions comprises: testing the service multiple times using each of the one or more sets of local optimal solutions; determining, for each set of local optimal solutions, a specified value of the at least one target service performance indicator obtained in the multiple tests; determining a target local optimal solution from the one or more sets of local optimal solutions based on the specified values obtained for the one or more sets of local optimal solutions; and determining the target local optimal solution as the value of the at least one target parameter.
4. The method according to claim 3, wherein the specified value of the at least one target service performance indicator is the average value of the at least one target service performance indicator over the multiple tests, and the local optimal solution corresponding to the highest average value among the one or more sets of local optimal solutions is determined as the target local optimal solution; or the specified value of the at least one target service performance indicator is the variance of the values of the at least one target service performance indicator over the multiple tests, and the local optimal solution corresponding to the lowest variance among the one or more sets of local optimal solutions is determined as the target local optimal solution.
5. The method according to any one of claims 2 to 4, wherein determining the one or more sets of local optimal solutions for the at least one target parameter based on the at least one target service performance indicator comprises: performing, using a machine learning algorithm and with the at least one target service performance indicator as the regression target, N rounds of optimization training on the at least one target parameter, wherein one set of solutions for the at least one target parameter is determined in each round of optimization training; and determining the one or more sets of local optimal solutions from the N sets of solutions determined in the N rounds of optimization training, where N is greater than or equal to one.
6. The method according to claim 5, further comprising: determining whether the service is a new service that is being executed for the first time; in response to the service being the new service, starting each round of optimization training after the service has been executed for a first time period; and identifying, during the M rounds of optimization training, level-1 microarchitecture indicators whose degree of correlation with the at least one target service performance indicator is higher than a correlation threshold.
7. The method according to claim 6, further comprising: in response to the service not being the new service, starting the first round of optimization training after the service has been executed for the first time period; and in subsequent rounds of optimization training, using the determined level-1 microarchitecture indicators in place of the at least one target service performance indicator as the regression target.
8. An apparatus for adjusting parameters, comprising: a reading module configured to read at least one target parameter of a chip while the chip is executing a service, wherein the at least one target parameter includes one or more of the following: microarchitecture parameters or on-chip system parameters; an acquisition module configured to obtain at least one target service performance indicator of the service; and an adjustment module configured to adjust, while the chip is executing the service, the at least one target parameter based on the at least one target service performance indicator.
9. An electronic device, comprising: at least one processor; and a memory coupled to the at least one processor and having instructions stored thereon which, when executed by the at least one processor, cause the electronic device to perform the method according to any one of claims 1 to 7.
10. A computer program product tangibly stored on a non-transitory computer-readable medium and comprising machine-executable instructions for performing the method according to any one of claims 1 to 7.
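
Illustrative sketch (not part of the claims). The selection flow recited in claims 2 to 5 can be pictured with the short Python sketch below. It is a minimal illustration under stated assumptions, not the applicant's implementation: the claims do not name a specific machine learning algorithm, so a plain random search stands in for the N rounds of optimization training. The function names, the `measure` callback that returns one observed indicator value for a parameter set, and the parameter names in the usage note are all hypothetical. The "mean" criterion assumes that a higher indicator value is better, as in the first alternative of claim 4.

```python
import random
import statistics
from typing import Callable, Dict, List, Sequence

Params = Dict[str, int]                 # one set of values for the target parameters
Measure = Callable[[Params], float]     # hypothetical: runs the service once, returns the indicator


def propose_candidates(measure: Measure,
                       search_space: Dict[str, Sequence[int]],
                       n_rounds: int,
                       keep: int,
                       rng: random.Random) -> List[Params]:
    """N rounds of search, one set of solutions per round (claim 5).

    Random search is an assumption used here as a stand-in for the
    unspecified machine learning algorithm. The `keep` best-scoring sets
    are returned as candidate local optima; duplicates are possible.
    """
    if n_rounds < 1:
        raise ValueError("n_rounds must be at least 1")
    scored = []
    for _ in range(n_rounds):
        params = {name: rng.choice(values) for name, values in search_space.items()}
        scored.append((measure(params), params))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [params for _, params in scored[:keep]]


def select_target(candidates: List[Params],
                  measure: Measure,
                  n_tests: int,
                  criterion: str = "mean") -> Params:
    """Test each candidate several times and pick the target one (claims 3 and 4).

    criterion="mean" picks the highest average indicator value;
    criterion="variance" picks the lowest variance across the tests.
    """
    if not candidates:
        raise ValueError("at least one candidate is required")
    best_params, best_value = None, None
    for params in candidates:
        samples = [measure(params) for _ in range(n_tests)]
        if criterion == "mean":
            value = statistics.mean(samples)
            better = best_value is None or value > best_value
        elif criterion == "variance":
            value = statistics.pvariance(samples)
            better = best_value is None or value < best_value
        else:
            raise ValueError("criterion must be 'mean' or 'variance'")
        if better:
            best_params, best_value = params, value
    return best_params
```

A caller would supply a search space such as `{"prefetch_distance": [1, 2, 4, 8], "l3_ways": [4, 8, 12]}` (hypothetical parameter names), pass the candidates from `propose_candidates` to `select_target`, and then write the returned values to the chip. How the values are written, for example through the non-persistent virtual file system mentioned in the summary, is outside the scope of this sketch.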