Parameter adjustment device, parameter adjustment method, and program
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- PANASONIC INTELLECTUAL PROPERTY MANAGEMENT CO LTD
- Filing Date
- 2024-12-05
- Publication Date
- 2026-06-19
Abstract
Description
Technical Field
[0001] This disclosure relates to a parameter adjustment device, a parameter adjustment method, and a program.
Background Technology
[0002] In DC-DC converters used in electric vehicles, complex and precise current and voltage control is achieved by combining multiple feedback controls to regulate charging and discharging in accordance with driving conditions. Similarly, in production equipment used in factories (e.g., component assembly machines, assembly robots), complex and precise motion control is achieved by combining multiple servo-controlled motors. In these devices, the actions of the controlled object are controlled according to a large number of control parameters. Users empirically adjust (i.e., optimize) these control parameters to obtain the desired performance regarding the actions of the controlled object.
[0003] In the process of adjusting control parameters, the more complex and diverse the action conditions that define the action mode of the controlled object, and the more control parameters are set to achieve these action conditions, the more difficult it becomes to optimize the control parameters. Consequently, the number of trials required for the control parameter adjustment process increases, placing a significant burden on the user's labor and time. Therefore, the demand for automating the control parameter adjustment process is increasing.
[0004] For this reason, a parameter adjustment device has been disclosed that, when optimizing the control parameters set for the operation control of traffic lights, limits the operating conditions to bottleneck periods and bottleneck areas (see, for example, Patent Document 1).
[0005] In addition, regarding the optimization of control parameters, there is a technique called Adaptive Scenario Subset Selection (AS3), which, in order to solve minimax optimization problems efficiently, prioritizes within the optimization loop the execution and evaluation of action conditions that have a track record of yielding the worst evaluation value (see, for example, Non-Patent Document 1).
[0006] (Existing technical documents)
[0007] (Patent Documents)
[0008] Patent Document 1: Japanese Patent Application Publication No. 2017-117140
[0009] (Non-patent literature)
[0010] Non-Patent Document 1: Atsuhiro Miyagi, Kazuto Fukuchi, Jun Sakuma, Youhei Akimoto, "Adaptive scenario subset selection for worst-case optimization and its application to well placement optimization," Applied Soft Computing, Volume 133, 2023, 109842.
Summary of the Invention
[0011] The problem that the invention aims to solve
[0012] However, under difficult operating conditions, such as those in which most control parameters cause control oscillation or output saturation, the parameter adjustment device described in Patent Document 1 concentrates its optimization on the bottleneck operating conditions, so adjusting the control parameters may take a very long time.
[0013] Furthermore, when some operating conditions require relatively long evaluation times, the technique described in Non-Patent Document 1 may actively execute and evaluate precisely those operating conditions, so adjusting the control parameters may again take a very long time.
[0014] Therefore, the purpose of this disclosure is to provide a parameter adjustment device, a parameter adjustment method, and a program that can reduce the time spent on performance evaluation, whether by simulation or on actual equipment, in the automatic adjustment of control parameters for multiple operating conditions.
[0015] Methods for solving problems
[0016] To achieve the above objective, one aspect of this disclosure involves a parameter adjustment device that explores optimal control parameters for operating the control system under multiple operating conditions by adjusting control parameters while simultaneously operating the control system. The parameter adjustment device comprises: a condition setting unit that sets at least one of the multiple operating conditions and the control parameters in the control system; an evaluation unit that calculates an evaluation value for each of the at least one operating condition, the evaluation value being an evaluation value related to the operation of the control system when it operates under the at least one operating condition and the control parameters set by the condition setting unit; a comprehensive evaluation unit that calculates a comprehensive evaluation value using a weighted average of at least the maximum and minimum values among the evaluation values calculated by the evaluation unit; and an optimization unit that calculates control parameters for the next operation of the control system based on the comprehensive evaluation value calculated by the comprehensive evaluation unit using an optimization algorithm. The condition setting unit sets the at least one operating condition and the control parameters calculated by the optimization unit in the control system.
[0017] To achieve the above objectives, one aspect of this disclosure involves a parameter adjustment method executed by a computer. This method explores optimal control parameters for the control system to operate under multiple operating conditions by adjusting control parameters while simultaneously actuating the control system. The parameter adjustment method includes: a condition setting step, where at least one of the multiple operating conditions and the control parameters are set in the control system; an evaluation step, where an evaluation value is calculated, the evaluation value being related to the operation of the control system under the at least one operating condition and the control parameters set in the condition setting step; a comprehensive evaluation step, where a weighted average value obtained using at least the maximum and minimum values among the evaluation values calculated in the evaluation step is used as a comprehensive evaluation value; and an optimization step, where, based on the comprehensive evaluation value calculated in the comprehensive evaluation step, an optimization algorithm is used to calculate control parameters for the next operation of the control system. In the condition setting step, the at least one operating condition and the control parameters calculated in the optimization step are set in the control system.
[0018] To achieve the above objectives, one aspect of this disclosure relates to a program used to cause the computer to perform the parameter adjustment method described above.
[0019] The effects of the invention
[0020] According to this disclosure, a parameter adjustment device, a parameter adjustment method, and a program are provided that can reduce the time spent on performance evaluation, whether by simulation or on actual equipment, in the automatic adjustment of control parameters for multiple operating conditions.
Attached Figure Description
[0021] Figure 1 is a block diagram illustrating the configuration of a system including the parameter adjustment device according to this disclosure.
[0022] Figure 2 is a diagram showing the detailed configuration of the control system of Figure 1.
[0023] Figure 3 is a flowchart illustrating the operation of the parameter adjustment device according to this embodiment.
[0024] Figure 4 is a flowchart showing the detailed operation of step S205 of Figure 3.
[0025] Figure 5A is a diagram showing an example of an objective function corresponding to multiple action conditions.
[0026] Figure 5B is a graph showing how the control input and the control quantity change over time at each of the points a to i marked with a cross on the objective function of Figure 5A.
[0027] Figure 6 is a diagram illustrating an example of the transition of the selection probabilities of the action conditions and of the comprehensive evaluation value in the case where the selection probability update unit increases the selection probability of the action condition corresponding to the maximum value among the evaluation values, and the comprehensive evaluation unit uses the maximum value among the evaluation values obtained through adjustment as the comprehensive evaluation value.
[0028] Figure 7 is a diagram illustrating an example of the transition of the selection probabilities of the action conditions and of the comprehensive evaluation value in the case where the selection probability update unit increases the selection probabilities of the action conditions corresponding to the maximum and minimum values among the evaluation values, and the comprehensive evaluation unit uses the average of the maximum and minimum values among the evaluation values obtained through adjustment as the comprehensive evaluation value.
[0029] Figure 8 is a diagram showing an example of the evaluation cost set for each action condition.
[0030] Figure 9 is a graph illustrating an example of the transition of the selection probabilities of the action conditions and of the comprehensive evaluation value in the case where the selection probability update unit further executes the processing of step S255 of Figure 4 in order to reduce the total evaluation cost during exploration.
[0031] Figure 10 is a graph showing, for each parameter adjustment method, the relationship between the cumulative evaluation cost and the maximum value among the obtained evaluation values.
[0032] Figure 11 is a diagram showing an example of an image that the input / output unit displays on the input / output device.
[0033] Figure 12 is a table showing an example of whether each of multiple action conditions passes or fails at a given number of exploration trials.
[0034] Figure 13 is a graph showing the relationship between the exploration progress and the correction coefficients.
[0035] Figure 14 is a graph showing an example of the transition of the selection probabilities of the action conditions and of the comprehensive evaluation value when the processing of step S255 of Figure 4 is executed.
[0036] Figure 15 is a graph showing the transition of the evaluation values when focusing on action conditions cond2 and cond3.
[0037] Figure 16 is a diagram showing a variation of the image shown in Figure 11.
Detailed Implementation
[0038] The embodiments of this disclosure will now be described with reference to the accompanying drawings. Each of the embodiments described below illustrates a specific example of this disclosure. The numerical values, constituent elements, arrangement and connection of constituent elements, steps and their order, and display examples shown in the following embodiments are all examples and are not intended to limit the scope of this disclosure. This disclosure is not limited to these examples; it is defined by the claims and is intended to include all modifications within the claims and their equivalents. The technical features described in the embodiments can be combined with each other. The figures are not necessarily drawn precisely. In the figures, substantially identical components are given the same reference symbols, and repeated descriptions are omitted or simplified.
[0039] (Embodiment)
[0040] [Overall Configuration]
[0041] Figure 1 is a block diagram illustrating the configuration of a system including the parameter adjustment device 1 according to this disclosure. The system comprises the parameter adjustment device 1, a control system 2 that is the object adjusted by the parameter adjustment device 1, and an input / output device 3 with which the user makes various settings on the parameter adjustment device 1 and checks information related to the adjustment of control parameters.
[0042] The parameter adjustment device 1 is a device that explores the optimal control parameters for the control system 2 to operate under multiple operating conditions by adjusting the control parameters while simultaneously operating the control system 2. For example, the parameter adjustment device 1 is a terminal device such as a personal computer. Furthermore, the parameter adjustment device 1 includes an input / output unit 11 and a control unit 12 as functional components.
[0043] The input / output unit 11 is an interface for communication with the input / output device 3. The input / output unit 11 receives information from the input / output device 3 regarding setting conditions related to the adjustment of control parameters and outputs this information to the control unit 12. The information regarding the setting conditions related to the adjustment of control parameters includes: operating conditions for exploring control parameters, evaluation indicators as indicators for calculating evaluation values by the evaluation unit 122 (described later), and a combination of operating conditions and their respective evaluation costs. The evaluation indicators are indicators related to the performance of the control system 2 (e.g., the settling time and position deviation of the control system 2). Here, the settling time refers to the time required for the control system 2 to reach a permissible position that can be evaluated as the target position from the drive start position. Furthermore, the position deviation is the distance between the position of the control system 2 and the target position. Additionally, the evaluation cost is the cost required to evaluate an operating condition, such as the time required for evaluation, the energy or expense required for evaluation, and the load on the operator or the object being adjusted (control system 2).
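As an illustration of the two evaluation indicators named above, the following minimal Python sketch computes a settling time and a position deviation from a sampled position trajectory. The function names, the tolerance band standing in for the "permissible position", and the sample data are assumptions made for this example and do not appear in the disclosure; the trajectory is assumed to start at the drive start time.

```python
from typing import Optional, Sequence


def settling_time(times: Sequence[float], positions: Sequence[float],
                  target: float, tolerance: float) -> Optional[float]:
    """Return the first time after which the position stays within
    +/- tolerance of the target, or None if it never settles."""
    settled_at = None
    for t, x in zip(times, positions):
        if abs(x - target) <= tolerance:
            if settled_at is None:
                settled_at = t  # candidate start of the settled interval
        else:
            settled_at = None  # left the permissible band, so reset
    return settled_at


def position_deviation(position: float, target: float) -> float:
    """Distance between the current position and the target position."""
    return abs(position - target)


# Hypothetical sampled response to a step toward target = 1.0.
times = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
positions = [0.0, 0.6, 1.2, 0.96, 1.02, 1.0]
print(settling_time(times, positions, target=1.0, tolerance=0.05))
print(position_deviation(positions[-1], target=1.0))
```

Either quantity could serve as the evaluation value for one operating condition, with smaller values indicating better performance.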
[0044] Furthermore, the input / output unit 11 receives information from the control unit 12 regarding the results of the adjustment of the control parameters, and outputs this information to the input / output device 3. A detailed explanation of the results regarding the adjustment of the control parameters will be provided later.
[0045] The control unit 12 is implemented by a microcomputer or processor. In other words, the functions of the control unit 12 are implemented by the microcomputer or processor executing programs stored in memory.
[0046] In addition, the control unit 12 includes a condition setting unit 121, an evaluation unit 122, a comprehensive evaluation unit 123, an optimization unit 124, a selection probability update unit 125, and an action condition selection unit 126 as functional components.
[0047] The condition setting unit 121 acquires the operating conditions received by the input / output unit 11. The condition setting unit 121 sets, in the control system 2, control parameters and at least one of the acquired operating conditions. Specifically, the condition setting unit 121 sets the control parameters output by the optimization unit 124 in the control system 2. The condition setting unit 121 also sequentially applies to the control system 2 signals based on the operating conditions included in the set of operating conditions output by the action condition selection unit 126 (i.e., sets them one after another). Here, sequentially applying signals based on operating conditions means that the condition setting unit 121 sets the next operating condition only after the control system 2 has completed its operation under the current one.
[0048] The condition setting unit 121 outputs the operating conditions and control parameters set in the control system 2 to the evaluation unit 122. In addition, the condition setting unit 121 outputs the set of operating conditions set in the control system 2 for each exploration to the input / output unit 11.
[0049] The evaluation unit 122 observes the operation of the control system 2 under the operating conditions applied by the condition setting unit 121. The evaluation unit 122 calculates an evaluation value for the observed operation based on the evaluation indicators received from the input / output unit 11. The evaluation unit 122 outputs the combination of the operating conditions and their respective evaluation values to the comprehensive evaluation unit 123 and the selection probability update unit 125. Furthermore, the evaluation unit 122 outputs the evaluation values of the control parameters obtained under each operating condition during each exploration trial to the input / output unit 11.
[0050] The comprehensive evaluation unit 123 calculates a comprehensive evaluation value based on the combination of the action conditions output by the evaluation unit 122 and the evaluation values corresponding to those action conditions, and outputs the value to the optimization unit 124. Furthermore, the comprehensive evaluation unit 123 outputs the comprehensive evaluation value, which was previously sent to the optimization unit 124, to the input / output unit 11. A detailed explanation of the calculation method for the comprehensive evaluation value will be provided later.
[0051] Based on the comprehensive evaluation value output by the comprehensive evaluation unit 123, the optimization unit 124 uses a black-box optimization algorithm to calculate the control parameters that aim to minimize the comprehensive evaluation value (i.e., the optimal value) and outputs these control parameters to the condition setting unit 121. Furthermore, the optimization unit 124 outputs the control parameters used in each exploration trial, which were previously output to the condition setting unit 121, to the input / output unit 11. The black-box optimization algorithm can be a known algorithm such as evolutionary computation or Bayesian optimization.
[0052] The selection probability update unit 125 updates the internally maintained action condition selection probability table and cost-biased selection probability table based on the combination of action conditions and their respective evaluation values output by the evaluation unit 122, and the combination of action conditions and their respective evaluation costs received by the input / output unit 11. The selection probability update unit 125 outputs at least one of the updated selection probability table and the cost-biased selection probability table to the action condition selection unit 126. Furthermore, the selection probability update unit 125 outputs either the selection probability table or the cost-biased selection probability table output to the action condition selection unit 126 to the input / output unit 11. The selection probability table shows the probability of the action condition selection unit 126 making a selection for each action condition. The cost-biased selection probability table is a table where the probabilities of each action condition in the selection probability table are weighted according to their respective evaluation costs. A detailed explanation of the selection probability table and the cost-biased selection probability table will be provided later.
[0053] The action condition selection unit 126 uses the selection probability table output by the selection probability update unit 125 or the selection probability table with cost bias to select at least one action condition from all action conditions received by the input / output unit 11 in a probabilistic manner. The action condition selection unit 126 outputs the selected action conditions as a set of action conditions to the condition setting unit 121.
[0054] In addition, the information obtained from the components of the parameter adjustment device 1 is stored inside the input / output unit 11 and displayed on the input / output device 3 after being subjected to simple information processing.
[0055] The input / output device 3 is a device that inputs and outputs data to the parameter adjustment device 1, such as a computer, tablet computer, or smartphone. Communication between the input / output device 3 and the input / output unit 11 can be wired or wireless. Alternatively, the parameter adjustment device 1 and the input / output device 3 can be housed in the same terminal device.
[0056] [Configuration of Control System 2]
[0057] Figure 2 is a diagram showing the detailed configuration of the control system 2 shown in Figure 1.
[0058] As shown in Figure 2, the control system 2 consists of a control unit 21 and a controlled object 22, such as a power supply device. The control system 2 performs operation control based on signals of operating conditions applied by the condition setting unit 121 and control parameters set by the condition setting unit 121.
[0059] The signal based on the operating conditions can be, for example, a target signal applied to the control unit 21, an operating mode used to switch the behavior of the control unit 21, or a disturbance signal applied to the controlled object 22. The target signal, for example, is a signal that includes information such as the target voltage output by the control system 2. The operating mode, for example, is a signal used to switch the control state of the control system 2, such as a signal that switches the control state of the power supply device to any one of constant voltage control, constant current control, or constant power control. The disturbance signal, for example, is a signal that includes information such as changes in the input voltage or load current applied to the power supply device.
[0060] The control unit 21 is implemented by a microcomputer or processor. In other words, the functions of the control unit 21 are implemented by executing programs stored in memory using a microcomputer or processor.
[0061] The control unit 21 receives control parameters, target signals, and operating modes from the condition setting unit 121. Furthermore, the control unit 21 outputs control input signals to the controlled object 22; these control input signals are used to control the operation of the controlled object 22. The control parameters may include, for example, the proportional gain, integral gain, and derivative gain used in PID (Proportional-Integral-Derivative) control.
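To make the role of these control parameters concrete, the following is a minimal sketch of a textbook discrete-time PID controller in Python. It is not the control unit 21 of the disclosure; the class name, the sampling period, and the gain values are assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class PIDController:
    kp: float  # proportional gain (a control parameter to be adjusted)
    ki: float  # integral gain (a control parameter to be adjusted)
    kd: float  # derivative gain (a control parameter to be adjusted)
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, target: float, measured: float, dt: float) -> float:
        """Return the control input for one sampling period of length dt."""
        error = target - measured
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Hypothetical gains; in the disclosure such values are what gets explored.
pid = PIDController(kp=2.0, ki=0.5, kd=0.1)
u = pid.step(target=1.0, measured=0.0, dt=0.1)
print(u)
```

In this picture, one exploration trial amounts to choosing a triple (kp, ki, kd), running the control system under the selected operating conditions, and scoring the resulting waveforms.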
[0062] The controlled object 22 is the object controlled by the control unit 21, such as a power supply circuit, battery, or load. Furthermore, the controlled object 22 outputs internal signals to the evaluation unit 122. These internal signals may include information about the control quantity used by the control unit 21 to control the controlled object 22, or they may be values of signals that the control unit 21 cannot directly control or observe. Additionally, the internal signals may be control input signals applied by the control unit 21 to the controlled object 22. Specifically, the internal signals include information such as output voltage or inductor current.
[0063] The following will use a flowchart to explain the operation of the parameter adjustment device 1.
[0064] [Overall Operation]
[0065] Figure 3 is a flowchart illustrating the operation of the parameter adjustment device 1 according to this embodiment.
[0066] After the parameter adjustment device 1 is activated, in step S201, the user inputs the action conditions, evaluation indicators, and combinations of the action conditions and their respective evaluation costs into the input / output device 3. The input / output unit 11 receives the user's input information, outputs the action conditions to the condition setting unit 121, outputs the evaluation indicators to the evaluation unit 122, and outputs the combination of the action conditions and their respective evaluation costs to the selection probability update unit 125. The action conditions can be, for example, a combination of an action condition number and a signal applied to the control system 2, and are subsequently managed uniformly within the parameter adjustment device 1.
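One possible in-memory representation of the user input of step S201 is sketched below in Python, assuming a power supply device as the controlled object. The class and field names, the choice of signals, and all numeric values are hypothetical; the disclosure only states that an action condition can combine an action condition number with the signal applied to the control system 2, and that each action condition has an evaluation cost.

```python
from dataclasses import dataclass
from enum import Enum


class OperatingMode(Enum):
    CONSTANT_VOLTAGE = "cv"
    CONSTANT_CURRENT = "cc"
    CONSTANT_POWER = "cp"


@dataclass(frozen=True)
class ActionCondition:
    number: int               # action condition number used for management
    target_voltage: float     # target signal applied to the control unit
    mode: OperatingMode       # operating mode of the control unit
    load_current_step: float  # disturbance applied to the controlled object


# Hypothetical action conditions entered by the user.
conditions = [
    ActionCondition(1, 12.0, OperatingMode.CONSTANT_VOLTAGE, 0.0),
    ActionCondition(2, 12.0, OperatingMode.CONSTANT_VOLTAGE, 5.0),
    ActionCondition(3, 48.0, OperatingMode.CONSTANT_POWER, 2.0),
]

# Evaluation cost per action condition number (for example, seconds per run).
evaluation_costs = {1: 1.0, 2: 1.5, 3: 4.0}
```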
[0067] In step S202, the selection probability update unit 125 initializes the exploration progress, the selection probability table, and the selection probability table with cost bias.
[0068] The exploration progress is a scalar value maintained by the selection probability update unit 125 that represents how far the control parameter exploration performed by the parameter adjustment device 1 has progressed. The exploration progress takes a value from 0 to 1. When the selection probability update unit 125 performs initialization in step S202, the exploration progress is set to 0.
[0069] The selection probability table, maintained by the selection probability update unit 125, shows the probability that the action condition selection unit 126 will select each action condition for all action conditions output from the input / output unit 11 to the selection probability update unit 125. Furthermore, during initialization in step S202, the selection probability update unit 125 sets all values of this table to 1.
[0070] The selection probability table with cost bias is a table maintained by the selection probability update unit 125. It is obtained by correcting the probability of each action condition set in the selection probability table based on the evaluation cost of that action condition. Specifically, the probability is corrected to be larger when the cost is greater than a certain benchmark, and to be smaller when the cost is less than the benchmark. When the selection probability update unit 125 performs initialization in step S202, all values of this table are set to 1.
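The initialization of step S202 can be sketched as follows. The function name and the use of dictionaries keyed by action condition number are assumptions for illustration; only the initial values (progress 0, every table entry 1) come from the description above.

```python
from typing import Dict, Iterable, Tuple


def initialize_state(condition_numbers: Iterable[int]
                     ) -> Tuple[float, Dict[int, float], Dict[int, float]]:
    """Return (exploration progress, selection probability table,
    selection probability table with cost bias) as set in step S202."""
    numbers = list(condition_numbers)
    progress = 0.0                                # exploration progress in [0, 1]
    selection_prob = {n: 1.0 for n in numbers}    # every condition is selected
    cost_biased_prob = {n: 1.0 for n in numbers}  # no cost bias applied yet
    return progress, selection_prob, cost_biased_prob


progress, selection_prob, cost_biased_prob = initialize_state([1, 2, 3])
print(progress, selection_prob, cost_biased_prob)
```

With every entry equal to 1, all action conditions are certain to be chosen, which matches step S203 applying all acquired action conditions in the first pass.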
[0071] In step S203, the condition setting unit 121 sets control parameters to the control system 2 and sequentially applies all acquired action conditions to the control system 2. At this time, the control parameters set by the condition setting unit 121 can be pre-defined initial values. Thus, the control system 2 executes control sequentially based on the action conditions applied by the condition setting unit 121 and the set control parameters.
[0072] In step S204, the evaluation unit 122 observes the action waveform of the control system 2 under each action condition and calculates the evaluation value for each action condition based on the acquired evaluation index. Then, the evaluation unit 122 outputs the combination of the action conditions and their respective evaluation values to the selection probability update unit 125.
[0073] Note that the parameter adjustment device 1 may also perform step S209 or step S210, which will be described later, after step S204.
[0074] In step S205, the selection probability update unit 125 updates the selection probability table and the selection probability table with cost bias based on the combination of action conditions and evaluation values obtained in step S204 or step S208 described later.
[0075] In step S206, the action condition selection unit 126 selects action conditions probabilistically according to the values in the cost-biased selection probability table updated in step S205, creates an action condition set, and outputs it to the condition setting unit 121. The action condition set must include at least one action condition. If the probabilistic selection based on the cost-biased selection probability table yields an empty action condition set, the action condition selection unit 126 can create a new action condition set, for example, by selecting the action condition with the highest probability in the cost-biased selection probability table.
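The selection of step S206, including the fallback for an empty set, might look like the following Python sketch. It assumes that each action condition is drawn independently with its own table probability; the disclosure says only that the selection is probabilistic, so this sampling scheme and the function name are assumptions.

```python
import random
from typing import Dict, List, Optional


def select_action_conditions(prob_table: Dict[int, float],
                             rng: Optional[random.Random] = None) -> List[int]:
    """Pick each action condition independently with its table probability.

    Falls back to the single most probable condition if nothing was picked,
    so the returned set always contains at least one action condition.
    """
    rng = rng or random.Random()
    # rng.random() lies in [0, 1), so p = 1 always selects and p = 0 never does.
    selected = [n for n, p in prob_table.items() if rng.random() < p]
    if not selected:
        selected = [max(prob_table, key=prob_table.get)]
    return selected


# Hypothetical cost-biased selection probability table.
table = {1: 0.9, 2: 0.3, 3: 0.1}
print(select_action_conditions(table, random.Random(0)))
```

Passing a seeded `random.Random` makes a run reproducible, which can help when comparing adjustment runs.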
[0076] In step S207, the condition setting unit 121 sequentially applies the action conditions included in the action condition set output by the action condition selection unit 126 in step S206 to the control system 2. Thus, the control system 2 executes control sequentially based on the action conditions applied by the condition setting unit 121 and the set control parameters.
[0077] In step S208, the evaluation unit 122 observes the action waveform of the control system 2 under each action condition and calculates the evaluation value for each action condition based on the acquired evaluation index. Then, the evaluation unit 122 outputs the combination of the action conditions and their respective evaluation values to the comprehensive evaluation unit 123.
[0078] In step S209, the comprehensive evaluation unit 123 obtains the maximum and minimum of the evaluation values calculated by the evaluation unit 122 in step S208, and calculates the comprehensive evaluation value as the average of the maximum and minimum values. Note that the plain average of the maximum and minimum values is only one example of how the comprehensive evaluation value can be calculated; a suitably weighted average of the maximum and minimum values can also be used. Furthermore, instead of a weighted average of only the maximum and minimum values, a weighted average of multiple high-ranking evaluation values including the maximum value and multiple low-ranking evaluation values including the minimum value can be used as the comprehensive evaluation value.
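The calculations described for step S209 can be sketched in Python as follows. The function names, the `weight_max` parameter, and the equal weighting in the second function are assumptions; the disclosure does not fix particular weights beyond the plain average.

```python
from typing import Sequence


def comprehensive_value(evaluations: Sequence[float],
                        weight_max: float = 0.5) -> float:
    """Weighted average of the largest and smallest evaluation values.

    weight_max = 0.5 gives the plain average described for step S209.
    """
    return weight_max * max(evaluations) + (1.0 - weight_max) * min(evaluations)


def comprehensive_value_top_bottom(evaluations: Sequence[float],
                                   k: int = 2) -> float:
    """Equal-weight average of the k highest and k lowest evaluation values."""
    ordered = sorted(evaluations)
    picked = ordered[:k] + ordered[-k:]
    return sum(picked) / len(picked)


# Hypothetical evaluation values for four selected action conditions.
values = [0.8, 0.2, 0.5, 1.4]
print(comprehensive_value(values))             # plain average of max and min
print(comprehensive_value(values, 0.7))        # weights the worst case more
print(comprehensive_value_top_bottom(values))  # two highest and two lowest
```

Setting `weight_max` to 1.0 reduces the first function to using the maximum alone, which is the case that Figure 6 is described as illustrating.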
[0079] In step S210, the optimization unit 124 takes the comprehensive evaluation value calculated by the comprehensive evaluation unit 123 in step S209 as the objective function value that should be minimized, calculates the control parameters to be tested next based on the black-box optimization algorithm, and outputs them to the condition setting unit 121.
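The interaction between steps S209 and S210 can be pictured as a tell/ask loop around a black-box optimizer. The Python sketch below uses a deliberately simple random-perturbation search as a stand-in; it is neither evolutionary computation nor Bayesian optimization as such, and the class name, `sigma`, and the quadratic stand-in objective are all assumptions made for illustration.

```python
import random
from typing import List, Optional


class RandomPerturbationOptimizer:
    """Toy black-box minimizer: perturb the best parameters found so far."""

    def __init__(self, initial: List[float], sigma: float = 0.1, seed: int = 0):
        self.best_params = list(initial)
        self.best_value: Optional[float] = None
        self.sigma = sigma
        self.rng = random.Random(seed)

    def tell(self, params: List[float], value: float) -> None:
        """Record the comprehensive evaluation value obtained for params."""
        if self.best_value is None or value < self.best_value:
            self.best_params, self.best_value = list(params), value

    def ask(self) -> List[float]:
        """Propose the control parameters to test next."""
        return [p + self.rng.gauss(0.0, self.sigma) for p in self.best_params]


opt = RandomPerturbationOptimizer(initial=[1.0, 0.1, 0.01])
params = opt.best_params
for _ in range(200):
    # Stand-in for running the control system and computing the
    # comprehensive evaluation value (smaller is better).
    value = sum((p - 0.5) ** 2 for p in params)
    opt.tell(params, value)
    params = opt.ask()
print(opt.best_params, opt.best_value)
```

A practical implementation would more likely rely on an established optimizer, since only the ability to propose parameters and receive a scalar objective value is required.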
[0080] In step S211, the condition setting unit 121 determines whether a certain number of exploration actions for the control parameters defined in steps S205 to S210 have been performed. If the condition setting unit 121 determines that a certain number of exploration actions have not been performed ("No" in step S211), the process proceeds to step S205. Conversely, if the condition setting unit 121 determines that a certain number of exploration actions have been performed ("Yes" in step S211), the process proceeds to step S212.
[0081] In step S212, the input / output unit 11 outputs to the input / output device 3 the optimal parameters, which are determined from the evaluation values that the evaluation unit 122 calculated, based on the evaluation index, for each action condition selected in each exploration action. The optimal parameters are defined as the control parameters that give the smallest (best) value in the history of the largest (worst) evaluation values obtained for the selected action conditions in each exploration action.
[0082] Note that, in step S205, the selection probability update unit 125 may update the selection probability table and the selection probability table with cost bias without applying cost bias (i.e., the weight corresponding to the evaluation cost), so that the selection probabilities set in the two tables are the same.
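The definition of the optimal parameters corresponds to a min-over-max selection across the exploration history, which can be sketched in Python as follows. The history layout, the function name, and the sample values are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

# Each history entry: (control parameters, evaluation values of the
# action conditions selected in that exploration action).
History = List[Tuple[Sequence[float], Sequence[float]]]


def best_parameters(history: History) -> Sequence[float]:
    """Return the parameters whose worst (maximum) evaluation value is smallest."""
    params, _ = min(history, key=lambda entry: max(entry[1]))
    return params


# Hypothetical history of three exploration actions.
history = [
    ([1.0, 0.1], [0.9, 0.4]),  # worst value 0.9
    ([0.8, 0.2], [0.6, 0.5]),  # worst value 0.6
    ([0.5, 0.3], [0.7, 0.1]),  # worst value 0.7
]
print(best_parameters(history))
```

Here the second entry is chosen because its worst evaluation value, 0.6, is the smallest among the three worst values.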
[0083] In addition, the selection probability update unit 125 can also update the selection probability table in step S205, and the action condition selection unit 126 can also create an action condition set based on the selection probability table and output it in step S206.
[0084] Based on the above explanation, since the comprehensive evaluation unit 123 calculates the comprehensive evaluation value by taking the weighted average of the maximum and minimum values among multiple evaluation values, it can obtain comprehensive evaluation values with diverse values in each exploration action. Therefore, the optimization unit 124 can use the comprehensive evaluation values obtained in each exploration action as a guide to calculate the control parameters to be tested next.
[0085] Furthermore, since the comprehensive evaluation unit 123 calculates the comprehensive evaluation value by taking the weighted average of multiple evaluation values, including the maximum and minimum values, a more diverse range of comprehensive evaluation values can be obtained in each exploration action.
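As a rough illustration of the weighted average described above, the following sketch combines the maximum and minimum evaluation values with a single weight. The function name `comprehensive_value` and the weight `w_max` are assumptions introduced here; the disclosure does not name the weights or give their values.

```python
from typing import Sequence


def comprehensive_value(evals: Sequence[float], w_max: float = 0.5) -> float:
    """Weighted average of the maximum and minimum evaluation values.

    w_max is a hypothetical weight on the maximum value; w_max = 0.5 gives the
    plain average, and w_max = 1.0 reduces to using the maximum value alone.
    """
    if not evals:
        raise ValueError("at least one evaluation value is required")
    if not 0.0 <= w_max <= 1.0:
        raise ValueError("w_max must lie in [0, 1]")
    return w_max * max(evals) + (1.0 - w_max) * min(evals)


# Even when the largest evaluation value stays flat at 5.0, the combined value
# still varies with the smallest evaluation value.
flat_a = comprehensive_value([5.0, 1.2])
flat_b = comprehensive_value([5.0, 2.0])
```

With the maximum fixed at 5.0, the two calls return different values, which is the diversity the paragraph above relies on.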
[0086] [Operation of the selection probability update unit 125]
[0087] The detailed actions performed by the selection probability update unit 125 in step S205 of Figure 3 are explained below using the flowchart of Figure 4.
[0088] Figure 4 is a flowchart showing the detailed actions of step S205 of Figure 3.
[0089] In step S251, the selection probability update unit 125 determines the action condition with the maximum evaluation value based on the combination of the action condition and its respective evaluation value obtained in step S204 or S208, and updates the selection probability table in a way that makes the selection probability of this action condition higher than the selection probabilities of other action conditions. Specifically, when the selection probability of the action condition with the maximum evaluation value is ps, the selection probability update unit 125 increases the selection probability ps and updates the selection probability ps according to Equation 1 below using an appropriate positive value cp, making it higher than the selection probabilities of other action conditions.
[0090] ps←ps+cp···(Equation 1)
[0091] In step S252, the selection probability update unit 125 determines the action condition with the minimum evaluation value based on the combination of the action condition and its respective evaluation value obtained in step S204 or S208, and updates the selection probability table in such a way that the selection probability of this action condition increases or decreases according to the exploration progress. Specifically, when the selection probability of the action condition with the minimum evaluation value is ps, the selection probability update unit 125 updates the table according to Equation 2 below using appropriate positive values cp, cn, and exploration progress α. Equation 2 below is a formula that increases the selection probability of the action condition when the exploration progress α is less than a threshold, and decreases the selection probability of the action condition when the exploration progress α is greater than the threshold. Furthermore, Equation 2 below is a formula that sets the selection probability ps of the action condition when the exploration progress α is greater than the threshold lower than the selection probability ps of the action condition when the exploration progress α is less than the threshold.
[0092] ps←ps+(1-α)×cp-α×cn···(Equation 2)
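The text does not state the threshold explicitly. Under the assumption that the threshold is the value of α at which the increment in Equation 2 changes sign, it can be worked out as follows; the symbols Δp_s and α* are introduced here for illustration only.

```latex
\Delta p_s = (1-\alpha)\,c_p - \alpha\,c_n = c_p - \alpha\,(c_p + c_n)
\qquad\Longrightarrow\qquad
\Delta p_s = 0 \iff \alpha = \alpha^{*} = \frac{c_p}{c_p + c_n}
```

Since cp and cn are positive, α* lies strictly between 0 and 1, the increment is positive for α < α* and negative for α > α*. At α = 0 the increment is +cp, and at α = 1 it is -cn.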
[0093] In step S253, the selection probability update unit 125 updates the selection probability table by reducing the selection probability of action conditions whose evaluation values are neither the maximum nor the minimum, based on the combination of action conditions and their respective evaluation values obtained in step S204 or S208. Specifically, when the selection probability of an action condition whose evaluation value is neither the minimum nor the maximum is ps, the selection probability update unit 125 reduces the selection probability ps according to Equation 3 below using an appropriate positive value cn.
[0094] ps←ps-cn···(Equation 3)
[0095] Alternatively, the selection probability update unit 125 may execute steps S251, S252, and S253 in a different processing order.
[0096] Furthermore, the selection probabilities updated in steps S251, S252, and S253 are values from the minimum probability ε to 1, as described later. Further, the threshold in step S252 is also a value from the minimum probability ε to 1, as described later.
[0097] In step S254, the selection probability update unit 125 updates the exploration progress. Specifically, the selection probability update unit 125 uses an update magnitude Δα to update the exploration progress α according to Equation 4 below.
[0098] α←α+Δα···(Equation 4)
[0099] In step S255, the selection probability update unit 125 updates the cost-biased selection probability table by multiplying the probability of each action condition in the selection probability table by a coefficient corresponding to the evaluation cost of each action condition. Specifically, the selection probability update unit 125 sets the selection probability of the action condition maintained in the selection probability table to ps, sets the evaluation cost involved in the action condition to C, sets the lowest evaluation cost among all action conditions to Cmin, and sets the cost-biased selection probability psc according to Equation 5 below.
[0100] psc←ps×Cmin / C···(Equation 5)
[0101] Furthermore, the selection probability update unit 125 can also adjust the value of the cost-based correction coefficient Cmin / C according to the exploration progress α. For example, if the exploration progress α exceeds a threshold, the selection probability update unit 125 can also set the correction coefficient Cmin / C to 1.
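Equation 5 and the progress-dependent adjustment of the correction coefficient can be sketched as follows. The function name `cost_biased_table`, the parameter `alpha_off` (the threshold on α beyond which the coefficient is set to 1), and the sample numbers are assumptions made for illustration.

```python
from typing import Dict, Optional


def cost_biased_table(
    probs: Dict[str, float],
    costs: Dict[str, float],
    alpha: float = 0.0,
    alpha_off: Optional[float] = None,
) -> Dict[str, float]:
    """Apply psc = ps * Cmin / C to every action condition (Equation 5).

    If alpha_off is given and the exploration progress alpha exceeds it, the
    correction coefficient is replaced by 1, so the cost bias is switched off.
    """
    c_min = min(costs.values())
    biased = {}
    for cond, ps in probs.items():
        coeff = c_min / costs[cond]
        if alpha_off is not None and alpha > alpha_off:
            coeff = 1.0
        biased[cond] = ps * coeff
    return biased


# Made-up numbers: condB costs four times as much to evaluate as condA.
probs = {"condA": 0.8, "condB": 0.8}
costs = {"condA": 1.0, "condB": 4.0}
early = cost_biased_table(probs, costs, alpha=0.2, alpha_off=0.5)
late = cost_biased_table(probs, costs, alpha=0.9, alpha_off=0.5)
```

Early in the exploration the expensive condition is selected with a quarter of its unbiased probability; once α exceeds the assumed threshold, both tables coincide.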
[0102] In step S256, the selection probability update unit 125 restricts the selection probabilities of the action conditions maintained in the selection probability table and the selection probability table with cost bias to values between the minimum probability ε and 1. The minimum probability ε is a value used to prevent a selection probability from becoming completely zero and thus to preserve a small possibility that the action condition will be selected again.
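Steps S251 to S254 and the clipping of step S256 can be put together in a short sketch (the cost bias of step S255 is left out here). The default values of cp, cn, Δα, and ε are made up, leaving unevaluated action conditions unchanged is an assumption, and clamping α at 1 is also an assumption, since the text only states that α is incremented.

```python
from typing import Dict, Tuple


def update_selection_probabilities(
    probs: Dict[str, float],
    evals: Dict[str, float],
    alpha: float,
    cp: float = 0.1,
    cn: float = 0.1,
    d_alpha: float = 0.01,
    eps: float = 0.05,
) -> Tuple[Dict[str, float], float]:
    """One pass of steps S251-S254 and S256.

    probs: selection probability table for all action conditions.
    evals: evaluation values of the conditions evaluated in this exploration
           action (conditions missing from evals are left unchanged here).
    """
    worst = max(evals, key=evals.get)  # condition with the maximum value
    best = min(evals, key=evals.get)   # condition with the minimum value
    new = dict(probs)
    for cond in evals:
        if cond == worst:
            new[cond] += cp                              # Equation 1 (S251)
        elif cond == best:
            new[cond] += (1 - alpha) * cp - alpha * cn   # Equation 2 (S252)
        else:
            new[cond] -= cn                              # Equation 3 (S253)
    # S256: keep every probability within [eps, 1].
    new = {c: min(1.0, max(eps, p)) for c, p in new.items()}
    # Equation 4 (S254); the clamp at 1 is an assumption of this sketch.
    alpha = min(1.0, alpha + d_alpha)
    return new, alpha


table = {"cond0": 1.0, "cond2": 1.0, "cond5": 1.0}
values = {"cond0": 1.0, "cond2": 5.0, "cond5": 3.7}
early_table, early_alpha = update_selection_probabilities(table, values, alpha=0.0)
late_table, late_alpha = update_selection_probabilities(table, values, alpha=1.0)
```

With α = 0 the minimum-value condition cond0 keeps its probability at 1, whereas with α = 1 it is reduced in the same way as the intermediate condition cond5.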
[0103] Based on the above explanation, since the parameter adjustment device 1 uses the action conditions selected by the action condition selection unit 126 to perform the exploration action, it is possible to reduce the number of evaluations required for each exploration action.
[0104] Furthermore, since the selection probability table sets the selection probabilities of multiple action conditions corresponding to evaluation values including the maximum and minimum values, the action condition selection unit 126 can select a variety of action conditions. Therefore, since the parameter adjustment device 1 performs various exploration actions using diverse action conditions, the comprehensive evaluation unit 123 can obtain a variety of comprehensive evaluation values.
[0105] Furthermore, in the early stages of exploration, since the action condition selection unit 126 is more likely to select the action condition that takes the minimum evaluation value, the parameter adjustment device 1 executes exploration actions that include that action condition more frequently. Therefore, in the early stages of exploration, the optimization unit 124 can calculate the control parameters to be tested next, using a variety of comprehensive evaluation values as a guide. Moreover, in the middle and later stages of exploration, since the action condition selection unit 126 becomes less likely to select the action condition that takes the minimum evaluation value, the parameter adjustment device 1 executes exploration actions that include that action condition less frequently. Therefore, the parameter adjustment device 1 can reduce the number of evaluations required for each exploration action.
[0106] Furthermore, since the action condition selection unit 126 is more likely to select action conditions with lower evaluation costs and less likely to select action conditions with higher evaluation costs, the parameter adjustment device 1 is able to suppress the evaluation costs involved in each exploration action.
[0107] Furthermore, since the action conditions selected by the action condition selection unit 126 change according to the progress of the exploration, the parameter adjustment device 1 can use a variety of action conditions during the exploration.
[0108] [The relationship between the objective function, control input, and control variable]
[0109] The relationship between the objective function, control input, and control quantity obtained through the action of the parameter adjustment device 1 is explained below, using Figure 5A and Figure 5B as an example. Figure 5A is a diagram showing an example of the objective functions corresponding to multiple action conditions. Figure 5B is a set of graphs showing how the control input and control quantity change over time at each of the points marked with crosses a to i on the objective functions of Figure 5A.
[0110] Figure 5A is a diagram illustrating an example of the objective functions corresponding to multiple action conditions cond0 to cond7. An objective function represents the correspondence between the control parameter, shown on the horizontal axis, and its evaluation value, shown on the vertical axis. The control parameters shown on the horizontal axis are the control parameters set in the control system 2 by the condition setting unit 121, and the evaluation values shown on the vertical axis are the evaluation values calculated by the evaluation unit 122. Figure 5A shows the objective functions obtained when, of the two control parameters adjusted by the parameter adjustment device 1, one is held fixed.
[0111] The objective functions corresponding to action conditions cond0 and cond1 have a minimum evaluation value of approximately 1. The objective functions corresponding to action conditions cond4, cond5, cond6, and cond7 have a minimum evaluation value of approximately 3.7. The objective functions corresponding to action conditions cond2 and cond3 have a minimum evaluation value of approximately 5. Among the objective functions with a minimum evaluation value of approximately 1, the one for action condition cond0 attains its minimum at a control parameter value of approximately 0, and the one for action condition cond1 attains its minimum at a control parameter value of approximately 0.3. Among the objective functions with a minimum evaluation value of approximately 3.7, the minimum is attained at a control parameter value of approximately -0.3 for action condition cond7, approximately 0 for action condition cond6, approximately 0.3 for action condition cond5, and approximately 0.9 for action condition cond4. Among the objective functions with a minimum evaluation value of approximately 5, the minimum is attained at a negative control parameter value for action condition cond2 and at a positive control parameter value for action condition cond3. Furthermore, the objective function max shown by the dashed line is obtained by plotting, for each control parameter, the largest of the evaluation values of the objective functions corresponding to action conditions cond0 to cond7.
[0112] Cross-shaped markers a, b, and c are attached to action condition cond3. Cross-shaped markers d, e, and f are attached to action condition cond7. Cross-shaped markers g, h, and i are attached to action condition cond1. Furthermore, cross-shaped markers a, d, and g represent points where the control parameter value is 1; cross-shaped markers b, e, and h represent points where the control parameter value is 2; and cross-shaped markers c, f, and i represent points where the control parameter value is 3.
[0113] Figure 5B (a) to (i) are graphs of the control input and the control quantity at the points marked with crosses a to i on the objective functions of Figure 5A. The control input is the signal output from the control unit 21 to the controlled object 22; in the example of Figure 5B, it is set to saturate at a minimum value of -1 and a maximum value of 1. The control quantity is the output value of the controlled object 22 that is observable by the evaluation unit 122; in the example of Figure 5B, the control unit 21 controls this output value toward a target value of 1.
[0114] In the example of Figure 5B, the evaluation unit 122 calculates the integral value of the deviation between the control quantity and the target value as the evaluation value. Specifically, the evaluation unit 122 calculates, as the evaluation value, the total area enclosed over a certain period of time by the waveform representing the control quantity and the straight line representing the target value.
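The area-based evaluation value can be approximated numerically from sampled data. The sketch below integrates the absolute deviation with the trapezoidal rule; the use of the absolute value (so that areas above and below the target line both count), the sampling format, and the function name `evaluation_value` are assumptions of this illustration.

```python
from typing import Sequence


def evaluation_value(t: Sequence[float], y: Sequence[float], target: float = 1.0) -> float:
    """Approximate the area between the control quantity y(t) and the target line.

    Uses the trapezoidal rule on |y - target|; this is only an approximation
    when the deviation changes sign between two samples.
    """
    if len(t) != len(y) or len(t) < 2:
        raise ValueError("t and y must have the same length (at least 2)")
    area = 0.0
    for i in range(1, len(t)):
        e0 = abs(y[i - 1] - target)
        e1 = abs(y[i] - target)
        area += 0.5 * (e0 + e1) * (t[i] - t[i - 1])
    return area


# Made-up samples: a response that settles, and one that overshoots first.
settling = evaluation_value([0.0, 1.0, 2.0], [0.0, 1.0, 1.0])
overshoot = evaluation_value([0.0, 1.0, 2.0], [0.0, 2.0, 1.0])
```

A smaller value indicates that the control quantity stays closer to the target over the evaluation period.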
[0115] In the responses shown in Figure 5B (a), (b), (c), (e), (f), and (i), the waveform of the control input repeatedly saturates. In such a response, even if the condition setting unit 121 slightly increases or decreases the value of the control parameter and sets it in the control system 2, the waveform of the control input still saturates at 1 or -1. Since the saturated control input is the same, the subsequent changes in the control quantity are also the same. Moreover, since the changes in the control quantity output to the evaluation unit 122 are the same, the subsequent control input waveform also repeatedly saturates at 1 or -1. Thus, the objective functions shown in Figure 5A have flat intervals in which the evaluation value does not change with the control parameter.
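The flat-interval effect can be reproduced with a toy model. The plant below (a discrete integrator with a two-step input delay), the proportional controller, and all numeric values are assumptions chosen purely for illustration; they are not the control system 2 of the disclosure.

```python
def simulate_cost(gain: float, steps: int = 200, dt: float = 0.1,
                  delay: int = 2, target: float = 1.0, y0: float = 0.05) -> float:
    """Simulate a saturating proportional loop and return the summed |error| * dt."""
    y = y0
    pending = [0.0] * delay  # control inputs waiting to reach the plant
    cost = 0.0
    for _ in range(steps):
        error = target - y
        u = max(-1.0, min(1.0, gain * error))  # control input saturates at -1 and 1
        pending.append(u)
        y += dt * pending.pop(0)               # integrator driven by the delayed input
        cost += abs(target - y) * dt
    return cost


cost_gain_50 = simulate_cost(50.0)
cost_gain_80 = simulate_cost(80.0)
cost_gain_2 = simulate_cost(2.0)
```

In this toy model, gains of 50 and 80 both keep the control input saturated at every step, so the two runs are identical and yield the same cost, which corresponds to a flat interval of the objective function. The gain of 2 leaves the saturated regime, so its cost differs.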
[0116] Figure 5B (a), (b), and (c) are graphs showing the evaluation value and response of action condition cond3. As these figures show, under any of these control parameters, the waveform of the control input repeatedly saturates and the change in the control quantity is oscillatory. Because of this property, the objective function of action condition cond3 shown in Figure 5A takes its maximum evaluation value for most control parameters. In this objective function, the control parameters for which the response improves and the evaluation value decreases are distributed over a very narrow range. Action condition cond2 has the same properties and objective function shape as action condition cond3.
[0117] Figure 5B (d), (e), and (f) are graphs showing the evaluation value and response of action condition cond7. In Figure 5B (e) and (f), the waveform of the control input repeatedly saturates and the change in the control quantity is oscillatory, whereas in Figure 5B (d) the waveform of the control input no longer saturates consistently, so the response is improved. The amplitude of the control quantity in Figure 5B (d) is smaller than the amplitudes of the control quantity in Figure 5B (e) and (f). The amplitudes of the control quantity in Figure 5B (e) and (f) are in turn smaller than the amplitudes of the control quantity in Figure 5B (b) and (c), for which the control parameters are set to the same values. Because of this property, in the objective function of action condition cond7 shown in Figure 5A, the flat interval is narrower and the valley interval is wider than for action conditions cond2 and cond3. Furthermore, the evaluation values obtained under most control parameters are lower than those obtained for action conditions cond2 and cond3. Action conditions cond4, cond5, and cond6 exhibit similar properties and objective function shapes to action condition cond7.
[0118] Figure 5B (g), (h), and (i) are graphs showing the evaluation value and response of action condition cond1. In Figure 5B (i), the waveform of the control input repeatedly saturates and the change in the control quantity is oscillatory, whereas in Figure 5B (g) and (h) the waveform of the control input no longer saturates consistently, so the response is improved. The amplitudes of the control quantity in Figure 5B (g) and (h) are smaller than the amplitude of the control quantity in Figure 5B (i). The amplitude of the control quantity in Figure 5B (i) is in turn smaller than the amplitude of the control quantity in Figure 5B (f), for which the control parameter is set to the same value. Because of this property, in the objective function of action condition cond1 shown in Figure 5A, the flat interval is narrower still and the valley interval is wider still than for action conditions cond4 to cond7. Furthermore, the evaluation values obtained under most control parameters are smaller still than those obtained for action conditions cond4 to cond7. Action condition cond0 has similar properties and a similar objective function shape to action condition cond1.
[0119] In addition, the interval in which the objective function of each action condition forms a valley, and the minimum within that interval, may differ between action conditions from a local perspective, but they are roughly the same from a global perspective.
[0120] The dashed line shown in Figure 5A represents the objective function obtained by taking the maximum of the objective functions of all action conditions, and it is the objective function that the parameter adjustment device 1 ultimately aims to minimize. The objective function depicted by the dashed line can also be understood as the evaluation value, for a given control parameter, of the action condition that constitutes the bottleneck.
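The dashed-line objective function is a pointwise maximum over the action conditions, and the action condition attaining that maximum is the bottleneck for the corresponding control parameter. A minimal sketch follows, with made-up sample values that are not read from Figure 5A; the function name `bottleneck_objective` is an assumption.

```python
from typing import Dict, List, Tuple


def bottleneck_objective(
    evals_by_condition: Dict[str, List[float]]
) -> List[Tuple[float, str]]:
    """For each control parameter index, return (maximum value, bottleneck condition)."""
    n_points = len(next(iter(evals_by_condition.values())))
    result = []
    for i in range(n_points):
        # The condition with the largest evaluation value at this parameter.
        cond = max(evals_by_condition, key=lambda c: evals_by_condition[c][i])
        result.append((evals_by_condition[cond][i], cond))
    return result


# Made-up evaluation values on a grid of three control parameter values.
samples = {
    "cond1": [3.0, 1.0, 2.5],
    "cond3": [6.0, 6.0, 5.0],
    "cond7": [6.5, 3.7, 4.0],
}
envelope = bottleneck_objective(samples)
```

Minimizing the first element of each pair over the control parameter grid corresponds to minimizing the dashed-line objective function.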
[0121] Furthermore, under most control parameters, the evaluation values corresponding to action conditions cond2 and cond3 are the largest among the action conditions. Put simply, if the parameter adjustment device 1 performs control, evaluation, and optimization only for action conditions cond2 and cond3, this is equivalent to performing control, evaluation, and optimization for the maximum value.
[0122] [Comparative Example]
[0123] A comparative example of the parameter adjustment device 1 for the objective functions of the multiple action conditions shown in Figure 5A and Figure 5B is explained below using the explanatory diagram of Figure 6.
[0124] Figure 6 is a diagram illustrating an example of the transitions of the selection probabilities of the action conditions and of the comprehensive evaluation value in the case where the selection probability update unit increases the selection probability of the action condition corresponding to the maximum value among the evaluation values, and the comprehensive evaluation unit uses the maximum value among the obtained evaluation values as the comprehensive evaluation value. Figure 6 (a) is a graph showing the transition of the selection probability of each action condition against the number of trials (i.e., the number of exploration trials of the control parameters). The selection probability in each trial is the selection probability set in the selection probability table with cost bias in that exploration trial. The left part of Figure 6 (b) is a graph showing the transition of the evaluation values of the action conditions against the number of trials. The right part of Figure 6 (b) is a color bar representing evaluation values; the evaluation values shown in the left part of Figure 6 (b) are displayed in the colors of this color bar. Figure 6 (c) is a graph showing the transition, against the number of trials, of the maximum value of the evaluation values shown in Figure 6 (b) (i.e., the comprehensive evaluation value).
[0125] Figure 6 illustrates the operation of the parameter adjustment device 1 when the processing used to achieve the effect of this disclosure is omitted or changed. Specifically, the parameter adjustment device 1 operates with the following changes: the processing of steps S252 and S254 of Figure 4 is omitted; in step S253 of Figure 4, the selection probability of the action condition that takes the minimum value is also reduced; in step S209 of Figure 3, the maximum value is used instead of the average value as the comprehensive evaluation value; and in step S255 of Figure 4, instead of applying cost bias, the selection probabilities set in the selection probability table and the selection probability table with cost bias are made the same.
[0126] As shown in Figure 6 (a), in the early stages of exploration, the selection probabilities of all action conditions are close to 1. This is because, in step S202 of Figure 3, the selection probability update unit 125 initializes the selection probabilities set in the selection probability table and the selection probability table with cost bias to 1. As a result, as shown in Figure 6 (b), the parameter adjustment device 1 selects and evaluates almost all action conditions in the early stages of exploration.
[0127] However, as shown in Figure 6 (b), owing to the shape of the objective functions shown in Figure 5A, the evaluation values of action conditions cond2 and cond3 are always greater than the evaluation values of the other action conditions. In this case, the selection probability update unit 125 executes the processing of steps S251 and S253 of Figure 4 (while also reducing the selection probability of the action condition that takes the minimum value). Therefore, as shown in Figure 6 (a), the selection probability update unit 125 updates the selection probability table and the selection probability table with cost bias so as to increase the selection probability of action condition cond2 or cond3 and decrease the selection probabilities of the other action conditions. However, because of the processing in step S256, the selection probability is limited to the range between the minimum probability ε and 1, so it never takes a value greater than 1 or less than the minimum probability ε.
[0128] The action condition selection unit 126 selects action conditions based on the values of the updated selection probability table and selection probability table with cost bias (i.e., it executes the processing of step S206 of Figure 3). Thus, as shown in Figure 6 (b), the frequency with which action conditions other than cond2 and cond3 are evaluated decreases as the exploration progresses. As a result, the parameter adjustment device 1 performs control, evaluation, and optimization only for the action conditions that tend to constitute the bottleneck, that is, those that tend to give the comprehensive evaluation value (i.e., the maximum value) shown in Figure 6 (c).
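How the action condition set is drawn from the table is not spelled out in this passage. One plausible reading, sketched below, includes each action condition independently with its selection probability; this independent sampling, the fallback when nothing is drawn, and the function name `select_action_conditions` are all assumptions of the sketch.

```python
import random
from typing import Dict, List, Optional


def select_action_conditions(
    probs: Dict[str, float], rng: Optional[random.Random] = None
) -> List[str]:
    """Draw an action condition set from a selection probability table.

    Each condition is included independently with its own probability
    (an assumed sampling scheme). If no condition is drawn, the condition
    with the highest probability is used so that the set is never empty.
    """
    rng = rng or random.Random()
    selected = [cond for cond, p in probs.items() if rng.random() < p]
    if not selected:
        selected = [max(probs, key=probs.get)]
    return selected


# With all probabilities at 1 (the initial state), every condition is selected.
initial = select_action_conditions({"cond0": 1.0, "cond2": 1.0}, random.Random(0))
```

As the probabilities of non-bottleneck conditions approach the minimum probability ε, such conditions appear in the drawn set only occasionally, which reduces the number of evaluations per exploration action.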
[0129] However, in this parameter adjustment method, the nearly flat objective functions of action conditions cond2 and cond3 shown in Figure 5A must be optimized, which is very difficult as an exploration problem. In fact, as shown in Figure 6 (c), the comprehensive evaluation value remains almost constant up to 200 trials. The black-box optimization algorithm used by the optimization unit 124 is an algorithm that estimates the region of control parameters that minimizes the objective function based on the obtained evaluation values (the comprehensive evaluation values in this disclosure). Therefore, if the black-box optimization algorithm only ever obtains the same evaluation value, it obtains no clues, and the optimization unit 124 can do no better than random exploration performed without the black-box optimization algorithm. As a result, in the above parameter adjustment method, the exploration of the optimal control parameters stagnates.
[0130] [Action Example 1]
[0131] An example of addressing the problem of exploration stagnation is explained below as action example 1, using Figure 7. Figure 7 is a diagram illustrating an example of the transitions of the selection probabilities of the action conditions and of the comprehensive evaluation value in the case where the selection probability update unit 125 increases the selection probabilities of the action conditions corresponding to the maximum and minimum evaluation values, and the comprehensive evaluation unit 123 uses the average of the maximum and minimum values among the obtained evaluation values as the comprehensive evaluation value. Figure 7 is an explanatory diagram showing an operation example of the parameter adjustment device 1 for the objective functions of the multiple action conditions shown in Figure 5A and Figure 5B. The parameter adjustment device 1 described with Figure 7 differs from the parameter adjustment device 1 described with Figure 6 in that the processing of steps S252 and S254 of Figure 4 is added, the selection probability of the action condition that takes the minimum value is not reduced in step S253 of Figure 4, and the average value is used as the comprehensive evaluation value in step S209 of Figure 3.
[0132] Figure 7 (a) and (b) show the same content as Figure 6 (a) and (b), respectively. Figure 7 (c) is a graph showing the transition, against the number of trials, of the maximum value among the evaluation values of the action conditions shown in Figure 7 (b). Figure 7 (d) is a graph showing the transition, against the number of trials (i.e., the number of exploration trials of the control parameters), of the exploration progress that is updated in step S254 of Figure 4 together with the update of the selection probability table. Figure 7 (e) is a graph showing the transition, against the number of trials, of the comprehensive evaluation value calculated in step S209 of Figure 3 from the selected action conditions and their evaluation values (in Figure 7, the average of the maximum and minimum values), displayed using the color bar.
[0133] As shown in Figure 7 (b), owing to the shape of the objective functions shown in Figure 5A, the evaluation values of action conditions cond2 and cond3 are larger than those of the other action conditions, while the evaluation values of action conditions cond0 and cond1 are smaller than those of the other action conditions. As shown in Figure 7 (a), at trial counts where the exploration progress α shown in Figure 7 (d) is low (i.e., in the early stage of exploration), the selection probability update unit 125 executes the processing of step S252 of Figure 4. That is, unlike the case of Figure 6 (a), the selection probability update unit 125 updates the table so that not only the selection probabilities of action conditions cond2 and cond3 but also those of action conditions cond0 and cond1 increase. Thus, in the early stage of exploration, the selection probabilities of action conditions cond0 to cond3 remain around 1, while the selection probabilities of the other action conditions decrease.
[0134] It can be seen that the selection probabilities of action conditions cond0 and cond1 remain close to 1 until approximately the 100th trial, which is the point in Figure 7 (d) at which the exploration progress α becomes 1. As a result, as shown in Figure 7 (b), up to approximately 100 trials the parameter adjustment device 1 evaluates, in most exploration actions, not only action conditions cond2 and cond3 but also action conditions cond0 and cond1.
[0135] In this parameter adjustment method, action conditions cond0 and cond1, which tend to take the minimum value, are selected and evaluated through step S252 of Figure 4 and step S206 of Figure 3, and the comprehensive evaluation value is calculated using the average value in step S209 of Figure 3. Therefore, in this parameter adjustment method, the evaluation value shown in Figure 7 (c) and the comprehensive evaluation value shown in Figure 7 (e) improve in fewer trials than the evaluation value (comprehensive evaluation value) shown in Figure 6 (c). In other words, this parameter adjustment method improves the evaluation values of the bottleneck action conditions cond2 and cond3, that is, the evaluation value shown in Figure 7 (c), in fewer trials than the parameter adjustment method described with Figure 6.
[0136] This is because including in the comprehensive evaluation value the evaluation values of action conditions cond0 and cond1, whose objective functions have fewer flat regions, prevents the comprehensive evaluation value from being constant and narrows the exploration region of the control parameters calculated by the optimization unit 124 to the vicinity of the true optimal solution. In fact, it can be seen that the comprehensive evaluation value shown in Figure 7 (e) takes a variety of values even during the period in which the evaluation values of action conditions cond2 and cond3 shown in Figure 7 (b) remain constant.
[0137] After that, when the exploration progress α shown in Figure 7 (d) becomes 1, Equation 2 above becomes equivalent to Equation 3 above. Therefore, as in Figure 6 (a), the selection probabilities shown in Figure 7 (a) of action conditions cond0 and cond1, which tend to take the minimum evaluation value, also decrease as the exploration progresses. As a result, as shown in Figure 7 (b), the frequency with which the evaluation unit 122 evaluates action conditions cond0 and cond1 also decreases as the exploration progresses.
[0138] In this way, the parameter adjustment method can prevent the exploration from stalling due to only obtaining constant evaluation values in the early stages of exploration, and can then narrow down the scope to the action conditions that constitute the bottleneck for control, evaluation and optimization.
[0139] Furthermore, since the comprehensive evaluation unit 123 calculates the comprehensive evaluation value by taking the weighted average of the maximum and minimum values among multiple evaluation values, it can obtain comprehensive evaluation values with diverse values in each exploration action. Therefore, the optimization unit 124 can use the comprehensive evaluation values obtained in each exploration action as a guide to calculate the control parameters to be tested next. Thus, the parameter adjustment device 1 can avoid adjustment stagnation in the early stages of exploration.
[0140] Furthermore, in the early stages of exploration, since the action condition selection unit 126 is more likely to select the action condition that takes the minimum evaluation value, the parameter adjustment device 1 executes exploration actions that include that action condition more frequently. Therefore, in the early stages of exploration, since the optimization unit 124 can calculate the control parameters to be tested next based on a variety of comprehensive evaluation values, the parameter adjustment device 1 can avoid adjustment stagnation. Furthermore, in the middle and later stages of exploration, since the action condition selection unit 126 becomes less likely to select the action condition that takes the minimum evaluation value, the parameter adjustment device 1 executes exploration actions that include that action condition less frequently. Therefore, the parameter adjustment device 1 can reduce the number of evaluations required for each exploration action.
[0141] [Action Example 2]
[0142] However, in the parameter adjustment method described in Action Example 1, the total evaluation cost required to adjust the control parameters can increase because the evaluation cost differs from one action condition to another. Specifically, since the evaluation costs of the action conditions are not necessarily the same, actively selecting action conditions with high evaluation costs may increase the total evaluation cost. Therefore, the parameter adjustment device 1 is required to minimize the comprehensive evaluation value at the lowest possible evaluation cost.
[0143] An example of addressing the above problem is described below as Action Example 2, with reference to Figure 8 and Figure 9. Figure 8 is a diagram showing an example of the evaluation cost set for each action condition. Figure 9 is a graph showing an example of the transition of the selection probabilities of the action conditions and of the comprehensive evaluation value when the selection probability update unit 125 additionally executes the processing of step S255 in Figure 4 in order to reduce the total evaluation cost during exploration. The evaluation cost of each action condition in Figure 9 is the evaluation cost shown in Figure 8. Figure 9 is an explanatory diagram showing an operation example of the parameter adjustment device 1 for the objective functions of the multiple action conditions shown in Figure 5A and Figure 5B. Parts (a), (b), (c), (d), and (e) of Figure 9 show the same kinds of content as parts (a), (b), (c), (d), and (e) of Figure 7, respectively.
[0144] Assume that evaluating the action conditions requires the evaluation costs shown in Figure 8. Specifically, the evaluation cost of the even-numbered action conditions is 1, and the evaluation cost of the other action conditions is 2.
[0145] Unlike Figure 7(a), in which the selection probability of the odd-numbered action conditions shifts between 0 and 1 throughout the exploration, in Figure 9(a) the selection probability of the odd-numbered action conditions shifts between 0 and 0.5 throughout the exploration. This is because the cost-biased selection probability table updated based on Equation 5 above in step S255 of Figure 4 is reflected. That is, since the minimum evaluation cost Cmin among the action conditions is 1 and the evaluation cost C of the odd-numbered action conditions is 2, the selection probability update unit 125 updates the cost-biased selection probability table to the values obtained by multiplying the selection probability of each odd-numbered action condition in the selection probability table by Cmin / C = 1 / 2 = 0.5.
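The scaling by Cmin / C described in this paragraph can be sketched in Python as follows. The function name and the list-based representation of the selection probability table are assumptions made for illustration; only the multiplication by Cmin / C is taken from the text.

```python
def cost_biased_probabilities(ps, costs):
    """Scale each selection probability by Cmin / C.

    ps    : selection probabilities, one per action condition
    costs : evaluation costs C, one per action condition (all positive)
    """
    if not costs or len(ps) != len(costs):
        raise ValueError("ps and costs must be non-empty and of equal length")
    if any(c <= 0 for c in costs):
        raise ValueError("evaluation costs must be positive")
    c_min = min(costs)
    return [p * c_min / c for p, c in zip(ps, costs)]


# Costs as described for Figure 8: even-numbered conditions cost 1, others cost 2.
costs = [1, 2, 1, 2, 1, 2, 1, 2]
# Hypothetical selection probability table in which every condition has probability 1.
ps = [1.0] * 8
psc = cost_biased_probabilities(ps, costs)
```

In this hypothetical table, the odd-numbered action conditions end up with a cost-biased selection probability of 0.5, which matches the upper bound of 0.5 described for Figure 9(a).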
[0146] As a result, as shown in Figure 9(b), the selection probability of action condition cond3 is lower than that of action condition cond2, which has a similar objective function. Likewise, the selection probability of action condition cond1 is lower than that of action condition cond0, which has a similar objective function, and the selection probabilities of action conditions cond5 and cond7 are lower than those of action conditions cond4 and cond6, which have similar objective functions. Nevertheless, the evaluation value shown in Figure 9(c) improves from the early stages of exploration, and the same effect as for the evaluation value shown in Figure 7(c) is obtained, namely that stagnation in the early stages of exploration is prevented.
[0147] As a result, this parameter adjustment method can reduce the total evaluation cost during exploration while avoiding the evaluation of action conditions with high evaluation costs.
[0148] Furthermore, since the action condition selection unit 126 tends to select action conditions with lower evaluation costs and is less likely to select action conditions with higher evaluation costs, the parameter adjustment device 1 can suppress the evaluation costs involved in each exploration action.
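How a cost-biased selection probability table could drive the choice of action conditions is sketched below. The sampling rule is an assumption: each action condition is drawn independently with its own probability, and the condition with the highest probability is used as a fallback so that at least one condition is always selected. The text does not specify the sampling procedure, and the function name is hypothetical.

```python
import random


def select_action_conditions(psc, rng):
    """Pick the indices of the action conditions to evaluate in one exploration action.

    Assumed rule: independent draw per condition with probability psc[i];
    if nothing is drawn, fall back to the condition with the highest probability.
    """
    if not psc:
        raise ValueError("the selection probability table must not be empty")
    selected = [i for i, p in enumerate(psc) if rng.random() < p]
    if not selected:
        selected = [max(range(len(psc)), key=lambda i: psc[i])]
    return selected


rng = random.Random(0)  # fixed seed so the sketch is reproducible
# Hypothetical cost-biased table: costly conditions (odd indices) are halved.
psc = [1.0, 0.5, 1.0, 0.5]
chosen = select_action_conditions(psc, rng)
```

Under this rule, conditions with probability 1 are always evaluated, while the halved conditions are evaluated in roughly half of the exploration actions, which lowers the expected cost per exploration action.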
[0149] Furthermore, the selection probability update unit 125 can also correct the evaluation cost based on the exploration progress α. For example, in the early stage of exploration, the selection probability update unit 125 updates the cost-biased selection probability table based on Equation 5 above so as to avoid evaluating action conditions with high evaluation costs. In the later stage of exploration, on the other hand, in order to improve the accuracy of the control parameters, the selection probability update unit 125 corrects the evaluation cost and updates the cost-biased selection probability table so that the action condition selection unit 126 can select many of the action conditions that are important for adjusting the control parameters, regardless of their evaluation cost.
[0150] Therefore, since the parameter adjustment device 1 can use a variety of action conditions during the exploration, the accuracy of the control parameters can be improved.
[0151] [The Relationship Between Cumulative Evaluation Costs and Evaluation Values]
[0152] The following uses Figure 10 to explain the relationship between the total evaluation cost required by each parameter adjustment method, including the comparative example, Action Example 1, and Action Example 2, and the maximum value among the obtained evaluation values. Figure 10 is a graph showing the relationship between the cumulative evaluation cost and the maximum value among the obtained evaluation values for each parameter adjustment method. In Figure 10, the horizontal axis represents the cumulative evaluation cost and the vertical axis represents the maximum value among the obtained evaluation values. The cumulative evaluation cost is the value obtained by adding up all the evaluation costs incurred in the exploration actions.
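The cumulative evaluation cost defined in this paragraph can be computed as in the following Python sketch. The data layout (a list of evaluated condition indices per exploration action) and the function name are assumptions made for illustration.

```python
from itertools import accumulate


def cumulative_evaluation_cost(costs, evaluated_per_action):
    """Running total of evaluation costs over successive exploration actions.

    costs                : evaluation cost of each action condition
    evaluated_per_action : for each exploration action, the indices of the
                           action conditions that were actually evaluated
    """
    per_action = [sum(costs[i] for i in evaluated) for evaluated in evaluated_per_action]
    return list(accumulate(per_action))


# Hypothetical example with four action conditions and three exploration actions.
costs = [1, 2, 1, 2]
history = [[0, 1], [2], [1, 3]]
cumulative = cumulative_evaluation_cost(costs, history)
```

Here the three exploration actions cost 3, 1, and 4, giving a cumulative evaluation cost of 3, 4, and 8; plotting the maximum evaluation value against this running total yields a curve of the kind described for Figure 10.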
[0153] In Figure 10, the one-dot dashed line shows the relationship obtained when using the parameter adjustment method of the comparative example. The solid line shows the relationship obtained when using the parameter adjustment method of Action Example 1. The two-dot dashed line shows the relationship obtained when using the parameter adjustment method of Action Example 2. The dotted line shows the relationship obtained when using a parameter adjustment method in which the processing of steps S251 and S256 of Figure 4 is omitted from the parameter adjustment method of the comparative example. In other words, the parameter adjustment method that yields the dotted line always executes control for all action conditions in each exploration action and uses the maximum value among the obtained evaluation values as the comprehensive evaluation value. The dashed line shows the relationship obtained when using a parameter adjustment method in which, in step S209 of Figure 3, the average of the maximum and minimum values among the obtained evaluation values is used as the comprehensive evaluation value. In other words, the parameter adjustment method that yields the dashed line always executes control for all action conditions in each exploration action and uses the average of the maximum and minimum values among the obtained evaluation values as the comprehensive evaluation value.
[0154] In the parameter adjustment methods that yield the solid line, the dashed line, and the two-dot dashed line, the average of the maximum and minimum values among the obtained evaluation values is used as the comprehensive evaluation value. In contrast, in the parameter adjustment methods that yield the dotted line and the one-dot dashed line, the maximum value among the obtained evaluation values is used as the comprehensive evaluation value. For example, when the parameter adjustment device 1 aims to adjust the control parameters so that the maximum value among the obtained evaluation values falls below 6, any of the three former parameter adjustment methods achieves this goal with a lower cumulative evaluation cost than either of the latter two. This is because including in the comprehensive evaluation value the evaluation values of action conditions whose objective functions have fewer flat regions prevents the comprehensive evaluation value from being constant, so that the exploration region of the control parameters calculated by the optimization unit 124 can be narrowed to the vicinity of the true optimal solution. Therefore, any of the three former parameter adjustment methods can avoid adjustment stagnation in the early stages of exploration.
[0155] Furthermore, in the parameter adjustment method that yields the dashed line, control is executed for all action conditions in each exploration action. In the parameter adjustment method that yields the solid line, on the other hand, control is executed in each exploration action for the action conditions selected based on a cost-biased selection probability table in which each selection probability is set equal to the selection probability of the corresponding action condition in the selection probability table. Therefore, the parameter adjustment device 1 using the latter parameter adjustment method can reduce the number of evaluations required for each exploration action compared to the parameter adjustment device 1 using the former, and can thus better suppress the evaluation costs involved in each exploration action.
[0156] Furthermore, in the parameter adjustment method that yields the two-dot dashed line, control is executed in each exploration action for the action conditions selected based on a cost-biased selection probability table. This cost-biased selection probability table is obtained by multiplying the selection probability of each action condition in the selection probability table by a coefficient corresponding to the evaluation cost of that action condition. Therefore, unlike the parameter adjustment method that yields the solid line, the parameter adjustment method that yields the two-dot dashed line can reduce the total evaluation cost during exploration while avoiding the evaluation of action conditions with high evaluation costs. Consequently, the parameter adjustment device 1 using the former parameter adjustment method can further suppress the evaluation costs involved in each exploration action compared to the parameter adjustment device 1 using the latter.
[0157] [Image displayed on input / output device 3]
[0158] Next, the image (i.e., UI image) displayed on the input / output device 3 is explained using Figure 11. Figure 11 is a diagram showing an example of an image that the input / output unit 11 displays on the input / output device 3. As shown in Figure 11, the image includes: a file selection button 401, an action condition setting area 402, an evaluation index setting area 403, a cost setting area 404, an adjustment execution button 405, an adjustment status display area 406, an adjustment result display area 407, a comprehensive evaluation value display area 408, a maximum evaluation value display area 409, a selection probability table display area 410, a cost-biased selection probability table display area 411, and an optimal parameter display area 412. The image does not necessarily have to include all of the above components; it may display only some of them.
[0159] Users can provide preset action conditions to the input / output unit 11 via the file selection button 401. In the example of Figure 11, the input / output unit 11 receives eight action conditions by reading a configuration file.
[0160] Users can set, through the action condition setting area 402, the action conditions output from the input / output unit 11 to the condition setting unit 121. In the example of Figure 11, for each of the eight action conditions, the shape of the input signal that the condition setting unit 121 applies to the control system 2 is set by reading the configuration file.
[0161] Users can set, through the evaluation index setting area 403, the evaluation index used by the evaluation unit 122 when evaluating motion waveforms. In the example of Figure 11, the overshoot when the input signal is applied is set as the evaluation index using a drop-down menu.
[0162] Users can set the evaluation cost of each action condition set in the action condition setting area 402 through the cost setting area 404. This setting can be done directly by the user using a bar graph as a slider UI, or it can be automatically set by the parameter adjustment device 1 based on the length of the input signal, etc.
[0163] Users can, via the adjustment execution button 405, cause the parameter adjustment device 1 to execute the adjustment based on the action conditions set in the action condition setting area 402, the evaluation index set in the evaluation index setting area 403, and the evaluation costs set in the cost setting area 404.
[0164] In the adjustment status display area 406, an image showing the control parameters set by the condition setting unit 121 in the control unit 21 and the action conditions applied to the control system 2 is displayed in real time.
[0165] In the adjustment result display area 407, the following image is displayed: a heat map, with the number of trials from the start of control parameter adjustment to the present on the horizontal axis and the action conditions on the vertical axis, showing the evaluation values calculated by the evaluation unit 122 for the action conditions set by the condition setting unit 121. For a given trial, the evaluation values of action conditions that the condition setting unit 121 did not set for the control system 2 are displayed as blank.
[0166] In the comprehensive evaluation value display area 408, the following image is displayed: a heat map, with the number of trials from the start of control parameter adjustment to the present on the horizontal axis, showing the comprehensive evaluation value calculated by the comprehensive evaluation unit 123.
[0167] In the maximum evaluation value display area 409, the following image is displayed: a heat map, with the number of trials from the start of control parameter adjustment to the present on the horizontal axis, showing the maximum (worst) evaluation value among the action conditions in each exploration trial.
[0168] In the selection probability table display area 410 and the cost-biased selection probability table display area 411, the following images are displayed: line graphs, with the number of trials from the start of control parameter adjustment to the present on the horizontal axis, showing the transitions of the selection probabilities in the selection probability table held by the selection probability update unit 125 and of the selection probabilities in the cost-biased selection probability table, respectively.
[0169] In the optimal parameter display area 412, the following is displayed: within the history, from the start of control parameter adjustment to the present, of the maximum evaluation value for the action conditions selected in each exploration action, the evaluation value at which the minimum was obtained and the control parameters that gave that result.
[0170] To facilitate understanding of the correspondence between action conditions, the action condition setting area 402, cost setting area 404, and adjustment result display area 407 can also be displayed such that the vertical positions of the action conditions displayed in their respective UI diagrams are consistent.
[0171] Based on the above description, parameter adjustment device 1 can output the optimal control parameters obtained through exploration to input / output device 3.
[0172] [Variation Example]
[0173] The following describes a variation of Action Example 2 of the embodiment. In this variation, an example is described in which the correction coefficient varies according to the exploration progress α. The action conditions cond0 to cond7 in the following description have the same objective functions as the multiple action conditions cond0 to cond7 shown in Figure 5A and Figure 5B.
[0174] [A variation of the action of the selection probability update unit 125]
[0175] In step S254 of Figure 4, the selection probability update unit 125 may, instead of using Equation 4 above, update the exploration progress α according to Equations 6 and 7 below. Here, α_k in Equation 6 is the estimated pass rate at exploration trial number k (where k is a natural number), Np is the number of action conditions estimated to pass at exploration trial number k, and N is the total number of action conditions. In addition, max_k is a symbol denoting the maximum over the exploration trial numbers k.
[0176] α_k ← Np / (N - 1) ··· (Equation 6)
[0177] α ← max_k {α_k} ··· (Equation 7)
[0178] Using Equations 6 and 7 above, the selection probability update unit 125 calculates the exploration progress α, which depends on the estimated pass rate α_k. Here, the calculation of the estimated pass rate is explained using the specific example shown in Figure 12. Figure 12 is a table showing an example of the estimated pass / fail status of the multiple action conditions cond0 to cond7 at a certain exploration trial number k. In the example of Figure 12, the action conditions actually applied to the control system 2 are cond2, cond3, cond4, and cond6.
[0179] As shown in Figure 12, the table lists the action conditions, the selection probability of each action condition at exploration trial number k, the pass / fail result of the action conditions for which control was executed at exploration trial number k, and the estimated pass / fail of each action condition at exploration trial number k.
[0180] The "Pass / Fail" column indicates whether each action condition for which control was executed passed or failed at exploration trial number k. For example, if the evaluation value obtained by executing a certain action condition is lower than the benchmark value, the action condition passes; if the evaluation value is higher than the benchmark value, the action condition fails.
[0181] The "Estimated Pass / Fail" column indicates pass / fail for all action conditions, including those for which control was not executed. Specifically, the pass / fail of an action condition for which control was not executed at exploration trial number k is regarded as the pass / fail of the action condition with the next-lowest selection probability among the action conditions for which control was executed at exploration trial number k. For example, the pass / fail of cond7 is regarded as the pass / fail (fail) of cond6, whose selection probability is next to that of cond7, while the pass / fail of cond0, cond1, and cond5 is regarded as the pass / fail (pass) of cond4, whose selection probability is next to those of cond0, cond1, and cond5.
[0182] Therefore, in the specific example of Figure 12, the estimated pass rate α_k is as follows.
[0183] α_k ← 4 / (8 - 1) = 4 / 7
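Equations 6 and 7 and the Figure 12 example can be traced with the following Python sketch. The function names and the boolean list are assumptions; the estimated pass / fail of cond2 and cond3 is not stated explicitly in the text and is taken here as fail, which is what Np = 4 implies.

```python
def estimated_pass_rate(estimated_pass):
    """Equation 6: alpha_k = Np / (N - 1).

    estimated_pass holds one boolean per action condition (True = estimated pass).
    The value is returned exactly as the equation gives it, without clamping.
    """
    n = len(estimated_pass)
    if n < 2:
        raise ValueError("at least two action conditions are required")
    n_p = sum(1 for passed in estimated_pass if passed)
    return n_p / (n - 1)


def update_progress(alpha, alpha_k):
    """Equation 7 applied incrementally: keep the largest alpha_k seen so far."""
    return max(alpha, alpha_k)


# Estimated pass / fail for cond0 to cond7 in the Figure 12 example:
# cond0, cond1, cond4, cond5 pass; cond6 and cond7 fail;
# cond2 and cond3 are assumed to fail (implied by Np = 4).
figure12 = [True, True, False, False, True, True, False, False]
alpha_k = estimated_pass_rate(figure12)

# Running maximum over a hypothetical sequence of estimated pass rates.
alpha = 0.0
for rate in [2 / 7, alpha_k, 3 / 7]:
    alpha = update_progress(alpha, rate)
```

Because Equation 7 takes the maximum over all trials, the exploration progress α never decreases, even if a later trial yields a lower estimated pass rate.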
[0184] In addition, in step S255 of Figure 4, the selection probability update unit 125 may set the cost-biased selection probability psc according to Equation 8 below instead of Equation 5 above. Here, β in Equation 8 is a correction coefficient.
[0185] psc ← ps × β ··· (Equation 8)
[0186] Figure 13 is a graph showing the relationship between the exploration progress α and the correction coefficient. Figure 13(a) is a graph showing the relationship between the exploration progress α and the cost-based correction coefficient Cmin / C in Action Example 2. Figure 13(b) is a graph showing the relationship between the exploration progress α and the correction coefficient β in the variation of Action Example 2.
[0187] As shown in Figure 13(a), the cost-based correction coefficient Cmin / C is a constant value that does not depend on the exploration progress α. For example, when the evaluation cost is 1, the correction coefficient Cmin / C is 1; when the evaluation cost is 2, the correction coefficient Cmin / C is 0.5.
[0188] As shown in Figure 13(b), for the action conditions whose evaluation cost is the minimum (evaluation cost 1 in Figure 13(b)), the selection probability update unit 125 sets the correction coefficient β to 1. For the action conditions whose evaluation cost is not the minimum (evaluation cost 2 in Figure 13(b)), the selection probability update unit 125 adjusts the correction coefficient β between 0 and 1 according to the exploration progress α.
[0189] As explained above, in the variation of Action Example 2, the parameter adjustment device 1 sets the selection probabilities of the action conditions so that action conditions with high evaluation costs are less likely to be selected in the early stages of control parameter exploration and more likely to be selected in the later stages. Therefore, the parameter adjustment device 1 can use a variety of action conditions in the exploration.
[0190] Furthermore, the correction coefficient β is not limited to increasing linearly with the exploration progress α. For example, the correction coefficient β may also increase along a curve as the exploration progress α increases.
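Equation 8 with a progress-dependent correction coefficient can be sketched in Python as follows. The linear choice β = α for action conditions whose cost is not the minimum is an assumption (the exact curve of Figure 13(b) is not given in the text), and the function names are hypothetical.

```python
def correction_coefficient(cost, c_min, alpha):
    """Correction coefficient beta for one action condition.

    Minimum-cost conditions always get beta = 1. For the other conditions,
    beta is assumed to grow linearly with the exploration progress alpha,
    clipped to the range [0, 1].
    """
    if cost == c_min:
        return 1.0
    return min(max(alpha, 0.0), 1.0)


def progress_biased_probabilities(ps, costs, alpha):
    """Equation 8: psc = ps * beta, evaluated for every action condition."""
    if not costs or len(ps) != len(costs):
        raise ValueError("ps and costs must be non-empty and of equal length")
    c_min = min(costs)
    return [p * correction_coefficient(c, c_min, alpha) for p, c in zip(ps, costs)]


# Hypothetical table with probability 1 everywhere and the cost pattern 1, 2, 1, 2.
ps = [1.0, 1.0, 1.0, 1.0]
costs = [1, 2, 1, 2]
early = progress_biased_probabilities(ps, costs, alpha=0.0)
middle = progress_biased_probabilities(ps, costs, alpha=0.5)
late = progress_biased_probabilities(ps, costs, alpha=1.0)
```

Under this assumed linear β, costly action conditions are suppressed entirely at α = 0 and are treated the same as minimum-cost conditions once α reaches 1; a curved β could be substituted by changing only `correction_coefficient`.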
[0191] [Comparison of the probabilities of choosing action conditions]
[0192] The following uses Figure 14 to explain an example of the transition of the selection probabilities of the action conditions and of the comprehensive evaluation value when the selection probability update unit 125 changes the correction coefficient β based on the exploration progress α. Figure 14 is an explanatory diagram showing an operation example of the parameter adjustment device 1 for the objective functions of the multiple action conditions shown in Figure 5A and Figure 5B. Parts (a), (b), (c), (d), and (e) of Figure 14 show the same kinds of content as parts (a), (b), (c), (d), and (e) of Figure 9, respectively. As in Action Example 2, assume that evaluating the action conditions shown in Figure 14 requires the evaluation costs shown in Figure 8.
[0193] Unlike Figure 9(a), in which the selection probability of the odd-numbered action conditions shifts between 0 and 0.5 throughout the exploration, in Figure 14(a) the selection probability of the odd-numbered action conditions stays around 0 in the early stages of exploration (e.g., fewer than 20 trials), rises to around 1 in the later stages of exploration (e.g., more than 20 trials), and then shifts between 0 and 1. This is because the cost-biased selection probability table updated based on Equation 8 above in step S255 of Figure 4 is reflected. Specifically, this is because the value of the correction coefficient β increases as the exploration progress α shown in Figure 14(d) increases.
[0194] As a result, as shown in Figure 14(b), the selection probability of action condition cond3 shifts at values lower than that of action condition cond2, which has a similar objective function, when the number of trials is less than 20, but shifts at values close to the selection probability of action condition cond2 (around 1) when the number of trials is 50 or more. Similarly, the selection probability of action condition cond1 shifts at values lower than that of action condition cond0, which has a similar objective function, when the number of trials is less than 20, but shifts at values close to the selection probability of action condition cond0 (within the range of 0 to 1) when the number of trials is 50 or more. Likewise, the selection probabilities of action conditions cond5 and cond7 shift at values lower than those of action conditions cond4 and cond6, which have similar objective functions, when the number of trials is less than 20, but shift at values close to the selection probabilities of action conditions cond4 and cond6 (within the range of 0 to 1) when the number of trials is 50 or more. Thus, the evaluation value shown in Figure 14(c) improves from the early stages of exploration and, as with the evaluation value shown in Figure 9(c), stagnation in the early stages of exploration is prevented. That is, the selection probability update unit 125 sets the selection probability table so that, among the multiple action conditions (cond0 to cond7), the selection probabilities of the action conditions whose evaluation values have not fallen below the benchmark value (cond2 and cond3) increase according to the exploration progress.
[0195] As a result, this parameter adjustment method can reduce the total evaluation cost during exploration while avoiding the evaluation of action conditions with high evaluation costs.
[0196] [Comparison of the relationship between cumulative evaluation costs and evaluation values]
[0197] The following uses Figure 15 to explain the relationship between the cumulative evaluation cost and the evaluation values for Action Example 2 and its variation. Figure 15 is a graph showing the transition of the evaluation values, focusing on action conditions cond2 and cond3. Figure 15(a) is a graph showing the transition of the evaluation values obtained in Action Example 2. Figure 15(b) is a graph showing the transition of the evaluation values obtained in the variation of Action Example 2. In Figure 15(a) and Figure 15(b), the horizontal axis represents the cumulative evaluation cost and the vertical axis represents the evaluation value. Figure 15(c) is a graph showing the transition of the exploration progress α relative to the cumulative evaluation cost.
[0198] Comparing Figure 15(a) and Figure 15(b), before the cumulative evaluation cost reaches 2500, the parameter adjustment device 1 uses action condition cond3 more often in the variation of Action Example 2 than in Action Example 2. In other words, before the cumulative evaluation cost reaches 2500, the parameter adjustment device 1 selects action conditions cond2 and cond3 simultaneously more often in the variation of Action Example 2 than in Action Example 2. Furthermore, in Action Example 2, when the cumulative evaluation cost is 2500, the maximum value (dashed line) of the obtained evaluation values has not yet fallen below the benchmark value (6.0 in the example of Figure 15(a)). In the variation of Action Example 2, in contrast, when the cumulative evaluation cost is approximately 2000, the maximum value (dashed line) of the obtained evaluation values has already fallen below the benchmark value (6.0 in the example of Figure 15(b)).
[0199] In summary, compared to Action Example 2, in the variation of Action Example 2 the parameter adjustment device 1 can increase the number of times it explores control parameters that simultaneously improve action conditions cond2 and cond3, which are bottlenecks and tend to take the maximum evaluation value. In other words, because the parameter adjustment device 1 in the variation of Action Example 2 selects the bottleneck action conditions cond2 and cond3 simultaneously more often, it can explore control parameters that improve both action conditions more often. Therefore, in the variation of Action Example 2, the parameter adjustment device 1 can, at a lower evaluation cost, find control parameters for which the maximum value among the obtained evaluation values falls below (satisfies) the benchmark value.
[0200] [Variation of the image displayed on input / output device 3]
[0201] Next, a variation of the image (i.e., UI image) displayed on the input / output device 3 is described using Figure 16. Figure 16 shows a variation of the image shown in Figure 11. In the description of Figure 16, constituent elements identical to those of the image shown in Figure 11 are given the same reference signs and their descriptions are omitted. Constituent elements not included in the image shown in Figure 11 are given new reference signs and described.
[0202] As shown in Figure 16, the image differs from the image shown in Figure 11 in that it includes a pass / fail criteria input area 413 and a pass rate display area 414.
[0203] The user can provide the pass / fail criterion for the set action conditions to the input / output unit 11 through the pass / fail criteria input area 413. As a result, the parameter adjustment device 1 explores control parameters such that the evaluation value of each set action condition falls below the pass / fail criterion (6.0 in the example of Figure 16).
[0204] In the pass rate display area 414, the following image is displayed: a line graph, with the number of trials from the start of control parameter adjustment to the present on the horizontal axis, showing the exploration progress α calculated by the selection probability update unit 125 using Equations 6 and 7 above.
[0205] As described above, the parameter adjustment device 1 can receive the user's specification of the pass / fail criterion via the input / output device 3.
[0206] [Effect]
[0207] As described above, the parameter adjustment device 1 according to this embodiment explores, by adjusting control parameters while operating the control system 2, the optimal control parameters for operating the control system 2 under multiple action conditions. The parameter adjustment device 1 includes: a condition setting unit 121, which sets at least one action condition among the multiple action conditions, together with control parameters, in the control system 2; an evaluation unit 122, which calculates, for each of the at least one action condition, an evaluation value related to the action of the control system 2 when it operates under the at least one action condition and the control parameters set by the condition setting unit 121; a comprehensive evaluation unit 123, which calculates a comprehensive evaluation value by using a weighted average of at least the maximum and minimum values among the evaluation values calculated by the evaluation unit 122; and an optimization unit 124, which uses an optimization algorithm to calculate the control parameters for the next operation of the control system 2 based on the comprehensive evaluation value calculated by the comprehensive evaluation unit 123. The condition setting unit 121 sets the at least one action condition and the control parameters calculated by the optimization unit 124 in the control system 2.
[0208] With this configuration, since the comprehensive evaluation unit 123 calculates the comprehensive evaluation value by taking the weighted average of the maximum and minimum values among multiple evaluation values, a variety of comprehensive evaluation values can be obtained in each exploration action. Therefore, the optimization unit 124 can use the comprehensive evaluation values obtained in each exploration action as a guide to calculate the control parameters to be tested next. Thus, the parameter adjustment device 1 can avoid adjustment stagnation in the early stages of exploration, thereby reducing the time spent on performance evaluation through simulation or actual equipment in the automatic adjustment of control parameters for multiple action conditions.
[0209] Furthermore, in the parameter adjustment device 1 of this embodiment, the comprehensive evaluation unit 123 calculates a comprehensive evaluation value by using a weighted average obtained from multiple high-order evaluation values including the maximum value and multiple low-order evaluation values including the minimum value.
[0210] With this configuration, since the comprehensive evaluation unit 123 calculates the comprehensive evaluation value by taking the weighted average of multiple evaluation values, including the maximum and minimum values, it can obtain a more diverse range of comprehensive evaluation values for each exploration action. Therefore, the optimization unit 124 can use the comprehensive evaluation values obtained from each exploration action as a guide to calculate the control parameters to be tested next. Thus, the parameter adjustment device 1 can avoid adjustment stagnation in the early stages of exploration, thereby reducing the time spent on performance evaluation through simulation or actual equipment in the automatic adjustment of control parameters for multiple action conditions.
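As a non-limiting illustration, the variant that uses multiple high-order and low-order evaluation values might be sketched as follows. The group size `k`, the weight `w_high`, and the equal weighting inside each group are assumptions of this sketch; the embodiment does not fix them here.

```python
def comprehensive_evaluation_topk(values, k=2, w_high=0.7):
    """Weighted average of the k highest and k lowest evaluation values.

    The k highest values include the maximum and the k lowest values
    include the minimum. Values inside each group are averaged with
    equal weight (an assumption of this sketch).
    """
    if not values:
        raise ValueError("at least one evaluation value is required")
    ordered = sorted(values, reverse=True)
    k = max(1, min(k, len(ordered)))  # clamp k to the available values
    high = ordered[:k]   # high-order values, including the maximum
    low = ordered[-k:]   # low-order values, including the minimum
    high_mean = sum(high) / len(high)
    low_mean = sum(low) / len(low)
    return w_high * high_mean + (1.0 - w_high) * low_mean
```

With `k=1` the sketch reduces to the weighted average of the maximum and minimum values alone.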
[0211] Furthermore, the parameter adjustment device 1 according to this embodiment also includes: a selection probability update unit 125, which updates a selection probability table based on the evaluation value calculated by the evaluation unit 122, wherein the probability of selecting an action condition with the maximum evaluation value is set to be higher than the probability of selecting other action conditions, and the probability of selecting an action condition with the minimum evaluation value is set according to the progress of the exploration; and an action condition selection unit 126, which selects at least one action condition from multiple action conditions based on the selection probability table updated by the selection probability update unit 125. The condition setting unit 121 sets the at least one action condition selected by the action condition selection unit 126 and the control parameters calculated by the optimization unit 124 in the control system 2.
[0212] With this configuration, since the parameter adjustment device 1 performs the exploration action using the action condition selected by the action condition selection unit 126, the number of evaluations required for each exploration action can be reduced. Therefore, the parameter adjustment device 1 can reduce the time spent on performance evaluation through simulation or actual equipment in the automatic adjustment of control parameters for multiple action conditions.
[0213] Furthermore, in the parameter adjustment device 1 of this embodiment, in the selection probability table, the probability of selecting multiple high-level action conditions corresponding to multiple high-level evaluation values including the maximum value in the evaluation values is set to be higher than the probability of selecting other action conditions, and the probability of selecting multiple low-level action conditions corresponding to multiple low-level evaluation values including the minimum value in the evaluation values is set according to the progress of the exploration.
[0214] With this configuration, since the probability of selecting multiple action conditions corresponding to multiple evaluation values, including the maximum and minimum values, is set in the selection probability table, the action condition selection unit 126 can select a variety of action conditions. Therefore, since the parameter adjustment device 1 performs various exploration actions using diverse action conditions, the comprehensive evaluation unit 123 can obtain comprehensive evaluation values with diverse values. Thus, the parameter adjustment device 1 can avoid adjustment stagnation in the early stages of exploration, thereby reducing the time spent on performance evaluation through simulation or actual equipment in the automatic adjustment of control parameters for multiple action conditions.
[0215] Furthermore, in the parameter adjustment device 1 of this embodiment, in the selection probability table, when the exploration progress is greater than the threshold, the probability of selecting the action condition with the minimum evaluation value is set to be lower than when the exploration progress is less than the threshold.
[0216] With this configuration, in the early stages of exploration, since the action condition selection unit 126 is more likely to select the action condition with the minimum evaluation value, the parameter adjustment device 1 can execute exploration actions including that action condition more frequently. Therefore, in the early stages of exploration, since the optimization unit 124 can calculate the control parameters to be tested next based on a variety of comprehensive evaluation values, the parameter adjustment device 1 can avoid adjustment stagnation. Furthermore, in the middle and later stages of exploration, since the action condition selection unit 126 becomes less likely to select the action condition with the minimum evaluation value, the number of times the parameter adjustment device 1 executes exploration actions including that action condition decreases. Therefore, the parameter adjustment device 1 can reduce the number of evaluations required for each exploration action. Thus, the parameter adjustment device 1 can reduce the time spent on performance evaluation through simulation or actual equipment in the automatic adjustment of control parameters for multiple action conditions.
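As a non-limiting illustration, a selection probability table that depends on the exploration progress might be sketched as follows. The raw weights (3.0, 2.0, 1.0, 0.2), the default threshold, the treatment of a progress exactly equal to the threshold, and the single weighted draw in `select_condition` are all assumptions of this sketch.

```python
import random


def build_selection_table(evaluations, progress, threshold=0.5):
    """Return a dict mapping each action condition to a selection probability.

    evaluations maps a condition name to its latest evaluation value.
    The condition with the maximum value always gets the largest weight.
    The condition with the minimum value gets a large weight while
    progress < threshold and a small weight otherwise (a progress equal
    to the threshold is treated as late, which is an assumption).
    """
    if len(evaluations) < 2:
        raise ValueError("at least two action conditions are required")
    ranked = sorted(evaluations, key=evaluations.get)
    min_cond, max_cond = ranked[0], ranked[-1]
    weights = {name: 1.0 for name in ranked}
    weights[max_cond] = 3.0
    weights[min_cond] = 2.0 if progress < threshold else 0.2
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def select_condition(table, rng):
    """Draw one action condition according to the table."""
    names = list(table)
    return rng.choices(names, weights=[table[n] for n in names], k=1)[0]


evaluations = {"A": 0.9, "B": 0.4, "C": 0.1}
early = build_selection_table(evaluations, progress=0.2)
late = build_selection_table(evaluations, progress=0.8)
chosen = select_condition(early, random.Random(0))
```

In this sketch the condition "C", which has the minimum evaluation value, is selected more often early in the exploration than late, while the condition "A", which has the maximum evaluation value, keeps the highest probability throughout.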
[0217] Furthermore, in the parameter adjustment device 1 of this embodiment, the selection probability update unit 125 further updates the selection probability table with cost bias after assigning evaluation cost weights to the probabilities set in the selection probability table, and the action condition selection unit 126 further selects at least one action condition based on the cost bias selection probability table updated by the selection probability update unit 125.
[0218] With this configuration, the action condition selection unit 126 can more easily select action conditions with lower evaluation costs and is less likely to select action conditions with higher evaluation costs, thus the parameter adjustment device 1 can suppress the evaluation costs involved in each exploration action. Furthermore, since the parameter adjustment device 1 performs exploration actions using the action conditions selected by the action condition selection unit 126, the number of evaluations required for each exploration action can be reduced. Therefore, the parameter adjustment device 1 can reduce the time spent on performance evaluation through simulation or actual equipment in the automatic adjustment of control parameters for multiple action conditions.
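As a non-limiting illustration, a cost-biased selection probability table might be derived from an existing table as follows. Dividing each probability by its evaluation cost and renormalizing is an assumption of this sketch; the embodiment only states that evaluation cost weights are assigned to the probabilities.

```python
def apply_cost_bias(table, costs):
    """Return a cost-biased copy of a selection probability table.

    table maps a condition name to its selection probability and costs
    maps the same names to a positive evaluation cost (for example, the
    time needed for one evaluation). Inverse-cost weighting is a
    hypothetical choice made for this sketch.
    """
    for name in table:
        if costs[name] <= 0:
            raise ValueError(f"cost for {name!r} must be positive")
    biased = {name: p / costs[name] for name, p in table.items()}
    total = sum(biased.values())
    return {name: b / total for name, b in biased.items()}


# Two conditions with equal base probability; "B" costs four times as much.
biased_table = apply_cost_bias({"A": 0.5, "B": 0.5}, {"A": 1.0, "B": 4.0})
```

Here the cheaper condition "A" ends up with probability 0.8 and the more expensive condition "B" with probability 0.2, so conditions with lower evaluation costs are selected more easily.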
[0219] Furthermore, in the parameter adjustment device 1 of this embodiment, the selection probability update unit 125 corrects the evaluation cost according to the progress of the exploration.
[0220] With this configuration, the parameter adjustment device 1 can explore under a variety of action conditions during the exploration, so the accuracy of the control parameters can be improved.
[0221] Furthermore, the parameter adjustment device 1 according to this embodiment also includes an input / output unit 11. The input / output unit 11 receives information about setting conditions related to the adjustment of control parameters from the input / output device 3, and outputs information about the results related to the adjustment of control parameters to the input / output device 3. At least one of the condition setting unit 121, evaluation unit 122, comprehensive evaluation unit 123 and optimization unit 124 operates using the information about setting conditions received by the input / output unit 11. The results related to the adjustment of control parameters include at least one operating condition set by the condition setting unit 121 in the control system 2 and the control parameters calculated by the optimization unit 124.
[0222] With this configuration, the parameter adjustment device 1 can output the optimal control parameters obtained through exploration to the input / output device 3.
[0223] Furthermore, in the parameter adjustment device 1 of this embodiment, the selection probability update unit 125 determines whether each evaluation value calculated by the evaluation unit 122 is qualified based on whether it is lower than the benchmark value, and updates the exploration progress based on the result of that determination.
[0224] With this configuration, as the exploration progresses, the parameter adjustment device 1 raises the selection probability of the action conditions whose evaluation values are not lower than the benchmark value, which increases the number of times that multiple bottleneck action conditions prone to yielding the maximum evaluation value are selected simultaneously. In other words, the parameter adjustment device 1 can increase the number of times it explores control parameters that simultaneously improve the action conditions whose evaluation values are not lower than the benchmark value. Therefore, the parameter adjustment device 1 can find, at a lower evaluation cost, control parameters for which the maximum value among the obtained evaluation values is lower than the benchmark value.
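As a non-limiting illustration, the qualification determination and the progress update might be sketched as follows. Using the ratio of qualified action conditions as the exploration progress is an assumption of this sketch; the embodiment only states that the progress is updated based on whether the evaluation values are qualified.

```python
def update_progress(evaluations, benchmark):
    """Judge each evaluation value and derive an exploration progress.

    An evaluation value is treated as qualified when it is lower than
    the benchmark value. The progress is taken to be the fraction of
    qualified action conditions (a hypothetical definition).
    """
    if not evaluations:
        raise ValueError("at least one evaluation value is required")
    qualified = {name: value < benchmark for name, value in evaluations.items()}
    progress = sum(qualified.values()) / len(qualified)
    return qualified, progress


qualified, progress = update_progress({"A": 0.9, "B": 0.4, "C": 0.1}, benchmark=0.5)
# Conditions that are not yet qualified are the bottleneck candidates.
bottlenecks = [name for name, ok in qualified.items() if not ok]
```

In this sketch only condition "A" remains unqualified, so it is the condition whose selection probability would be raised as the exploration progresses.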
[0225] Furthermore, the parameter adjustment method involved in this embodiment is a computer-executed parameter adjustment method. By adjusting the control parameters while causing the control system 2 to operate, the optimal control parameters for causing the control system 2 to operate under multiple operating conditions are explored. The parameter adjustment method includes: a condition setting step, in which at least one operating condition and control parameters from multiple operating conditions are set in the control system 2; an evaluation step, in which an evaluation value is calculated, which is an evaluation value related to the operation of the control system 2 under the at least one operating condition and control parameters set in the condition setting step; a comprehensive evaluation step, in which a weighted average value obtained by using at least the maximum and minimum values among the evaluation values calculated in the evaluation step is used as a comprehensive evaluation value; and an optimization step, in which the control parameters for causing the control system 2 to operate next are calculated using an optimization algorithm based on the comprehensive evaluation value calculated in the comprehensive evaluation step. In the condition setting step, at least one operating condition and the control parameters calculated in the optimization step are set in the control system 2.
[0226] This structure allows for the calculation of a diverse range of comprehensive evaluation values in the comprehensive evaluation step. Since the weighted average of the maximum and minimum values from multiple evaluations is used as the comprehensive evaluation value, a variety of values can be obtained for each exploratory action. Therefore, in the optimization step, the comprehensive evaluation values obtained from each exploratory action can be used as a guide to calculate the control parameters to be tested next. Consequently, this parameter adjustment method avoids adjustment stagnation in the early stages of exploration, thereby reducing the time spent on performance evaluation through simulation or actual equipment in the automatic adjustment of control parameters for multiple action conditions.
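As a non-limiting illustration, the four steps of the method might be arranged in a loop as follows. The function `evaluate` is a hypothetical stand-in for operating the control system 2, and a simple random perturbation search is substituted for the optimization algorithm, which this passage does not name. The parameter count, trial count, step size, and weight `w_max` are likewise assumptions.

```python
import random


def evaluate(params, condition):
    """Hypothetical evaluation: squared error whose optimum depends on
    the operating condition (lower is better)."""
    return sum((p - condition) ** 2 for p in params)


def adjust(conditions, n_params=2, n_trials=200, w_max=0.7, seed=0):
    """Explore control parameters over all given operating conditions."""
    rng = random.Random(seed)
    params = [0.0] * n_params
    best_params, best_score = params, float("inf")
    for _ in range(n_trials):
        # Condition setting step and evaluation step: run every
        # condition with the current control parameters.
        values = [evaluate(params, c) for c in conditions]
        # Comprehensive evaluation step: weighted average of the
        # maximum and minimum evaluation values.
        score = w_max * max(values) + (1.0 - w_max) * min(values)
        if score < best_score:
            best_params, best_score = params, score
        # Optimization step (random perturbation search, an assumption):
        # propose the next parameters near the best ones found so far.
        params = [p + rng.gauss(0.0, 0.1) for p in best_params]
    return best_params, best_score


best_params, best_score = adjust(conditions=[0.0, 1.0])
```

The loop keeps the best comprehensive evaluation value found so far, so the returned score is never worse than that of the initial control parameters.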
[0227] Furthermore, the program according to this embodiment causes a computer to execute the above parameter adjustment method.
[0228] With this program, a computer can reduce the time spent on performance evaluation through simulation or actual equipment in the automatic adjustment of control parameters for multiple action conditions.
[0229] (Other embodiments)
[0230] The parameter adjustment device and the like involved in one or more embodiments have been described above, but this disclosure is not limited to the above embodiments. Various modifications to the above embodiments that can be conceived by those skilled in the art without departing from the spirit of this disclosure, as well as forms constructed by combining the constituent elements of different embodiments, may also be included within the scope of one or more embodiments.
[0231] Furthermore, in the above embodiments, the parameter adjustment device and the like can be used for automatic adjustment of control parameters in simulation as well as for automatic adjustment of control parameters in actual equipment.
[0232] Furthermore, in the above embodiments, each component can be constructed by dedicated hardware or implemented by executing software programs suitable for each component. Each component can also be implemented by a program execution unit such as a CPU or processor reading and executing software programs recorded on a recording medium such as a hard disk or semiconductor memory.
[0233] Furthermore, some or all of the functions of the parameter adjustment device involved in the above embodiments can also be implemented by a processor such as a CPU executing a program.
[0234] Some or all of the constituent elements of the aforementioned devices can be constituted by IC cards or individual modules that can be installed and removed from each device. The aforementioned IC cards or modules are computer systems composed of microprocessors, ROM, RAM, etc. The aforementioned IC cards or modules may include multi-function LSIs. The microprocessors operate according to computer programs, thereby enabling the aforementioned IC cards or modules to perform their respective functions. The aforementioned IC cards or modules may also possess tamper-proof features.
[0235] Industrial Applicability
[0236] The parameter adjustment device disclosed herein is useful, for example, as a device for exploring optimal control parameters.
[0237] Symbol Explanation
[0238] 1 Parameter adjustment device
[0239] 11 Input / output unit
[0240] 12 Control unit
[0241] 121 Condition setting unit
[0242] 122 Evaluation unit
[0243] 123 Comprehensive evaluation unit
[0244] 124 Optimization unit
[0245] 125 Selection probability update unit
[0246] 126 Action condition selection unit
[0247] 2 Control system
[0248] 21 Control unit
[0249] 22 Controlled object
[0250] 3 Input / output device
[0251] 401 File selection button
[0252] 402 Action condition setting area
[0253] 403 Evaluation indicator setting area
[0254] 404 Cost setting area
[0255] 405 Adjustment execution button
[0256] 406 Adjustment status display area
[0257] 407 Adjustment results display area
[0258] 408 Comprehensive evaluation value display area
[0259] 409 Maximum evaluation value display area
[0260] 410 Selection probability table display area
[0261] 411 Selection probability table display area with cost bias
[0262] 412 Optimal parameter display area
[0263] 413 Qualification standard input area
[0264] 414 Qualified ratio display area
Claims
1. A parameter adjustment device that explores the optimal control parameters for operating a control system under multiple operating conditions by adjusting control parameters while operating the control system. The parameter adjustment device includes: a condition setting unit, which sets at least one operating condition among the multiple operating conditions and the control parameters in the control system; an evaluation unit, which calculates an evaluation value for each of the at least one operating condition, the evaluation value being related to the operation of the control system when it operates under the at least one operating condition and the control parameters set by the condition setting unit; a comprehensive evaluation unit, which calculates a comprehensive evaluation value by using a weighted average of at least the maximum and minimum values among the evaluation values calculated by the evaluation unit; and an optimization unit, which, based on the comprehensive evaluation value calculated by the comprehensive evaluation unit, uses an optimization algorithm to calculate the control parameters for the next operation of the control system. The condition setting unit sets the at least one operating condition and the control parameters calculated by the optimization unit in the control system.
2. The parameter adjustment device as described in claim 1, The comprehensive evaluation unit calculates the comprehensive evaluation value by using a weighted average obtained from multiple high-order evaluation values including the maximum value and multiple low-order evaluation values including the minimum value.
3. The parameter adjustment device as described in claim 1, The parameter adjustment device also includes: a selection probability update unit, which updates a selection probability table based on the evaluation value calculated by the evaluation unit, wherein the probability of selecting an action condition with the maximum evaluation value is set to be higher than the probability of selecting other action conditions, and the probability of selecting an action condition with the minimum evaluation value is set according to the progress of the exploration; and an action condition selection unit, which selects at least one action condition from the plurality of action conditions based on the selection probability table updated by the selection probability update unit. The condition setting unit sets the at least one action condition selected by the action condition selection unit and the control parameters calculated by the optimization unit in the control system.
4. The parameter adjustment device as described in claim 3, In the selection probability table, the probability of selecting multiple action conditions corresponding to multiple high-order evaluation values, including the maximum value, is set to be higher than the probability of selecting other action conditions, and the probability of selecting multiple action conditions corresponding to multiple low-order evaluation values, including the minimum value, is set according to the progress of the exploration.
5. The parameter adjustment device as described in claim 3, In the selection probability table, when the exploration progress is greater than the threshold, the probability of selecting the action condition with the minimum evaluation value is set lower compared to when the exploration progress is less than the threshold.
6. The parameter adjustment device as described in any one of claims 3 to 5, The selection probability update unit further updates the selection probability table by assigning evaluation cost weights to the probabilities set in the selection probability table, thus creating a cost-biased selection probability table. The action condition selection unit further selects at least one action condition based on the cost-biased selection probability table updated by the selection probability update unit.
7. The parameter adjustment device as described in claim 6, The selection probability update unit corrects the evaluation cost based on the progress of the exploration.
8. The parameter adjustment device as described in claim 7, The parameter adjustment device further includes an input / output unit, which receives information from the input / output device regarding setting conditions related to the adjustment of the control parameter, and outputs information to the input / output device regarding the result related to the adjustment of the control parameter. At least one of the condition setting unit, the evaluation unit, the comprehensive evaluation unit, and the optimization unit operates using information about the set conditions received by the input / output unit. The results related to the adjustment of the control parameters include the at least one operating condition set by the condition setting unit in the control system and the control parameters calculated by the optimization unit.
9. The parameter adjustment device as described in claim 6, The selection probability update unit determines whether the evaluation value is qualified or not based on whether the evaluation value calculated by the evaluation unit is lower than the benchmark value. The progress of the exploration is updated based on whether the evaluation value is qualified or not.
10. The parameter adjustment device as described in claim 9, The parameter adjustment device further includes an input / output unit, which receives information from the input / output device regarding setting conditions related to the adjustment of the control parameter, and outputs information to the input / output device regarding the result related to the adjustment of the control parameter. At least one of the condition setting unit, the evaluation unit, the comprehensive evaluation unit, and the optimization unit operates using information about the set conditions received by the input / output unit. The results related to the adjustment of the control parameters include the at least one operating condition set by the condition setting unit in the control system and the control parameters calculated by the optimization unit.
11. The parameter adjustment device as described in claim 7, The selection probability update unit determines whether the evaluation value is qualified or not based on whether the evaluation value calculated by the evaluation unit is lower than the benchmark value. The progress of the exploration is updated based on whether the evaluation value is qualified or not.
12. The parameter adjustment device as described in claim 11, The parameter adjustment device further includes an input / output unit, which receives information from the input / output device regarding setting conditions related to the adjustment of the control parameter, and outputs information to the input / output device regarding the result related to the adjustment of the control parameter. At least one of the condition setting unit, the evaluation unit, the comprehensive evaluation unit, and the optimization unit operates using information about the set conditions received by the input / output unit. The results related to the adjustment of the control parameters include the at least one operating condition set by the condition setting unit in the control system and the control parameters calculated by the optimization unit.
13. A computer-executed parameter adjustment method that explores the optimal control parameters for operating a control system under multiple operating conditions by adjusting control parameters while operating the control system. The parameter adjustment method includes: a condition setting step, in which at least one operating condition among the multiple operating conditions and the control parameters are set in the control system; an evaluation step, in which an evaluation value is calculated, the evaluation value being related to the operation of the control system when it operates under the at least one operating condition and the control parameters set in the condition setting step; a comprehensive evaluation step, in which a weighted average of at least the maximum and minimum values among the evaluation values calculated in the evaluation step is used as a comprehensive evaluation value; and an optimization step, in which the control parameters for the next operation of the control system are calculated using an optimization algorithm based on the comprehensive evaluation value calculated in the comprehensive evaluation step. In the condition setting step, the at least one operating condition and the control parameters calculated in the optimization step are set in the control system.
14. A program for causing a computer to execute the parameter adjustment method as described in claim 13.