Rolling mill screwdown control method based on the Q-learning algorithm
By using the Q-learning algorithm to optimize the PID parameters of the rolling mill reduction control system, the method addresses the inability of existing PID controllers to achieve good target value tracking and disturbance suppression at the same time. This enables high-precision control of the rolling force of the rolling mill and improves the thickness accuracy and product quality of ultra-thin stainless steel strip.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SHANXI TAIGANG STAINLESS STEEL PRECISION STRIP CO LTD
- Filing Date
- 2024-02-06
- Publication Date
- 2026-06-26
AI Technical Summary
Existing PID controllers in rolling mill hydraulic automatic thickness control systems cannot simultaneously optimize target value tracking and disturbance suppression characteristics. This results in the dynamic and static characteristics of the rolling mill reduction control loop failing to meet expectations, affecting the thickness accuracy and product quality of ultra-thin stainless steel strips.
The Q-learning algorithm is used to pre-acquire the PID parameters of the rolling mill reduction control system. By using the Q-value table and the preset learning algorithm, the output of the PID controller is adjusted to optimize the rolling force control, thereby achieving the optimization of target value tracking and disturbance suppression.
It improves the control and adjustment accuracy of the rolling force of the rolling mill, ensures the adjustment accuracy of the loaded roll gap, and enhances the thickness control accuracy and product quality of ultra-thin stainless steel strip.
Figure CN117960798B_ABST
Description
Technical Field
[0001] This invention relates to the field of rolling mill control technology, and in particular to a rolling mill reduction control method based on the Q-learning algorithm.
Background Technology
[0002] For ultra-thin stainless steel strips, especially those 0.015mm and below, thickness accuracy control is the most important indicator for production processes and energy consumption. During the rolling process, the thickness of stainless steel is controlled by the hydraulic automatic thickness control system of the rolling mill. Therefore, the hydraulic automatic thickness control system of the rolling mill is the core part of the rolling mill, and its performance directly affects the production accuracy and product quality of the rolling mill. It is also a key technology affecting the thinning of strip and the accuracy of strip shape control.
[0003] A typical hydraulic automatic thickness control system consists of hydraulic, control, mechanical, and electrical systems. In actual rolling, the strip thickness is affected by a combination of factors, including the accuracy of the hydraulic servo system, the control accuracy of the automatic control system, instantaneous process conditions, and random variations in the properties of the incoming material. To produce strip that meets the thickness accuracy requirements, it is necessary not only to set the initial unloaded roll gap correctly, but also to adjust the roll gap accurately and promptly as rolling conditions change, so as to maintain the thickness of the ultra-thin stainless steel strip. All of this is achieved by setting and adjusting the reduction (screwdown) position.
[0004] Currently, the reduction control systems in hydraulic automatic thickness control systems usually use PID control. Given the system model and the required performance indicators, once the controller is chosen to be of PID form, its PID parameters are tuned so that the dynamic and static characteristics of the control loop, which consists of the controlled object, controller, actuator, and feedback element, meet the expected level, thereby achieving the desired control objective.
[0005] However, in practical applications, the PID parameters of the PID controller used are usually not optimal, making it difficult to simultaneously achieve optimal target value tracking characteristics and disturbance suppression characteristics. This results in the dynamic and static characteristics of the control loop failing to meet the expected levels, and thus failing to achieve the ideal control objective.
Summary of the Invention
[0006] To address some or all of the technical problems existing in the prior art, this invention provides a rolling mill reduction control method based on the Q-learning algorithm.
[0007] The technical solution of the present invention is as follows:
[0008] A rolling mill reduction control method based on the Q-learning algorithm is provided, the method comprising:
[0009] Obtain the target and output values of the rolling force from the rolling mill reduction control system;
[0010] Based on the target value and output value of the rolling force, the PID parameters of the PID controller are determined according to the three PID parameter Q value tables, which are obtained in advance using a preset Q learning algorithm.
[0011] Based on the obtained target and output values of the rolling force, as well as the PID parameters, the output of the PID controller is obtained;
[0012] The output of the PID controller is superimposed on the input of the mill reduction control system to adjust the output value of the rolling force of the mill reduction control system.
[0013] In some possible implementations, the PID parameter Q-value table is obtained using the following method:
[0014] Step S201: Initialize the Q value table of the three PID parameters;
[0015] Step S202, set the iteration variable eps = 1;
[0016] Step S203: Determine the rolling force state at the initial moment, and take the rolling force state at the initial moment as the current rolling force state. The rolling force state includes the target value and output value of the rolling force.
[0017] Step S204: Calculate the exploration rate ε of the ε-greedy strategy using a preset decay control strategy;
[0018] Step S205: Set the time step variable t = 0;
[0019] Step S206: Determine whether t is greater than or equal to a preset threshold. If yes, proceed to step S213. If no, increase the value of t by 1 and continue to the next step.
[0020] Step S207: Select an action based on the current rolling force state using the ε-greedy strategy. The action includes three PID parameters.
[0021] Step S208: Adjust the PID controller according to the selected action, and determine the output value of the rolling force after adjustment by the PID controller;
[0022] Step S209: Determine the next rolling force state and calculate the instantaneous reward using a preset reward strategy;
[0023] Step S210: Adaptively adjust the learning rates corresponding to the three PID parameters using a preset learning rate adjustment strategy.
[0024] Step S211: Update the Q values of the three current rolling force state-PID parameter pairs using the Bellman equation;
[0025] Step S212: Take the next rolling force state obtained as the current rolling force state and return to step S206;
[0026] Step S213: Determine whether eps is greater than or equal to the preset maximum number of iterations. If yes, output the Q value table after updating the Q value. If no, increase the value of eps by 1 and return to step S203.
[0027] In some possible implementations, the decay control strategy is defined as:
[0028]
[0029] Where e0 represents the pre-set judgment threshold.
[0030] In some possible implementations, the reward policy is defined as:
[0031]
[0032] Where $r_t$ denotes the immediate reward, $F_n(t+1)$ denotes the output value of the rolling force at step t+1, $F_n(t)$ denotes the output value of the rolling force at step t, $F_{nref}$ denotes the given target value of the rolling force, and $\Delta F$ denotes the given rolling force threshold.
[0033] In some possible implementations, the learning rate adjustment strategy is defined as:
[0034]
[0035] Where $\alpha_{t+1}$ denotes the learning rate at step t+1, $\alpha_t$ denotes the learning rate at step t, $\Delta\alpha_t$ denotes the learning rate adjustment, k is a given value (k > 0), Φ is a given value corresponding to the discount factor (0 < Φ < 1), $\delta_t$ denotes the temporal-difference (TD) error at step t, and $\delta_{t-1}$ denotes the TD error at step t-1.
[0036] In some possible implementations, the method further includes:
[0037] The rolling force is discretized to divide the rolling force value into multiple intervals. The rolling force belonging to the same interval is regarded as the same state and is controlled by the same set of PID parameters.
[0038] In some possible implementations, the method is used for the reduction control process of a 20-roll reversible rolling mill.
[0039] In some possible implementations, the method is used for the reduction control process in the production of stainless steel with a thickness of less than 0.015 mm on a 20-roll reversible rolling mill.
[0040] The main advantages of the technical solution of this invention are as follows:
[0041] The rolling mill reduction control method based on the Q-learning algorithm of the present invention uses the Q-learning algorithm to learn in advance the optimal PID parameters of the PID controller for different rolling force input and output states. It can optimize the target value tracking characteristics and the disturbance suppression characteristics at the same time, improve the control and adjustment accuracy of the rolling force of the rolling mill, and ensure the adjustment accuracy of the loaded roll gap.
Attached Figure Description
[0042] The accompanying drawings, which are included to provide a further understanding of embodiments of the invention and constitute a part of this invention, illustrate exemplary embodiments of the invention and, together with their description, serve to explain the invention and do not constitute an undue limitation thereof. In the drawings:
[0043] Figure 1 is a flowchart of a rolling mill reduction control method based on the Q-learning algorithm according to an embodiment of the present invention;
[0044] Figure 2 is a schematic diagram of the Q-learning algorithm according to an embodiment of the present invention.
Detailed Implementation
[0045] To make the objectives, technical solutions, and advantages of this invention clearer, the technical solutions of this invention will be clearly and completely described below in conjunction with specific embodiments and corresponding drawings. Obviously, the described embodiments are only a part of the embodiments of this invention, and not all of them. All other embodiments obtained by those skilled in the art based on the embodiments of this invention without creative effort are within the scope of protection of this invention.
[0046] The technical solutions provided by the embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
[0047] Referring to Figures 1-2, an embodiment of the present invention provides a rolling mill reduction control method based on the Q-learning algorithm, the method comprising the following steps:
[0048] Step S1: Obtain the target value and output value of the rolling force of the rolling mill reduction control system.
[0049] In one embodiment of the present invention, the target value and output value of the rolling force of the rolling mill reduction control system are obtained based on the actual rolling mill reduction control situation.
[0050] Step S2: Based on the acquired target value and output value of the rolling force, determine the PID parameters of the PID controller according to the three PID parameter Q value tables. The PID parameter Q value tables are obtained in advance using a preset Q-learning algorithm.
[0051] In one embodiment of the present invention, the PID parameter Q-value tables store different rolling force states-PID parameter pairs and their corresponding Q values. Based on the determined target value and output value of the rolling force, the corresponding PID parameters of the PID controller are extracted from the three PID parameter Q-value tables respectively. Specifically, the PID parameters include proportional coefficient, integral coefficient, and derivative coefficient, and the three PID parameter Q-value tables store the Q values corresponding to the proportional coefficient, integral coefficient, and derivative coefficient, respectively.
[0052] In one embodiment of the present invention, the Q-value table of three PID parameters is obtained in advance by using a preset Q-learning algorithm.
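The lookup in step S2 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the candidate gain values in GAIN_CANDIDATES, the table shape (22 x 22 x number of actions), and the function name select_gains are all hypothetical, and the state is assumed to be a pair of interval indices for the target and output rolling force.

```python
import numpy as np

# Hypothetical candidate values (actions) for each PID parameter.
GAIN_CANDIDATES = {
    "kp": np.array([0.5, 1.0, 2.0]),
    "ki": np.array([0.0, 0.1, 0.2]),
    "kd": np.array([0.0, 0.05]),
}

def select_gains(q_tables, state):
    """Return, for each parameter, the candidate with the highest Q value in `state`."""
    return {
        name: float(GAIN_CANDIDATES[name][np.argmax(q[state])])
        for name, q in q_tables.items()
    }

# One Q table per parameter, indexed [target_bin, output_bin, action] (assumed layout).
q_tables = {name: np.zeros((22, 22, len(c))) for name, c in GAIN_CANDIDATES.items()}
q_tables["kp"][3, 4, 2] = 1.0   # pretend training favoured kp = 2.0 in this state
q_tables["ki"][3, 4, 1] = 0.7   # and ki = 0.1
gains = select_gains(q_tables, (3, 4))
```

Keeping one table per parameter means each lookup is an independent argmax over a short row, which matches the description of three separate Q value tables.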
[0053] Step S3: Based on the obtained target value and output value of the rolling force, as well as the PID parameters, obtain the output of the PID controller.
[0054] Specifically, the PID parameters of the PID controller used in the rolling mill reduction control system are set to the PID parameters obtained in step S2 above. The error between the target value and the output value of the rolling force is input into the PID controller. The PID controller performs proportional-integral-derivative adjustment on the input error value to obtain the corresponding output.
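As an illustration of step S3, the sketch below computes a positional discrete PID output from the rolling force error. The class name DiscretePID, the sample time dt, and the numeric gains are assumptions made for the example; the source does not specify the discrete form of the controller.

```python
from dataclasses import dataclass

@dataclass
class DiscretePID:
    """Positional discrete PID; the gains can be overwritten between steps."""
    kp: float
    ki: float
    kd: float
    dt: float = 1.0            # assumed sample time
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, target: float, output: float) -> float:
        error = target - output                       # rolling force error
        self.integral += error * self.dt              # integral term state
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

pid = DiscretePID(kp=2.0, ki=0.5, kd=0.1)
u1 = pid.step(target=100.0, output=90.0)   # first correction
u2 = pid.step(target=100.0, output=95.0)   # second correction, error has halved
```

In the method described here, the three gains would be refreshed from the Q value tables before each call to step, and the returned value would be superimposed on the input of the reduction control system.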
[0055] Step S4: The output of the PID controller is superimposed on the input of the mill reduction control system to adjust the output value of the rolling force of the mill reduction control system.
[0056] Specifically, the output of the PID controller obtained in step S3 above is superimposed on the input of the rolling mill reduction control system so that the rolling mill reduction control system adjusts the output rolling force, thereby improving the control and adjustment accuracy of the rolling force.
[0057] The rolling mill reduction control method based on the Q-learning algorithm provided in one embodiment of the present invention uses the Q-learning algorithm to learn in advance the optimal PID parameters of the PID controller for different rolling force input and output states. It can optimize the target value tracking characteristics and the disturbance suppression characteristics at the same time, improve the control and adjustment accuracy of the rolling force of the rolling mill, and ensure the adjustment accuracy of the loaded roll gap.
[0058] Referring to Figure 2, in one embodiment of the present invention, the PID parameter Q value tables are obtained using the following method:
[0059] Step S201: Initialize the Q value table of the three PID parameters;
[0060] Step S202, set the iteration variable eps = 1;
[0061] Step S203: Determine the rolling force state at the initial moment, and take the rolling force state at the initial moment as the current rolling force state. The rolling force state includes the target value and output value of the rolling force.
[0062] Step S204: Calculate the exploration rate ε of the ε-greedy strategy using a preset decay control strategy;
[0063] Step S205: Set the time step variable t = 0;
[0064] Step S206: Determine whether t is greater than or equal to a preset threshold. If yes, proceed to step S213. If no, increase the value of t by 1 and continue to the next step.
[0065] Step S207: Select an action based on the current rolling force state using the ε-greedy strategy. The action includes three PID parameters.
[0066] Step S208: Adjust the PID controller according to the selected action, and determine the output value of the rolling force after adjustment by the PID controller;
[0067] Step S209: Determine the next rolling force state and calculate the instantaneous reward using a preset reward strategy;
[0068] Step S210: Adaptively adjust the learning rates corresponding to the three PID parameters using a preset learning rate adjustment strategy.
[0069] Step S211: Update the Q values of the three current rolling force state-PID parameter pairs using the Bellman equation;
[0070] Step S212: Take the next rolling force state obtained as the current rolling force state and return to step S206;
[0071] Step S213: Determine whether eps is greater than or equal to the preset maximum number of iterations. If yes, output the Q value table after updating the Q value. If no, increase the value of eps by 1 and return to step S203.
[0072] In one embodiment of the present invention, by learning the PID parameters in the above manner, a Q-value table containing the optimal PID parameters under different rolling force states can be obtained.
[0073] In one embodiment of the present invention, the Q values in the three PID parameter Q value tables are all initialized to 0.
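Steps S201 to S213 can be sketched as a nested training loop. Everything numeric below is an assumption for illustration: the candidate gains, the toy plant (new_force = force + 0.5 * u, which is not a mill model), the linear form of the ε decay, the reading of the reward cases, and the single-index state. The adaptive learning rate of step S210 is replaced by a constant ALPHA to keep the sketch short.

```python
import random
import numpy as np

KP = [0.2, 0.5, 0.8]       # hypothetical candidate gains (actions)
KI = [0.0, 0.05, 0.1]
KD = [0.0, 0.02, 0.05]
ACTIONS = {"kp": KP, "ki": KI, "kd": KD}
N_STATES, MAX_EPS, MAX_T = 22, 50, 30
ALPHA, GAMMA, F_REF, DELTA_F = 0.1, 0.9, 1000.0, 20.0

def state_of(force):
    """Toy state: bin of the rolling force within [0, 2 * F_REF] (assumption)."""
    return int(min(max(force / (2 * F_REF), 0.0), 1.0) * (N_STATES - 1))

def epsilon(eps, e0=0.6 * MAX_EPS):
    """Assumed linear decay that reaches 0 once eps >= e0."""
    return max(0.0, 1.0 - eps / e0)

def reward(f_prev, f_next):
    """Assumed reading of the three reward cases."""
    change = abs(f_next - F_REF) - abs(f_prev - F_REF)
    if change < -DELTA_F:
        return abs(f_next - f_prev)
    if change > DELTA_F:
        return -abs(f_next - f_prev)
    return 0.0

def train(seed=0):
    rng = random.Random(seed)
    q = {k: np.zeros((N_STATES, len(v))) for k, v in ACTIONS.items()}  # S201
    for eps in range(1, MAX_EPS + 1):                                  # S202, S213
        force, integral, prev_err = 0.0, 0.0, F_REF                    # S203
        s = state_of(force)
        e = epsilon(eps)                                               # S204
        for _ in range(MAX_T):                                         # S205, S206
            a = {k: (rng.randrange(len(v)) if rng.random() < e
                     else int(np.argmax(q[k][s])))
                 for k, v in ACTIONS.items()}                          # S207
            err = F_REF - force                                        # S208
            integral += err
            u = (KP[a["kp"]] * err + KI[a["ki"]] * integral
                 + KD[a["kd"]] * (err - prev_err))
            prev_err = err
            new_force = force + 0.5 * u        # toy plant response
            s_next = state_of(new_force)                               # S209
            r = reward(force, new_force)
            for k in q:                                                # S211
                td = r + GAMMA * q[k][s_next].max() - q[k][s, a[k]]
                q[k][s, a[k]] += ALPHA * td
            force, s = new_force, s_next                               # S212
    return q

q_tables = train()
```

Each of the three tables receives the same reward but keeps its own action index, which mirrors the independent ε-greedy selection per PID parameter.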
[0074] Furthermore, considering that in practical applications, the state values of the applied rolling force are usually continuous and numerous, in one embodiment of the present invention, the rolling force is discretized to divide the rolling force value into multiple intervals. The rolling force belonging to the same interval is regarded as the same state and is controlled using the same set of PID parameters.
[0075] Specifically, in one embodiment of the present invention, the interval division criterion of rolling force is defined as follows:
[0076]
[0077] Where n denotes the interval to which the rolling force belongs, [·] denotes rounding, $x_{con}$ denotes the continuous rolling force, and $x_{min}$ and $x_{max}$ denote the lower and upper limits of the rolling force, respectively. M denotes the number of intervals into which the set rolling force range is divided. Rolling forces below the minimum of the set range are all assigned to the same interval, interval 0, and rolling forces above the maximum of the set range are all assigned to the same interval, interval M+1. M is set according to actual needs; in one embodiment of the present invention, it is set to 20.
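A minimal sketch of such a discretization is given below. The out-of-range handling (interval 0 and interval M+1) follows the text; the in-range rule, M equal-width intervals numbered 1 to M with truncation as the rounding, is an assumption, since the formula itself does not appear in this text. The function name force_interval is hypothetical.

```python
def force_interval(x_con, x_min, x_max, m=20):
    """Map a continuous rolling force to an interval index in 0..m+1."""
    if x_con < x_min:
        return 0                 # below the set range
    if x_con > x_max:
        return m + 1             # above the set range
    # Assumed in-range rule: m equal-width intervals numbered 1..m.
    n = int((x_con - x_min) / (x_max - x_min) * m) + 1
    return min(n, m)             # x_con == x_max falls in the last interval

low = force_interval(-5.0, 0.0, 1000.0)
mid = force_interval(499.0, 0.0, 1000.0)
high = force_interval(1200.0, 0.0, 1000.0)
```

With M = 20 this yields 22 distinct states per rolling force value, so a state made of a target value and an output value would index a 22 x 22 grid.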
[0078] Furthermore, in one embodiment of the present invention, the ε-greedy strategy adopted is a conventional random strategy. The ε-greedy strategy balances exploration and exploitation, and selects the action that maximizes the value function with a probability of 1-ε. The selection probability of other actions is the same, which is ε / K, where ε represents the exploration rate, the value of ε is between 0 and 1, and K represents the number of actions. The larger the value of ε, the greater the probability of selecting a random action.
[0079] The ε-greedy strategy is specifically defined as follows:
[0080]
[0081] Where A denotes the selected action, $\arg\max_a Q(s,a)$ denotes the action a that maximizes the Q-value function Q(s,a), and ζ∈[0,1] denotes a uniformly distributed random number.
[0082] In one embodiment of the present invention, when selecting three PID parameters using the ε-greedy strategy, the ε-greedy strategy is used independently for each different PID parameter.
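The selection rule can be sketched as follows, assuming ζ is drawn uniformly from [0, 1) and that exploration picks uniformly among all K actions, so that each action receives at least probability ε / K. The function name epsilon_greedy and the example Q row are hypothetical.

```python
import random

def epsilon_greedy(q_row, epsilon, rng=random):
    """Pick an action index from one row of a Q table."""
    zeta = rng.random()                      # uniform random number in [0, 1)
    if zeta < epsilon:
        return rng.randrange(len(q_row))     # explore: any of the K actions
    # exploit: index of the largest Q value (first one on ties)
    return max(range(len(q_row)), key=q_row.__getitem__)

greedy_choice = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0)
random_choice = epsilon_greedy([0.1, 0.9, 0.3], epsilon=1.0, rng=random.Random(0))
```

To select the three PID parameters independently, the function would be called once per Q table with the row of the current rolling force state.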
[0083] Furthermore, to accelerate the convergence of the PID parameter Q-value tables during training, improve training efficiency, and reduce training costs, in one embodiment of the present invention, a preset decay control strategy is used during training to calculate the exploration rate ε of the ε-greedy strategy, so that ε decreases as the number of training iterations increases and becomes 0 once the number of iterations reaches a certain value.
[0084] Specifically, the decay control strategy is defined as follows:
[0085]
[0086] Where e0 represents the preset judgment threshold, which is set according to the actual training situation. For example, e0 = 0.6 * max eps, where max eps represents the preset maximum number of iterations.
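One schedule with the stated properties (ε decreasing with the iteration count and equal to 0 from the threshold e0 onward) is a linear ramp. The linear shape, the starting value eps_start, and the function name exploration_rate are assumptions; only e0 = 0.6 * max eps comes from the example in the text.

```python
def exploration_rate(eps, max_eps, ratio=0.6, eps_start=1.0):
    """Assumed linear decay of the exploration rate over training iterations."""
    e0 = ratio * max_eps               # judgment threshold, e.g. 0.6 * max eps
    if eps >= e0:
        return 0.0                     # pure exploitation after the threshold
    return eps_start * (1.0 - eps / e0)

schedule = [exploration_rate(e, 100) for e in (0, 30, 61, 100)]
```

Any other monotonically decreasing shape, such as an exponential decay clipped to 0 at e0, would satisfy the same description.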
[0087] Furthermore, in one embodiment of the present invention, the instantaneous reward is divided into three cases based on the state of the rolling mill reduction control system: the adjusted rolling force moves toward the target value, the adjusted rolling force moves away from the target value, and the adjusted rolling force remains basically unchanged.
[0088] Specifically, in one embodiment of the present invention, if the difference between the output value of the adjusted rolling force and the target value of the rolling force is much smaller than the difference between the output value of the rolling force before adjustment and the target value of the rolling force, then the current situation is regarded as the adjusted rolling force tending towards the target value, and at this time the absolute value of the difference between the output values of the rolling force before and after adjustment is used as an immediate reward.
[0089] If the difference between the adjusted rolling force output value and the target rolling force value is much greater than the difference between the unadjusted rolling force output value and the target rolling force value, then the current situation is regarded as the adjusted rolling force moving away from the target value. In this case, the negative of the absolute value of the difference between the rolling force output values before and after adjustment is used as the immediate reward.
[0090] If the difference between the adjusted rolling force output value and the target rolling force value is small and the difference between the original rolling force output value and the target rolling force value is small, then the current situation is considered as the rolling force remaining basically unchanged after adjustment, and 0 is used as the immediate reward.
[0091] Based on the above settings, the reward strategy is defined as follows:
[0092]
[0093] Where $r_t$ denotes the immediate reward; $F_n(t+1)$ denotes the output value of the rolling force at step t+1, that is, the output value of the rolling force after the adjustment at step t; $F_n(t)$ denotes the output value of the rolling force at step t, that is, the output value of the rolling force before the adjustment at step t; $F_{nref}$ denotes the given target value of the rolling force; and $\Delta F$ denotes the given rolling force threshold. The rolling force target value $F_{nref}$ and the rolling force threshold $\Delta F$ are set according to the actual situation; for example, $\Delta F$ is set to 20 N (newtons).
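The three reward cases can be sketched as below. The reward values follow the description (plus or minus the absolute change in output force, or 0); how the threshold ΔF enters the case conditions is an assumption, here taken as the amount by which the absolute error must shrink or grow. The function name instant_reward is hypothetical.

```python
def instant_reward(f_prev, f_next, f_ref, delta_f=20.0):
    """Immediate reward from the rolling force before and after an adjustment."""
    err_prev = abs(f_prev - f_ref)          # error before the adjustment
    err_next = abs(f_next - f_ref)          # error after the adjustment
    if err_prev - err_next > delta_f:       # clearly closer to the target
        return abs(f_next - f_prev)
    if err_next - err_prev > delta_f:       # clearly further from the target
        return -abs(f_next - f_prev)
    return 0.0                              # basically unchanged

toward = instant_reward(900.0, 990.0, 1000.0)
away = instant_reward(990.0, 900.0, 1000.0)
steady = instant_reward(995.0, 1002.0, 1000.0)
```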
[0094] Furthermore, in one embodiment of the present invention, so that the Q-value tables stabilize as soon as possible, the learning rate is adaptively adjusted at each time step of the training process using a preset learning rate adjustment strategy.
[0095] Specifically, in one embodiment of the present invention, the learning rate adjustment strategy is defined as:
[0096]
[0097] Where $\alpha_{t+1}$ denotes the learning rate at step t+1, $\alpha_t$ denotes the learning rate at step t, $\Delta\alpha_t$ denotes the learning rate adjustment, k is a given value (k > 0), and Φ is a given value corresponding to the discount factor (0 < Φ < 1). The specific values of k and Φ are set according to the actual situation. $\delta_t$ denotes the temporal-difference (TD) error at step t, and $\delta_{t-1}$ denotes the TD error at step t-1.
[0098] In one embodiment of the present invention, $\delta_t$ is calculated using the following formula:
[0099] $\delta_t = r_t + \gamma \max_{a_{t+1}} Q(S_{t+1}, a_{t+1}) - Q(S_t, a_t)$
[0100] Where $r_t$ denotes the immediate reward obtained after executing action $a_t$ in state $S_t$, γ denotes the discount factor (0 ≤ γ < 1), $\max_{a_{t+1}} Q(S_{t+1}, a_{t+1})$ denotes the maximum Q value obtainable in the next state $S_{t+1}$, and $Q(S_t, a_t)$ denotes the Q value corresponding to the state $S_t$-action $a_t$ pair.
[0101] In one embodiment of the present invention, when adaptively adjusting the learning rate, the learning rate is updated based on the comparison result between the TD error of the current step and the cumulative TD error of the previous steps. Specifically, when the learning rate is large, the sign is changed so that it is lowered in the next learning iteration; if the learning rate is small, it is continuously increased according to a set trend to accelerate the convergence speed; if the learning rate is moderate, it is kept constant.
[0102] In one embodiment of the present invention, a learning rate is set for each PID parameter, and the learning rate is adjusted and updated based on the above-described learning rate adjustment strategy.
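The behaviour described above resembles a delta-bar-delta style rule: grow the learning rate additively while successive TD errors keep the same sign, and shrink it multiplicatively when the sign flips. The sketch below is written under that assumption, using the product of the two most recent TD errors as the comparison; the exact formula, the default values of k and phi, and the function name adapt_learning_rate are not taken from the source.

```python
def adapt_learning_rate(alpha, delta_t, delta_prev, k=0.01, phi=0.5):
    """Return the next learning rate from the last two TD errors."""
    product = delta_t * delta_prev
    if product > 0:          # same sign: rate is likely too small, increase by k
        d_alpha = k
    elif product < 0:        # sign change: rate is likely too large, decrease
        d_alpha = -phi * alpha
    else:                    # no information: keep the rate constant
        d_alpha = 0.0
    return alpha + d_alpha

grown = adapt_learning_rate(0.1, 2.0, 1.0)
shrunk = adapt_learning_rate(0.1, -2.0, 1.0)
kept = adapt_learning_rate(0.1, 0.0, 1.0)
```

One such rate would be kept per PID parameter and updated after the TD error of that parameter's table is computed.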
[0103] Furthermore, in one embodiment of the present invention, the Bellman equation is expressed as:
[0104] $Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]$
[0105] Where Q(s,a) denotes the Q value corresponding to the state s-action a pair, α denotes the learning rate, r denotes the immediate reward obtained after performing action a in state s, γ denotes the discount factor, and $\max_{a'} Q(s',a')$ denotes the maximum Q value obtainable in the next state s′.
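A minimal tabular version of this update is sketched below. The table shape, the default values of alpha and gamma, and the function name q_update are assumptions for the example; the update rule itself is the standard Q-learning form stated above.

```python
import numpy as np

def q_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update in place and return the TD error."""
    td_error = r + gamma * np.max(q[s_next]) - q[s, a]
    q[s, a] += alpha * td_error
    return td_error

q = np.zeros((3, 2))              # 3 states, 2 actions (toy sizes)
q[1] = [0.0, 2.0]                 # pretend state 1 already has a learned value
td = q_update(q, s=0, a=1, r=1.0, s_next=1)
```

Returning the TD error is convenient because the same quantity drives the adaptive learning rate.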
[0106] Furthermore, in one embodiment of the present invention, the preset threshold and the preset maximum number of iterations are specifically set according to the actual training requirements.
[0107] Furthermore, the control method of one embodiment of the present invention is used in the reduction control process of a 20-roll reversible rolling mill, and more specifically, in the reduction control process of a 20-roll reversible rolling mill producing stainless steel with a thickness of less than 0.015 mm, so as to ensure the reduction control efficiency of the rolling mill.
[0108] It should be noted that, in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Additionally, the terms "front," "back," "left," "right," "upper," and "lower" in this document refer to the placement shown in the accompanying drawings.
[0109] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims
1. A rolling mill reduction control method based on the Q-learning algorithm, characterized in that it comprises:
Obtaining the target value and output value of the rolling force from the rolling mill reduction control system;
Based on the target value and output value of the rolling force, determining the PID parameters of the PID controller according to three PID parameter Q value tables, which are obtained in advance using a preset Q-learning algorithm;
Based on the obtained target value and output value of the rolling force, as well as the PID parameters, obtaining the output of the PID controller;
Superimposing the output of the PID controller on the input of the rolling mill reduction control system to adjust the output value of the rolling force of the rolling mill reduction control system;
wherein the PID parameter Q value tables are obtained using the following method:
Step S201: Initialize the Q value tables of the three PID parameters;
Step S202: Set the iteration variable eps = 1;
Step S203: Determine the rolling force state at the initial moment, and take the rolling force state at the initial moment as the current rolling force state, the rolling force state including the target value and output value of the rolling force;
Step S204: Calculate the exploration rate ε of the ε-greedy strategy using a preset decay control strategy;
Step S205: Set the time step variable t = 0;
Step S206: Determine whether t is greater than or equal to a preset threshold. If yes, proceed to step S213. If no, increase the value of t by 1 and continue to the next step;
Step S207: Select an action based on the current rolling force state using the ε-greedy strategy, the action including three PID parameters;
Step S208: Adjust the PID controller according to the selected action, and determine the output value of the rolling force after adjustment by the PID controller;
Step S209: Determine the next rolling force state and calculate the instantaneous reward using a preset reward strategy;
Step S210: Adaptively adjust the learning rates corresponding to the three PID parameters using a preset learning rate adjustment strategy;
Step S211: Update the Q values corresponding to the three current rolling force state-PID parameter pairs using the Bellman equation;
Step S212: Take the next rolling force state obtained as the current rolling force state and return to step S206;
Step S213: Determine whether eps is greater than or equal to the preset maximum number of iterations. If yes, output the Q value tables after updating the Q values. If no, increase the value of eps by 1 and return to step S203;
The decay control strategy is defined as follows: ; where e0 denotes a preset judgment threshold.
2. The rolling mill reduction control method based on the Q-learning algorithm according to claim 1, characterized in that the reward strategy is defined as: ; where $r_t$ denotes the immediate reward, $F_n(t+1)$ denotes the output value of the rolling force at step t+1, $F_n(t)$ denotes the output value of the rolling force at step t, $F_{nref}$ denotes the given target value of the rolling force, and $\Delta F$ denotes the given rolling force threshold.
3. The rolling mill reduction control method based on the Q-learning algorithm according to claim 1, characterized in that the learning rate adjustment strategy is defined as: ; where $\alpha_{t+1}$ denotes the learning rate at step t+1, $\alpha_t$ denotes the learning rate at step t, $\Delta\alpha_t$ denotes the learning rate adjustment, k is a given value (k > 0), Φ is a given value corresponding to the discount factor (0 < Φ < 1), $\delta_t$ denotes the temporal-difference (TD) error at step t, and $\delta_{t-1}$ denotes the TD error at step t-1.
4. The rolling mill reduction control method based on the Q-learning algorithm according to claim 1, characterized in that the method further includes: discretizing the rolling force to divide the rolling force value into multiple intervals, wherein rolling forces belonging to the same interval are regarded as the same state and are controlled by the same set of PID parameters.
5. The rolling mill reduction control method based on the Q-learning algorithm according to any one of claims 1-4, characterized in that the method is used for the reduction control process of a 20-roll reversible rolling mill.
6. The rolling mill reduction control method based on the Q-learning algorithm according to claim 5, characterized in that the method is used for the reduction control process of producing stainless steel with a thickness of less than 0.015 mm on a 20-roll reversible rolling mill.