A machine learning-based dynamic oxygen distribution control system for the converter steelmaking process
By combining machine learning-based long-term modeling with constrained reinforcement learning decision-making, the system addresses the instability and safety hazards of dynamic oxygen distribution control in converter steelmaking. It achieves high-precision, stable dynamic oxygen distribution control while balancing the endpoint hit rate and process stability.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents (China)
- Current Assignee / Owner
- XUZHOU HUAHONG SPECIAL STEEL CO LTD
- Filing Date
- 2026-04-13
- Publication Date
- 2026-06-19
AI Technical Summary
Existing dynamic oxygen control methods for converter steelmaking cannot fully characterize the long-term evolution of the blowing process, and they adapt poorly to complex nonlinear reactions. They also lack an effective balance between multi-step decision evaluation and real-time execution. The result is unstable control strategies and safety hazards.
We employ machine learning-based long-term modeling and constrained reinforcement learning decision-making. Multi-source operating data are collected and standardized in a unified way, and an Informer network performs long-term time-series modeling. Dynamic oxygen distribution reward evaluation parameters are then constructed. Finally, an improved MuZero model carries out multi-step look-ahead, constrained tree-search planning to generate dynamic oxygen distribution control instructions.
It improves the accuracy and stability of dynamic oxygen distribution control, reduces the risk of fluctuations in the control process, ensures the hit rate of endpoint temperature and carbon content, and reduces the risk of splashing, thus achieving a balance between industrial real-time performance and safety.

Figure CN122018336B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of metallurgical automation control, and in particular to a dynamic oxygen distribution control system for converter steelmaking processes based on machine learning.
Background Technology
[0002] Oxygen supply control in converter steelmaking is a critical factor affecting steel temperature, carbon content, slag composition, and production safety. Existing converter dynamic oxygen supply technologies mainly rely on empirical models, static control rules, or feedback adjustment methods based on a small number of process variables. They achieve endpoint control by pre-setting oxygen supply curves or modifying stage parameters. Some improvement schemes introduce data-driven models or machine learning methods to predict converter operating status and assist in oxygen supply decisions. However, most of these are still based on short-term observation data, which makes it difficult to fully characterize the long-term evolution of the converter blowing process and has limited adaptability to complex nonlinear reaction processes.
[0003] Furthermore, existing dynamic oxygen distribution methods based on intelligent algorithms often use simple reward functions or post-hoc constraint screening, treating safety and process constraints as conditions for correcting outcomes. During decision-making and planning, they do not incorporate physical constraints directly into the decision space, such as oxygen supply intensity, cumulative oxygen supply, lance position changes, and splash risk. This can easily lead to unstable control strategies or safety hazards. Existing methods also lack an effective balance between multi-step decision evaluation and real-time execution. As a result, it is hard to exploit long-term time-series information for forward-looking optimization while still meeting industrial real-time requirements. This limits further gains in the accuracy and operational safety of dynamic oxygen distribution control in converter steelmaking.
Summary of the Invention
[0004] One objective of this invention is to propose a dynamic oxygen distribution control system for converter steelmaking based on machine learning. This invention combines long-term modeling with constrained reinforcement learning decision-making to achieve intelligent dynamic oxygen distribution control in converters, which has the advantages of high control accuracy, strong operational stability and excellent safety.
[0005] A machine learning-based dynamic oxygen distribution control system for a converter steelmaking process, according to an embodiment of the present invention, includes:
[0006] The data acquisition module is used to collect converter operating data during the converter blowing process;
[0007] The preprocessing module is used to preprocess the converter operating data to generate a standardized converter operating data set, and to perform sliding-window segmentation on it to generate a set of long-term converter operation segments;
[0008] The time-series modeling module is used to construct a set of converter process state vectors from the set of long-term converter operation segments, and to input them into the Informer network for time-series modeling, generating a long-term state representation vector of the converter;
[0009] The evaluation parameter construction module is used to construct a set of dynamic oxygen distribution reward evaluation parameters;
[0010] The implicit state initialization module is used to build the improved MuZero model. It generates the planning root node state vector from the long-term state representation vector of the converter, and inputs it into the implicit state transition network to generate the initial hidden state;
[0011] The tree search planning module is used to construct a dynamic oxygen distribution action set. It takes the initial hidden state as the root node of the tree search, performs multi-step planning over the action set, and generates a node statistics sequence;
[0012] The instruction generation module is used to determine the target dynamic oxygen distribution action from the node statistics sequence and to generate a set of dynamic oxygen distribution control instructions;
[0013] The reward calculation module is used to issue the set of dynamic oxygen distribution control instructions and execute the adjustment operations. It then calculates the corresponding reward value from the execution result data, according to the set of dynamic oxygen distribution reward evaluation parameters;
[0014] The control recording module is used to update the parameters of the improved MuZero model and to repeat the above process until the blowing endpoint is reached, generating a dynamic oxygen distribution control record set.
[0015] Optionally, the converter operating data includes oxygen supply flow rate, cumulative oxygen supply, oxygen lance position height, carbon monoxide volume fraction in flue gas at the furnace mouth, carbon dioxide volume fraction in flue gas at the furnace mouth, molten steel temperature, FeO mass fraction in slag, flame image characterization at the furnace mouth, and carbon content in molten steel.
[0016] Optionally, the preprocessing includes time synchronization, anomaly removal, missing data imputation, and normalization.
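The patent names the four preprocessing steps and the sliding segmentation, but not the methods behind them. The sketch below fills each step with one common choice, and every choice is an assumption:

- time synchronization: sample-and-hold resampling onto a shared time grid;
- anomaly removal: a median-absolute-deviation (MAD) test;
- imputation: linear interpolation;
- normalization: min-max scaling;
- segmentation: fixed-length windows advanced by a fixed stride.

All function names, parameters, and sample values are hypothetical, and only one signal is shown.

```python
import statistics
from bisect import bisect_right
from typing import List, Optional, Sequence, Tuple

Series = List[Optional[float]]

def resample_hold(samples: Sequence[Tuple[float, float]], grid: Sequence[float]) -> Series:
    """Time synchronization: hold the latest (timestamp, value) sample at each grid time."""
    times = [t for t, _ in samples]
    out: Series = []
    for g in grid:
        i = bisect_right(times, g) - 1  # index of the last sample at or before g
        out.append(samples[i][1] if i >= 0 else None)
    return out

def remove_anomalies(x: Series, k: float = 3.0) -> Series:
    """Anomaly removal: blank values farther than k robust sigmas (1.4826 * MAD) from the median."""
    vals = [v for v in x if v is not None]
    if len(vals) < 3:
        return list(x)
    med = statistics.median(vals)
    mad = statistics.median(abs(v - med) for v in vals)
    if mad == 0:
        return list(x)
    limit = k * 1.4826 * mad
    return [v if v is not None and abs(v - med) <= limit else None for v in x]

def impute_linear(x: Series) -> List[float]:
    """Missing-data imputation: linear interpolation inside gaps, nearest valid value at the edges."""
    known = [i for i, v in enumerate(x) if v is not None]
    if not known:
        raise ValueError("no valid samples to impute from")
    out: List[float] = []
    for i, v in enumerate(x):
        if v is not None:
            out.append(v)
            continue
        left = max((j for j in known if j < i), default=None)
        right = min((j for j in known if j > i), default=None)
        if left is None:
            out.append(x[right])
        elif right is None:
            out.append(x[left])
        else:
            w = (i - left) / (right - left)
            out.append(x[left] + w * (x[right] - x[left]))
    return out

def min_max(x: Sequence[float]) -> List[float]:
    """Normalization to [0, 1]; a constant signal maps to all zeros."""
    lo, hi = min(x), max(x)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in x]

def sliding_segments(x: Sequence[float], window: int, stride: int) -> List[List[float]]:
    """Sliding segmentation into fixed-length, possibly overlapping windows."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    return [list(x[s:s + window]) for s in range(0, len(x) - window + 1, stride)]

# Hypothetical lance-height signal in cm, sampled at irregular timestamps; 240.0 is a spike.
raw = [(0.0, 180.0), (1.1, 181.0), (2.0, 179.5), (3.0, 240.0), (4.0, 180.5), (5.1, 181.5)]
synced = resample_hold(raw, grid=[0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
clean = impute_linear(remove_anomalies(synced))
segments = sliding_segments(min_max(clean), window=4, stride=2)
```

In a full system, each of the nine variables would be resampled onto the same grid before segmentation, so that every segment covers the same time span for all variables.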
[0017] Optionally, the time series modeling module includes:
[0018] Based on the set of long-term converter operation segments, time-series features are constructed for the following variables in each segment: oxygen supply flow rate, cumulative oxygen supply, oxygen lance position height, carbon monoxide volume fraction in furnace mouth flue gas, carbon dioxide volume fraction in furnace mouth flue gas, molten steel temperature, slag FeO mass fraction, furnace mouth flame image characterization, and molten steel carbon content. This generates multiple corresponding feature vectors.
[0019] Multiple feature vectors are concatenated to form the converter process state vector corresponding to the long-term operation segment of the converter, and the converter process state vectors constitute the converter process state vector set.
[0020] The converter process state vector set is input into the Informer network for time series modeling, generating a long-term time series state representation vector of the converter that characterizes the long-term evolution trend of the converter condition.
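The patent does not say which parts of the Informer network it relies on. Informer's distinguishing component is ProbSparse self-attention, which works in three steps:

- It scores each query by the sparsity measure M(q, K) = max_j(q·k_j / √d) - mean_j(q·k_j / √d).
- It computes full attention only for the top-u queries.
- It gives the remaining "lazy" queries the mean of V.

The pure-Python sketch below shows that idea on small lists. It is a simplification of Informer, not a faithful implementation. It evaluates M against all keys instead of a random key sample, and it omits multi-head projections, self-attention distilling, and the decoder.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]

def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the maximum for numerical stability
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def probsparse_attention(Q: Matrix, K: Matrix, V: Matrix, u: int) -> Tuple[Matrix, List[int]]:
    """Simplified ProbSparse self-attention; returns outputs and the indices of active queries."""
    scale = 1.0 / math.sqrt(len(Q[0]))
    scores = [[dot(q, k) * scale for k in K] for q in Q]
    # Sparsity measure: queries whose score distribution is far from uniform matter most.
    measure = [max(row) - sum(row) / len(row) for row in scores]
    active = sorted(range(len(Q)), key=lambda i: measure[i], reverse=True)[:u]
    # Lazy queries receive the mean of V (the self-attention case in Informer).
    v_mean = [sum(col) / len(V) for col in zip(*V)]
    out = [list(v_mean) for _ in Q]
    for i in active:
        w = softmax(scores[i])
        out[i] = [sum(wj * v[c] for wj, v in zip(w, V)) for c in range(len(V[0]))]
    return out, active

Q = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
V = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
out, active = probsparse_attention(Q, K, V, u=2)
```

The all-zero query scores every key equally, so its measure is zero and it falls back to the mean of V. This selection is what lets Informer handle long segment sequences more cheaply than full attention.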
[0021] Optionally, the evaluation parameter construction module includes:
[0022] Based on the endpoint control requirements of converter steelmaking, a target temperature value for molten steel is set, and the molten steel temperature within the corresponding control cycle is obtained during the blowing process. The difference between the molten steel temperature and the target temperature value is calculated, and the difference is transformed by linear mapping to generate a molten steel temperature deviation penalty parameter.
[0023] Based on the converter's final carbon content control index, a target carbon content value for molten steel is set, and the carbon content of molten steel is obtained within the corresponding control cycle. The deviation between the carbon content of molten steel and the target carbon content value is calculated, and the deviation is transformed by linear mapping to generate a carbon content deviation penalty parameter for molten steel.
[0024] For the control range of carbon monoxide volume fraction in flue gas at the furnace mouth during converter blowing, a target range of carbon monoxide volume fraction in flue gas is set, and the volume fraction of carbon monoxide in flue gas at the furnace mouth in the corresponding time period is obtained. The degree of deviation from the target range is calculated, and the degree of deviation is linearly mapped to generate a deviation penalty parameter for carbon monoxide volume fraction in flue gas.
[0025] During the blowing process, the cumulative oxygen supply is statistically analyzed in real time. The current cumulative oxygen supply is compared with the preset cumulative oxygen supply control threshold. Based on the comparison result, a linear mapping is performed to generate a cumulative oxygen supply penalty parameter.
[0026] Based on the requirements for controlling the degree of slag oxidation, a target range for the FeO mass fraction in slag is set, and the slag FeO mass fraction is collected within the corresponding control cycle. The degree of deviation between the slag FeO mass fraction and the target range is calculated and linearly mapped to generate the slag FeO deviation penalty parameter.
[0027] Based on the requirements for splash risk control in the converter blowing process, the splash risk assessment value is calculated based on the characteristics of the change in the flame image representation, the characteristics of the change in the flue gas composition, and the characteristics of the change in the oxygen supply flow rate. Then, splash risk penalty parameters are generated based on the splash risk assessment value.
[0028] The following penalty parameters are combined to form the set of dynamic oxygen distribution reward evaluation parameters: molten steel temperature deviation, molten steel carbon content deviation, flue gas carbon monoxide volume fraction deviation, cumulative oxygen supply, slag FeO deviation, and splash risk.
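The six penalty parameters share one pattern: measure a deviation, then map it linearly. The sketch assumes the following deviation measures:

- absolute error for the point targets (temperature, carbon);
- distance outside the interval for the range targets (CO fraction, slag FeO);
- the excess over the threshold for cumulative oxygen.

The splash risk value follows the steps listed in paragraph [0093]: a difference between cycles, amplitude aggregation, normalization, and a weighted sum. The specific choices are assumptions: a Euclidean norm for the flame change, summed absolute CO and CO2 changes, divide-by-scale-and-clip normalization, the weights and threshold, and a linear mapping from risk value to penalty. All slopes, targets, and measurements are hypothetical placeholders, not plant data.

```python
import math
from typing import Dict, Sequence, Tuple

def range_deviation(value: float, low: float, high: float) -> float:
    """Distance from value to the interval [low, high]; zero inside the interval."""
    return max(low - value, 0.0, value - high)

def splash_risk(flame: Tuple[Sequence[float], Sequence[float]],
                co: Tuple[float, float], co2: Tuple[float, float], flow: Tuple[float, float],
                scales=(1.0, 5.0, 2000.0), weights=(0.5, 0.3, 0.2),
                threshold: float = 0.6) -> Tuple[float, bool]:
    """Splash risk assessment value and judgment; each pair is (previous cycle, current cycle)."""
    prev_img, cur_img = flame
    flame_change = math.dist(cur_img, prev_img)             # amplitude of the flame difference vector
    gas_change = abs(co[1] - co[0]) + abs(co2[1] - co2[0])   # flue gas composition change value
    flow_change = abs(flow[1] - flow[0])                     # oxygen supply flow rate change value
    changes = (flame_change, gas_change, flow_change)
    intensities = [min(c / s, 1.0) for c, s in zip(changes, scales)]  # normalize to [0, 1]
    risk = sum(w * i for w, i in zip(weights, intensities))
    return risk, risk >= threshold

def penalty_parameters(m: Dict[str, float], cfg: Dict[str, float], risk: float) -> Dict[str, float]:
    """Linear mapping of each deviation to a penalty parameter (slopes k_* are hypothetical)."""
    return {
        "temperature": cfg["k_temp"] * abs(m["temp"] - cfg["temp_target"]),
        "carbon": cfg["k_c"] * abs(m["carbon"] - cfg["c_target"]),
        "co_fraction": cfg["k_co"] * range_deviation(m["co"], cfg["co_low"], cfg["co_high"]),
        "cumulative_o2": cfg["k_o2"] * max(0.0, m["cum_o2"] - cfg["o2_limit"]),
        "slag_feo": cfg["k_feo"] * range_deviation(m["feo"], cfg["feo_low"], cfg["feo_high"]),
        "splash": cfg["k_splash"] * risk,
    }

risk, high_risk = splash_risk(flame=([0.0, 0.0], [0.3, 0.4]), co=(60.0, 62.0),
                              co2=(15.0, 14.0), flow=(30000.0, 31000.0))
cfg = {"k_temp": 0.1, "temp_target": 1650.0, "k_c": 100.0, "c_target": 0.05,
       "k_co": 0.2, "co_low": 55.0, "co_high": 70.0, "k_o2": 0.001, "o2_limit": 15000.0,
       "k_feo": 0.5, "feo_low": 12.0, "feo_high": 18.0, "k_splash": 2.0}
m = {"temp": 1640.0, "carbon": 0.08, "co": 62.0, "cum_o2": 14000.0, "feo": 20.0}
penalties = penalty_parameters(m, cfg, risk)
```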
[0029] Optionally, the implicit state initialization module includes:
[0030] An improved MuZero model is constructed, which includes a representation network, an implicit state transition network, a value evaluation network, a policy output network, and a constrained tree search network.
[0031] In the representation network, the long-term state representation vector of the converter is obtained, a linear mapping is performed on the long-term state representation vector of the converter to generate a long-term evolution prior feature vector, and the long-term evolution prior feature vector is normalized to obtain the long-term evolution prior embedding vector.
[0032] Within the current control cycle, the current state feature vector is obtained from the converter standardized operation data set, and linear mapping and normalization are performed on the current state feature vector to generate the current state embedding vector.
[0033] Align the long-term evolution prior embedding vector with the current state embedding vector, and perform prior weighted modulation on each dimension of the current state embedding vector to generate a modulated state embedding vector constrained by the long-term evolution trend.
[0034] Perform nonlinear transformation and dimensionality compression on the modulation state embedding vector to generate the planning root node state vector;
[0035] The planning root node state vector is input into the implicit state transition network, and hidden state initialization is performed to generate the initial hidden state. This serves as the starting hidden state of the dynamic oxygen distribution control planning process.
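Paragraphs [0031] to [0034] describe the representation stage as four operations: linear mapping, normalization, prior-weighted modulation, and a nonlinear compression. The sketch makes three assumptions:

- layer normalization for the normalization step;
- a gate of 2·sigmoid(prior) for the per-dimension modulation, so a zero prior leaves the current embedding unchanged;
- tanh for the nonlinear transform.

The weight matrices are hypothetical placeholders for learned parameters. The sketch stops at the planning root node state vector, because the implicit state transition network that consumes it is not specified further.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]

def linear(W: Matrix, x: Sequence[float]) -> List[float]:
    """Matrix-vector product W @ x (bias omitted for brevity)."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def layer_norm(x: Sequence[float], eps: float = 1e-5) -> List[float]:
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    return [(v - mu) / math.sqrt(var + eps) for v in x]

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def root_state_vector(long_term: Sequence[float], current: Sequence[float],
                      W_prior: Matrix, W_cur: Matrix, W_out: Matrix) -> List[float]:
    prior = layer_norm(linear(W_prior, long_term))  # long-term evolution prior embedding
    cur = layer_norm(linear(W_cur, current))        # current state embedding
    if len(prior) != len(cur):
        raise ValueError("prior and current embeddings must have the same dimension")
    # Prior-weighted modulation: scale each current dimension by a gate in (0, 2).
    modulated = [c * 2.0 * sigmoid(p) for c, p in zip(cur, prior)]
    # Nonlinear transform plus dimensionality compression (W_out has fewer rows than columns).
    return [math.tanh(v) for v in linear(W_out, modulated)]

W_prior = [[0.2, 0.1, 0.0, -0.1], [0.0, 0.3, 0.1, 0.0], [0.1, 0.0, -0.2, 0.2]]
W_cur = [[0.5, 0.0, 0.1], [0.0, 0.4, -0.1], [0.2, 0.2, 0.2]]
W_out = [[0.3, -0.2, 0.1], [0.1, 0.1, 0.4]]
root = root_state_vector([1.0, 0.5, -0.3, 0.8], [0.9, 0.2, 0.4], W_prior, W_cur, W_out)
```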
[0036] Optionally, the tree search planning module includes:
[0037] A dynamic oxygen distribution action set is constructed based on the process control requirements of the converter steelmaking process. The dynamic oxygen distribution action set includes a set of oxygen supply flow rate adjustment levels and a set of oxygen lance position adjustment levels, and is associated with oxygen supply intensity constraints, cumulative oxygen supply constraints, lance position adjustment range constraints, and operational safety constraints to form a dynamic oxygen distribution action set with constraints.
[0038] The initial hidden state is used as the root node hidden state of the constrained tree search network. It is input into the constrained tree search network to initialize the tree search structure, and the dynamic oxygen distribution action set is loaded into the corresponding search action space.
[0039] In the constrained tree search network, starting from the root node hidden state, a dynamic oxygen distribution action that satisfies the constraints is selected and combined with the hidden state of the current search node. The combination is input into the implicit state transition network to perform hidden state recursion, forming a set of child node hidden states.
[0040] Each hidden state in the set of child node hidden states is input into the value evaluation network to perform value evaluation and generate a node value estimate. Each node value estimate is bound to its dynamic oxygen distribution action and child node hidden state, generating a set of child node evaluation results.
[0041] The set of child node evaluation results is written into the tree search node structure. The node visit count and cumulative node value corresponding to the root search node are then updated to form the updated node statistics;
[0042] From the set of child node hidden states, the search node of the next search level is determined based on the updated node statistics. The hidden state of the selected child node becomes the hidden state of the current search node at the next level.
[0043] The process of generating dynamic oxygen distribution actions, generating hidden states, estimating node values, and updating node statistics is repeated until the preset number of search levels is reached. This generates a multi-level dynamic oxygen distribution action search tree rooted at the root node hidden state.
[0044] The multiple search paths formed by the multi-level search tree are summarized, and the node statistics corresponding to the end search node of each search path are obtained.
[0045] The optimal search path is determined from the node statistics of the end search node of each search path. The corresponding oxygen supply flow rate adjustment sequence and oxygen lance position adjustment sequence are extracted along the optimal search path. Together they form a multi-step dynamic oxygen distribution action sequence and the corresponding node statistics sequence.
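The search in [0037] to [0045] resembles MuZero's Monte Carlo tree search restricted to feasible actions. The sketch below is a simplified version built on several assumptions:

- Toy callables stand in for the implicit state transition network (`dynamics`) and the value evaluation network (`value`).
- Priors are uniform rather than produced by a policy network, and Q-values are not min-max normalized.
- Feasibility is checked by a hypothetical `feasible_actions` function that encodes a lance range limit and a per-cycle step limit. Flow levels are omitted to keep the toy small.

Only actions that pass the check are ever added to the tree. This is how the constraints act during the search rather than after it. The final path follows the most-visited child at each level, yielding the action sequence and a (visits, mean value) statistics sequence.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

State = Tuple[int, ...]
Action = int  # lance position change in cm (hypothetical toy action)

@dataclass
class Node:
    state: State
    reward: float = 0.0  # reward received on entering this node
    visits: int = 0
    value_sum: float = 0.0
    children: Dict[Action, "Node"] = field(default_factory=dict)

    def value(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0

def puct(parent: Node, child: Node, c: float, gamma: float) -> float:
    prior = 1.0 / len(parent.children)  # uniform prior (assumption)
    explore = c * prior * math.sqrt(parent.visits) / (1 + child.visits)
    return child.reward + gamma * child.value() + explore

def constrained_search(root_state: State,
                       feasible_actions: Callable[[State], List[Action]],
                       dynamics: Callable[[State, Action], Tuple[State, float]],
                       value: Callable[[State], float],
                       simulations: int = 200, max_depth: int = 3,
                       c: float = 1.25, gamma: float = 0.99):
    root = Node(root_state)
    for _ in range(simulations):
        node, path = root, [root]
        # Selection: descend through already-expanded nodes by the pUCT-style score.
        while node.children and len(path) <= max_depth:
            parent = node
            node = max(parent.children.values(), key=lambda ch: puct(parent, ch, c, gamma))
            path.append(node)
        # Expansion: only constraint-satisfying actions become children.
        if len(path) <= max_depth:
            for a in feasible_actions(node.state):
                nxt, r = dynamics(node.state, a)
                node.children[a] = Node(nxt, reward=r)
        # Evaluation, then backup of the discounted return along the path.
        g = value(node.state)
        for n in reversed(path):
            n.visits += 1
            n.value_sum += g
            g = n.reward + gamma * g
    # Extract the most-visited path and its node statistics sequence.
    actions: List[Action] = []
    stats: List[Tuple[int, float]] = []
    node = root
    while node.children:
        a, node = max(node.children.items(), key=lambda kv: kv[1].visits)
        if node.visits == 0:
            break
        actions.append(a)
        stats.append((node.visits, node.value()))
    return actions, stats

# Toy stand-ins (hypothetical): state = (lance height in cm,), best near 160 cm.
LANCE_MIN, LANCE_MAX, MAX_STEP = 140, 200, 10

def feasible_actions(state: State) -> List[Action]:
    lance = state[0]
    return [d for d in (-20, -10, 0, 10, 20)
            if abs(d) <= MAX_STEP and LANCE_MIN <= lance + d <= LANCE_MAX]

def dynamics(state: State, a: Action) -> Tuple[State, float]:
    nxt = (state[0] + a,)
    return nxt, -abs(nxt[0] - 160) / 10.0

def value(state: State) -> float:
    return -abs(state[0] - 160) / 10.0

plan, stats = constrained_search((180,), feasible_actions, dynamics, value)
```

The loop re-plans every control cycle from the latest state, so only `plan[0]` is normally executed.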
[0046] Optionally, the instruction generation module includes:
[0047] Based on the node statistics sequence, determine the target dynamic oxygen distribution action for the current control cycle;
[0048] The target dynamic oxygen distribution action and the initial hidden state are input into the policy output network. This network performs action mapping calculations to generate the set of dynamic oxygen distribution control instructions.
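In the patent, the policy output network maps the target action and the initial hidden state to control instructions. The sketch below substitutes a plain deterministic mapping for that network. This is an assumption made only to show the receding-horizon idea from [0065]: only the first step of the planned multi-step sequence is executed, and the resulting setpoints are clamped to actuator limits. `ControlInstruction`, the limits, and the hold-on-empty-plan behavior are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

@dataclass(frozen=True)
class ControlInstruction:
    o2_flow_setpoint: float  # target oxygen supply flow rate
    lance_setpoint: float    # target oxygen lance position

def clamp(v: float, limits: Tuple[float, float]) -> float:
    lo, hi = limits
    return min(max(v, lo), hi)

def to_instruction(plan: Sequence[Tuple[float, float]], flow_now: float, lance_now: float,
                   flow_limits: Tuple[float, float],
                   lance_limits: Tuple[float, float]) -> ControlInstruction:
    if not plan:
        # No feasible plan: keep the current setpoints.
        return ControlInstruction(flow_now, lance_now)
    d_flow, d_lance = plan[0]  # execute only the first planned step
    return ControlInstruction(clamp(flow_now + d_flow, flow_limits),
                              clamp(lance_now + d_lance, lance_limits))

cmd = to_instruction([(2000.0, -10.0), (0.0, -10.0)], flow_now=30000.0, lance_now=180.0,
                     flow_limits=(25000.0, 31000.0), lance_limits=(140.0, 200.0))
```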
[0049] Optionally, the reward calculation module includes:
[0050] Obtain the set of dynamic oxygen distribution control instructions and parse out the target oxygen supply flow rate setpoint and the target oxygen lance position setpoint. These yield the oxygen supply command parameter value and the lance position command parameter value;
[0051] Write the oxygen supply command parameter value into the oxygen supply regulation interface of the converter oxygen supply control system. This triggers the oxygen supply control system to generate an oxygen supply execution signal according to that value, forming the oxygen supply execution signal value;
[0052] Write the lance position command parameter value into the lance position control interface of the oxygen lance actuator. This triggers the actuator to generate a lance position execution signal according to that value, forming the lance position execution signal value;
[0053] Driven by the oxygen supply execution signal value and the lance position execution signal value, adjust the oxygen supply flow rate and the oxygen lance position. The execution is timed over each control cycle to obtain the control cycle time value;
[0054] After the control cycle timer reaches the end of the control cycle, the execution result data is collected to form an execution result data set;
[0055] Generate the set of dynamic oxygen distribution reward evaluation parameters for the current control cycle from the execution result data set;
[0056] Use this set of dynamic oxygen distribution reward evaluation parameters for the current control cycle as the reward calculation parameter combination;
[0057] Generate the reward value for the current control cycle from the reward calculation parameter combination.
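Paragraph [0057] leaves open how the parameter combination becomes a single reward. One simple assumption is a negative weighted sum of the penalty parameters, so that smaller deviations give a larger reward. The weights and penalty names below are hypothetical tuning values.

```python
from typing import Mapping

def cycle_reward(penalties: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Reward for one control cycle: minus the weighted sum of its penalty parameters."""
    missing = set(weights) - set(penalties)
    if missing:
        raise KeyError(f"missing penalty parameters: {sorted(missing)}")
    return -sum(w * penalties[name] for name, w in weights.items())

reward = cycle_reward(
    penalties={"temperature": 1.0, "carbon": 3.0, "splash": 0.5},
    weights={"temperature": 1.0, "carbon": 0.5, "splash": 2.0},
)
print(reward)  # -3.5
```

Raising an error on a missing penalty, rather than treating it as zero, avoids silently rewarding a cycle whose measurement failed.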
[0058] Optionally, the control recording module includes:
[0059] Read the reward value for the current control cycle, and obtain the state representation information and dynamic oxygen distribution control behavior for that cycle. Use the state representation information, control behavior, and reward value as training samples for the improved MuZero model.
[0060] Based on the training samples, parameter update operations are performed on the representation network, implicit state transition network, value evaluation network, and policy output network.
[0061] After completing the parameter update, the next control cycle begins, and the process of acquiring converter operation data, generating dynamic oxygen distribution control commands, calculating reward values, and updating model parameters is repeated until the converter blowing end point is reached.
[0062] During the cyclic execution, the set of dynamic oxygen distribution control instructions, the reward value, and the execution result data for each control cycle are recorded in control-cycle order to form the dynamic oxygen distribution control record set.
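The cycle in [0059] to [0062] can be sketched as a loop over control cycles. All callables are hypothetical stand-ins for the modules described above. `update` simply receives the growing list of (state, control behavior, reward) samples; how the improved MuZero model is actually trained on them is not specified in this section. The toy process is made up for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Sample = Tuple[Any, Any, float]  # (state, control behavior, reward)

@dataclass
class CycleRecord:
    cycle: int
    instruction: Any
    reward: float
    result: Dict[str, float]

def run_heat(observe: Callable[[], Dict[str, float]],
             plan: Callable[[Dict[str, float]], Any],
             execute: Callable[[Any], Dict[str, float]],
             reward_fn: Callable[[Dict[str, float]], float],
             update: Callable[[List[Sample]], None],
             at_endpoint: Callable[[Dict[str, float]], bool],
             max_cycles: int = 1000) -> List[CycleRecord]:
    records: List[CycleRecord] = []
    samples: List[Sample] = []
    for cycle in range(max_cycles):
        state = observe()              # acquire converter operating data
        if at_endpoint(state):         # stop at the blowing endpoint
            break
        instruction = plan(state)      # tree search plus instruction generation
        result = execute(instruction)  # issue commands and collect execution results
        reward = reward_fn(result)     # reward value for this control cycle
        samples.append((state, instruction, reward))
        update(samples)                # model parameter update
        records.append(CycleRecord(cycle, instruction, reward, result))  # control record set
    return records

# Toy heat (hypothetical): carbon drops by 0.5 wt% per control cycle from 4.0 wt%.
process = {"carbon": 4.0}
update_calls: List[int] = []

def execute(instruction: Any) -> Dict[str, float]:
    process["carbon"] -= 0.5
    return dict(process)

records = run_heat(observe=lambda: dict(process),
                   plan=lambda s: (30000.0, 180.0),
                   execute=execute,
                   reward_fn=lambda r: -abs(r["carbon"] - 0.05),
                   update=lambda samples: update_calls.append(len(samples)),
                   at_endpoint=lambda s: s["carbon"] <= 0.05)
```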
[0063] The beneficial effects of this invention are:
[0064] This invention collects and standardizes multi-source operating data from converter steelmaking in a unified way, and introduces a long-term time-series modeling mechanism based on the Informer network. This lets the system capture the long-term evolution trend of converter conditions across multiple control cycles. Dynamic oxygen distribution control therefore no longer relies only on instantaneous observations, but makes decisions and plans within a state space that accounts for historical evolution constraints. The long-term state representation vector serves as the initial state prior of the improved MuZero model, and a prior-weighted modulation mechanism is introduced in the state representation stage. Together these enhance the model's ability to perceive hysteresis, cumulative effects, and nonlinear reaction characteristics during converter blowing. This improves the adaptability and stability of the dynamic oxygen distribution control strategy under complex operating conditions. It reduces the risk of fluctuations in the control process while ensuring the hit rate of the endpoint temperature and carbon content.
[0065] Furthermore, this invention introduces a constrained tree search network in the decision-making and planning stage. Process and physical constraints are embedded directly into the dynamic oxygen distribution action search process, such as oxygen supply intensity, cumulative oxygen supply, oxygen lance position adjustment range, and operational safety requirements. Control behaviors that violate the constraints are therefore eliminated during the search stage. This avoids the safety hazards of the prior art, which relies on post-hoc screening or simple penalty correction. The invention also uses a splash risk assessment and penalty mechanism built on three inputs: furnace mouth flame image characterization, flue gas composition change characteristics, and oxygen supply flow rate change characteristics. With it, the control process can balance endpoint hit rate, process stability, and operational safety, and suppress splash risks early. Finally, decisions combine multi-step look-ahead search with execution of only the first-step action. This makes full use of the multi-step planning and evaluation results while meeting industrial real-time control requirements, making dynamic oxygen distribution control more stable and reliable.
Attached Figure Description
[0066] The accompanying drawings are provided to further illustrate the invention and form part of the specification. They are used in conjunction with embodiments of the invention to explain the invention and do not constitute a limitation thereof. In the drawings:
[0067] Figure 1 is a schematic diagram of the structure of the machine learning-based dynamic oxygen distribution control system for converter steelmaking proposed in this invention.
[0068] Figure 2 is a schematic diagram of the improved MuZero model used in the machine learning-based dynamic oxygen distribution control system for the converter steelmaking process proposed in this invention.
Detailed Implementation
[0069] The present invention will now be described in further detail with reference to the accompanying drawings. These drawings are simplified schematic diagrams, illustrating only the basic structure of the invention, and therefore only show the components relevant to the invention.
[0070] Referring to Figure 1 and Figure 2, a machine learning-based dynamic oxygen distribution control system for the converter steelmaking process includes:
[0071] The data acquisition module is used to collect converter operating data during the converter blowing process;
[0072] The preprocessing module is used to preprocess the converter operating data to generate a standardized converter operating data set, and to perform sliding-window segmentation on it to generate a set of long-term converter operation segments;
[0073] The time-series modeling module is used to construct a set of converter process state vectors from the set of long-term converter operation segments, and to input them into the Informer network for time-series modeling, generating a long-term state representation vector of the converter;
[0074] The evaluation parameter construction module is used to construct a set of dynamic oxygen distribution reward evaluation parameters;
[0075] The implicit state initialization module is used to build the improved MuZero model. It generates the planning root node state vector from the long-term state representation vector of the converter, and inputs it into the implicit state transition network to generate the initial hidden state;
[0076] The tree search planning module is used to construct a dynamic oxygen distribution action set. It takes the initial hidden state as the root node of the tree search, performs multi-step planning over the action set, and generates a node statistics sequence;
[0077] The instruction generation module is used to determine the target dynamic oxygen distribution action from the node statistics sequence and to generate a set of dynamic oxygen distribution control instructions;
[0078] The reward calculation module is used to issue the set of dynamic oxygen distribution control instructions and execute the adjustment operations. It then calculates the corresponding reward value from the execution result data, according to the set of dynamic oxygen distribution reward evaluation parameters;
[0079] The control recording module is used to update the parameters of the improved MuZero model and to repeat the above process until the blowing endpoint is reached, generating a dynamic oxygen distribution control record set.
[0080] In this embodiment, the converter operating data include the oxygen supply flow rate, cumulative oxygen supply, oxygen lance height, carbon monoxide volume fraction in furnace mouth flue gas, carbon dioxide volume fraction in furnace mouth flue gas, molten steel temperature, FeO mass fraction in slag, furnace mouth flame image characterization, and molten steel carbon content. They are defined as follows:
- Oxygen supply flow rate: the real-time measured volume of oxygen fed into the converter per unit time during blowing, characterizing the oxygen input intensity of the current blowing stage.
- Cumulative oxygen supply: the total amount of oxygen fed into the converter from the start of blowing to the current moment, characterizing the overall oxygen supply level of the blowing process.
- Oxygen lance height: the vertical distance between the oxygen lance nozzle and the molten steel surface or a converter reference position, characterizing the spatial position of the lance and its adjustment over time.
- Carbon monoxide volume fraction in furnace mouth flue gas: the proportion of carbon monoxide in the flue gas discharged from the converter mouth, characterizing the intensity of the carbon-oxygen reaction and the reduction reactions in the furnace.
- Carbon dioxide volume fraction in furnace mouth flue gas: the proportion of carbon dioxide in the flue gas discharged from the converter mouth, characterizing the degree of oxidation and the reaction stage in the furnace.
- Molten steel temperature: the instantaneous temperature measurement of the molten steel during blowing, characterizing its thermal state and the endpoint temperature control level.
- Slag FeO mass fraction: the mass proportion of ferrous oxide in the converter slag, characterizing the oxidizing capacity of the slag and the iron loss state.
- Furnace mouth flame image characterization: industrial cameras installed in the furnace mouth area capture flame images at the converter mouth. Brightness, color, shape contour, and temporal fluctuation features are jointly extracted from the images and numerically encoded to form this quantity. It characterizes the reaction intensity, splashing state, and blowing stage in the furnace.
- Molten steel carbon content: the mass fraction of carbon in the molten steel during blowing, characterizing the decarburization progress and the endpoint carbon control state.
[0081] In this embodiment, preprocessing includes time synchronization, anomaly removal, missing data imputation, and normalization.
[0082] In this embodiment, the time series modeling module includes:
[0083] Based on the set of long-term converter operation segments, time-series features are constructed for the following variables in each segment: oxygen supply flow rate, cumulative oxygen supply, oxygen lance position height, carbon monoxide volume fraction in furnace mouth flue gas, carbon dioxide volume fraction in furnace mouth flue gas, molten steel temperature, slag FeO mass fraction, furnace mouth flame image characterization, and molten steel carbon content. This generates multiple corresponding feature vectors.
[0084] Multiple feature vectors are concatenated to form the converter process state vector corresponding to the long-term operation segment of the converter, and the converter process state vectors constitute the converter process state vector set.
[0085] The set of converter process state vectors is input into the Informer network for time-series modeling. This generates a long-term time-series state representation vector of the converter that characterizes the long-term evolution trend of converter conditions. The vector serves as the initial state prior for implicit state transition and tree search in the improved MuZero model.
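Paragraphs [0083] and [0084] build one feature vector per variable and concatenate the vectors into a process state vector. The patent does not list the features, so the sketch assumes three simple ones per variable: last value, mean, and average change per sample. It uses a hypothetical subset of four of the nine variables, with made-up names and already-normalized sample values. The resulting vectors, one per segment, would form the sequence fed to the Informer network.

```python
from typing import Dict, List, Sequence

# Hypothetical variable names; the patent uses nine variables, four are shown here.
VARIABLES = ("o2_flow", "o2_cumulative", "lance_height", "co_fraction")

def variable_features(values: Sequence[float]) -> List[float]:
    """Last value, mean, and average change per sample of one variable within a segment."""
    n = len(values)
    if n == 0:
        raise ValueError("empty segment")
    slope = (values[-1] - values[0]) / (n - 1) if n > 1 else 0.0
    return [values[-1], sum(values) / n, slope]

def process_state_vector(segment: Dict[str, Sequence[float]]) -> List[float]:
    """Concatenate per-variable feature vectors in a fixed variable order."""
    state: List[float] = []
    for name in VARIABLES:
        state.extend(variable_features(segment[name]))
    return state

segment = {
    "o2_flow": [1.0, 2.0, 3.0],
    "o2_cumulative": [0.0, 0.5, 1.0],
    "lance_height": [2.0, 2.0, 2.0],
    "co_fraction": [0.2, 0.4, 0.6],
}
state_vectors = [process_state_vector(segment)]  # one vector per long-term segment
```

A fixed variable order matters here. The concatenated vector has no field names, so every segment must place each variable's features at the same positions.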
[0086] In this embodiment, the evaluation parameter construction module includes:
[0087] Based on the endpoint control requirements of converter steelmaking, a target temperature value for molten steel is set, and the molten steel temperature within the corresponding control cycle is obtained during the blowing process. The difference between the molten steel temperature and the target temperature value is calculated, and the difference is transformed by linear mapping to generate a molten steel temperature deviation penalty parameter.
[0088] Based on the converter's final carbon content control index, a target carbon content value for molten steel is set, and the carbon content of molten steel is obtained within the corresponding control cycle. The deviation between the carbon content of molten steel and the target carbon content value is calculated, and the deviation is transformed by linear mapping to generate a carbon content deviation penalty parameter for molten steel.
[0089] For the control range of carbon monoxide volume fraction in flue gas at the furnace mouth during converter blowing, a target range of carbon monoxide volume fraction in flue gas is set, and the volume fraction of carbon monoxide in flue gas at the furnace mouth in the corresponding time period is obtained. The degree of deviation from the target range is calculated, and the degree of deviation is linearly mapped to generate a deviation penalty parameter for carbon monoxide volume fraction in flue gas.
[0090] During the blowing process, the cumulative oxygen supply is statistically analyzed in real time. The current cumulative oxygen supply is compared with the preset cumulative oxygen supply control threshold. Based on the comparison result, a linear mapping is performed to generate a cumulative oxygen supply penalty parameter.
[0091] Based on the requirements for controlling the degree of slag oxidation, a target range for the FeO mass fraction in slag is set, and the slag FeO mass fraction is collected within the corresponding control cycle. The degree of deviation between the slag FeO mass fraction and the target range is calculated and linearly mapped to generate the slag FeO deviation penalty parameter.
[0092] Based on the requirements for splash risk control during converter blowing, a splash risk assessment value is calculated from changes in the furnace-mouth flame image representation, changes in flue gas composition, and changes in the oxygen supply flow rate. A splash risk penalty parameter is then generated from the splash risk assessment value.
[0093] To generate the splash risk penalty parameter, the furnace-mouth flame image representation is acquired for the current control cycle and for the previous control cycle. Subtracting the two gives a flame change difference vector, whose amplitude is aggregated into a flame change value. The carbon monoxide and carbon dioxide volume fractions in the furnace-mouth flue gas are likewise acquired for the current control cycle, their changes relative to the previous control cycle are calculated separately, and a flue gas composition change value is generated from these two changes. The oxygen supply flow rate in the same control cycle is acquired, and its difference from the previous cycle's flow rate gives an oxygen supply flow rate change value. The flame change value, flue gas composition change value, and oxygen supply flow rate change value are normalized into a flame change intensity value, a flue gas composition change intensity value, and an oxygen supply flow rate change intensity value, and a weighted sum of the three intensities yields the splash risk assessment value. This value is compared with a preset splash risk threshold to produce a splash risk judgment value, which is linearly mapped to generate the splash risk penalty parameter.
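A minimal sketch of the splash risk computation in [0093], under stated assumptions: the flame change value is taken as the Euclidean norm of the difference vector, the flue gas composition change value as the sum of the absolute carbon monoxide and carbon dioxide changes, normalization as division by a preset scale capped at 1, and the judgment value as the exceedance over the threshold. The function name and all scales, weights, thresholds, and gains are illustrative, not values from the patent.

```python
import math


def splash_risk_penalty(flame_now, flame_prev, gas_now, gas_prev, flow_now, flow_prev,
                        scales=(10.0, 0.3, 200.0), weights=(0.4, 0.3, 0.3),
                        threshold=0.5, gain=10.0):
    """Return (splash risk assessment value, splash risk penalty parameter).

    gas_now / gas_prev are (CO fraction, CO2 fraction) pairs for the two cycles.
    All scales, weights, the threshold and the gain are assumed example values.
    """
    # Flame change value: Euclidean amplitude of the flame change difference vector.
    flame_change = math.sqrt(sum((a - b) ** 2 for a, b in zip(flame_now, flame_prev)))
    # Flue gas composition change value: sum of absolute CO and CO2 changes.
    gas_change = sum(abs(a - b) for a, b in zip(gas_now, gas_prev))
    # Oxygen supply flow rate change value.
    flow_change = abs(flow_now - flow_prev)
    # Normalize each change into an intensity in [0, 1] with a preset scale.
    changes = (flame_change, gas_change, flow_change)
    intensities = [min(1.0, c / s) for c, s in zip(changes, scales)]
    # Weighted summation gives the splash risk assessment value.
    risk = sum(w * x for w, x in zip(weights, intensities))
    # Judgment value: exceedance over the preset threshold, then a linear mapping.
    judgment = max(0.0, risk - threshold)
    return risk, gain * judgment


risk, penalty = splash_risk_penalty([3.0, 4.0], [0.0, 0.0], (0.30, 0.10), (0.20, 0.15),
                                    1100.0, 900.0)
```

Capping each intensity at 1 keeps a single extreme signal from dominating the weighted sum; whether the patent caps or rescales is not stated.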
[0094] The steel temperature deviation penalty parameter characterizes the deviation of the steel temperature from the target steel temperature value; the steel carbon content deviation penalty parameter characterizes the deviation of the steel carbon content from the target carbon content value; the flue gas carbon monoxide volume fraction deviation penalty parameter characterizes the deviation of the carbon monoxide volume fraction in the furnace-mouth flue gas from the preset control range; the cumulative oxygen supply penalty parameter characterizes how far the current cumulative oxygen supply exceeds the preset cumulative oxygen supply control threshold; the slag FeO deviation penalty parameter characterizes the deviation of the slag FeO mass fraction from the preset slag FeO target range; and the splash risk penalty parameter characterizes the splash risk level calculated from the furnace-mouth flame image representation, the furnace-mouth flue gas composition change characteristics, and the oxygen supply flow rate change characteristics. Together, these penalty parameters constitute the dynamic oxygen distribution reward evaluation parameter set, which evaluates dynamic oxygen distribution control behavior in terms of endpoint hit rate, process stability, and operational safety.
[0095] The parameters for penalizing deviations in molten steel temperature, carbon content, volume fraction of carbon monoxide in flue gas, cumulative oxygen supply, FeO deviation in slag, and splashing risk are combined to form a set of dynamic oxygen distribution reward evaluation parameters, which are used for a comprehensive evaluation of the control behavior during the dynamic oxygen distribution control process.
[0096] In this embodiment, the implicit state initialization module includes:
[0097] An improved MuZero model is constructed, which includes a representation network, an implicit state transition network, a value evaluation network, a policy output network, and a constrained tree search network.
[0098] The improvements to the MuZero model are as follows:
[0099] A prior-guided state representation mechanism based on long-term converter operation segments is introduced: when constructing the MuZero initial state, the current observed state is not used directly as the input of the representation network. Instead, the Informer network generates, from the set of long-term converter operation segments, a converter long-term state representation vector that represents the long-term evolution trend of the converter condition. This vector serves as the long-term evolution prior in the state representation stage and guides the subsequent implicit state transition and tree search, so that the planning state contains both the current furnace condition information and the historical evolution constraint information across multiple control cycle scales.
[0100] A prior-weighted modulation mechanism between the long-term evolution prior embedding vector and the current state embedding vector is established: in the representation network, the converter long-term state representation vector and the current control cycle state feature vector are each embedded to obtain the long-term evolution prior embedding vector and the current state embedding vector. A prior weight vector is generated from the dimension-aligned pair, and the current state embedding vector is modulated dimension by dimension with these prior weights to form a modulated state embedding vector constrained by the long-term evolution trend of converter conditions. This avoids the problem in standard MuZero, where the planning state is built only from instantaneous observations and ignores the continuity of process evolution.
[0101] In the planning root node state construction stage, a joint process of nonlinear transformation, dimension compression and stabilization is introduced: when generating the planning root node state vector, the prior modulated state embedding vector is sequentially subjected to nonlinear mapping, feature normalization, target dimension compression and numerical range stabilization to form a planning root node state vector with controlled numerical distribution and uniform scale, which is used as the input of the implicit state transition network to solve the problem of state scale drift and search instability that the standard MuZero is prone to in high-dimensional continuous industrial states.
[0102] Constructing a constrained tree search network for dynamic oxygen supply constraints: In the MuZero tree search stage, a constrained tree search network is introduced. During the search process with the initial hidden state as the root node, process constraints are applied to the set of dynamic oxygen supply actions. Only dynamic oxygen supply actions that meet the requirements of oxygen supply intensity, cumulative oxygen supply, lance position adjustment range and operational safety are allowed to participate in the multi-step look-ahead search. This allows the physical and safety constraints in the converter steelmaking process to be directly embedded into the tree search planning process, rather than as ex-post screening conditions.
[0103] The representation network performs prior-guided state representation modeling of the converter operating status information; the implicit state transition network generates the initial implicit state for planning; the value evaluation network evaluates the implicit states generated during planning; and the constrained tree search network performs multi-step look-ahead search planning under the dynamic oxygen distribution action constraints.
[0104] In the representation network, the long-term state representation vector of the converter is obtained, a linear mapping is performed on the long-term state representation vector of the converter to generate a long-term evolution prior feature vector, and the long-term evolution prior feature vector is normalized to obtain a long-term evolution prior embedding vector, which is used to characterize the overall evolution trend of the converter condition on multiple control cycle scales.
[0105] This invention introduces the converter long-term state representation vector generated by the Informer network into the implicit state transition and tree search process of the improved MuZero model as the initial state prior for predictive planning, so that the state transition and tree search can be carried out in a state space with long-term evolution information, thereby breaking through the technical limitations of existing reinforcement learning methods that rely on short-term state information and are difficult to cope with the lag characteristics of industrial processes.
[0106] Within the current control cycle, the current state feature vector is obtained from the standardized operating data set of the converter. The current state feature vector is then linearly mapped and normalized to generate a current state embedding vector, which is used to characterize the real-time operating state of the converter within the current control cycle.
[0107] Align the long-term evolution prior embedding vector with the current state embedding vector, and perform prior weighted modulation on each dimension of the current state embedding vector to generate a modulated state embedding vector constrained by the long-term evolution trend.
[0108] During prior-weighted modulation, the long-term evolution prior embedding vector and the current state embedding vector are obtained, and their dimension information is read to generate a prior dimension value and a state dimension value. A target alignment dimension value is determined from these two values; the long-term evolution prior embedding vector is mapped to this dimension to generate an aligned prior vector, and the current state embedding vector is likewise mapped to generate an aligned state vector. Each component of the aligned prior vector is numerically clipped to generate a prior weight vector, whose components are then normalized to produce a normalized prior weight vector. Following the one-to-one correspondence of feature dimensions, the weight component of the normalized prior weight vector and the state component of the aligned state vector are read dimension by dimension, and a weighted modulation operation on each state component yields a modulated component value. The modulated component values are combined in the original dimension order to obtain the modulated state embedding vector constrained by the long-term evolution trend.
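The prior-weighted modulation of [0108] can be sketched as follows. The target alignment dimension is assumed to be the smaller of the two dimensions, dimension mapping is reduced to truncation (a trained system would more likely use a learned projection), and normalization is assumed to be a softmax rescaled so that the weights average 1, which leaves the state unchanged when the prior is uniform. The function name and clipping bound are hypothetical.

```python
import math


def prior_weighted_modulation(prior_emb, state_emb, clip=3.0):
    """Hypothetical sketch: modulate the current state embedding with a long-term prior."""
    # Target alignment dimension: assumed to be the smaller of the two dimensions.
    d = min(len(prior_emb), len(state_emb))
    # Dimension mapping: assumed to be truncation (a learned projection in practice).
    aligned_prior = prior_emb[:d]
    aligned_state = state_emb[:d]
    # Numerical clipping of each prior component gives the prior weight vector.
    clipped = [max(-clip, min(clip, p)) for p in aligned_prior]
    # Normalization: softmax rescaled by d so the average weight is 1 (assumed form).
    m = max(clipped)
    exps = [math.exp(c - m) for c in clipped]
    total = sum(exps)
    weights = [d * e / total for e in exps]
    # Dimension-by-dimension weighted modulation, keeping the original order.
    return [w * s for w, s in zip(weights, aligned_state)]


modulated = prior_weighted_modulation([0.5, 0.5, 0.5, 0.9], [1.0, 2.0, 3.0])
```

With a uniform prior every weight equals 1 and the state passes through unchanged; a skewed prior shifts emphasis toward the dimensions the long-term trend favors.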
[0109] Perform nonlinear transformation and dimensionality compression on the modulated state embedding vector to generate the planning root node state vector.
[0110] When generating the planning root node state vector, the modulated state embedding vector is obtained and its original dimension value is read. Nonlinear mapping is applied dimension by dimension to produce a nonlinear feature response vector, whose components are then numerically normalized to produce a normalized feature vector. Based on the preset target state dimension value, the normalized feature vector is compressed to obtain a compressed feature vector, which is then stabilized to generate the planning root node state vector. The stabilization step applies numerical clipping and normalization to the compressed feature vector, suppressing extreme component values so that the planning root node state vector has a controlled numerical range. The resulting vector contains both the current furnace condition features and the long-term evolution prior information of the converter furnace condition.
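The four steps of [0110] (nonlinear mapping, normalization, dimension compression, stabilization) can be sketched in plain Python. The specific choices are assumptions: tanh for the nonlinear mapping, zero-mean unit-variance normalization, averaging of contiguous groups for compression, and clipping followed by scaling to unit Euclidean norm for stabilization. The function name and parameters are hypothetical.

```python
import math


def planning_root_state(modulated, target_dim, clip=2.0, eps=1e-8):
    """Hypothetical sketch: modulated state embedding -> planning root node state vector."""
    if not 0 < target_dim <= len(modulated):
        raise ValueError("target_dim must be between 1 and the input dimension")
    # 1. Nonlinear mapping, dimension by dimension (tanh assumed).
    response = [math.tanh(x) for x in modulated]
    # 2. Normalization to zero mean and unit variance (assumed form).
    mean = sum(response) / len(response)
    var = sum((x - mean) ** 2 for x in response) / len(response)
    normalized = [(x - mean) / math.sqrt(var + eps) for x in response]
    # 3. Compression to target_dim by averaging contiguous groups (assumed form).
    n = len(normalized)
    compressed = []
    for i in range(target_dim):
        group = normalized[i * n // target_dim:(i + 1) * n // target_dim]
        compressed.append(sum(group) / len(group))
    # 4. Stabilization: clip extreme components, then scale to unit Euclidean norm.
    clipped = [max(-clip, min(clip, x)) for x in compressed]
    norm = math.sqrt(sum(x * x for x in clipped)) + eps
    return [x / norm for x in clipped]


root_state = planning_root_state([0.1, -2.0, 3.5, 0.7, -0.4, 1.2], target_dim=3)
```

The final unit-norm scaling is one simple way to keep the root state on a fixed scale regardless of how large the raw sensor-derived features are.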
[0111] The state vector of the planning root node is input into the implicit state transition network, and the implicit state initialization operation is performed to generate the initial implicit state, which is used as the starting implicit state of the dynamic oxygen allocation control planning process.
[0112] When generating the initial hidden state, the planning root node state vector is obtained and its dimension value is read. The vector is input into the implicit state transition network, which uses a multi-layer feedforward neural network structure consisting of an input mapping layer, a nonlinear feature transformation layer, and a hidden state generation layer. The input mapping layer linearly maps the planning root node state vector to a first intermediate feature vector. The nonlinear feature transformation layer applies a nonlinear mapping to the components of the first intermediate feature vector to produce a second intermediate feature vector, which is then normalized. The hidden state generation layer maps the normalized feature vector to a hidden state candidate vector, and numerical range constraint processing of this candidate vector yields the initial hidden state, which serves as the root node hidden state of the constrained tree search network.
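A toy version of the three-layer structure in [0112] is sketched below, with randomly initialized weights standing in for trained ones. ReLU as the nonlinear transformation, per-vector standardization as the normalization, and min-max scaling to [0, 1] as the numerical range constraint are assumptions (the original MuZero also min-max scales its hidden states, but the patent does not say which range constraint it uses). The class and helper names are hypothetical.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: weights is a list of rows, one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_layer(n_in, n_out, rng):
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class InitialHiddenStateNet:
    """Hypothetical stand-in: input mapping -> nonlinear transform -> hidden state layer."""

    def __init__(self, in_dim, mid_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.input_map = init_layer(in_dim, mid_dim, rng)
        self.hidden_gen = init_layer(mid_dim, hidden_dim, rng)

    def __call__(self, root_state, eps=1e-8):
        first = linear(root_state, *self.input_map)        # input mapping layer
        second = [max(0.0, x) for x in first]               # nonlinear transform (ReLU assumed)
        mean = sum(second) / len(second)
        std = math.sqrt(sum((x - mean) ** 2 for x in second) / len(second) + eps)
        normalized = [(x - mean) / std for x in second]     # normalization
        candidate = linear(normalized, *self.hidden_gen)    # hidden state generation layer
        lo, hi = min(candidate), max(candidate)
        # Numerical range constraint: min-max scaling into [0, 1] (assumed form).
        return [(x - lo) / (hi - lo + eps) for x in candidate]


initial_hidden = InitialHiddenStateNet(3, 8, 4)([0.2, -0.5, 0.8])
```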
[0113] In this embodiment, the tree search planning module includes:
[0114] Based on the process control requirements of converter steelmaking, a dynamic oxygen distribution action set is constructed. It comprises an oxygen supply flow rate adjustment level set, consisting of multiple discrete oxygen supply flow rate set values, and an oxygen lance position adjustment level set, consisting of multiple discrete oxygen lance height set values. These sets are associated with oxygen supply intensity constraints, cumulative oxygen supply constraints, lance position adjustment range constraints, and operational safety constraints to form a constrained dynamic oxygen distribution action set.
[0115] The oxygen supply intensity constraint limits the range of the flow rate set values in the oxygen supply flow rate adjustment level set and their change between adjacent control cycles, preventing overload or sudden changes in oxygen supply intensity during dynamic oxygen distribution control. The cumulative oxygen supply constraint limits the selectable flow rate adjustment levels so that the cumulative oxygen supply after the corresponding control cycle does not exceed the preset cumulative oxygen supply control threshold. The lance position adjustment range constraint limits the absolute height range of each lance height set value in the oxygen lance position adjustment level set and the change in lance position between adjacent control cycles, keeping the oxygen lance within its allowable adjustment range. The operational safety constraint filters combinations of flow rate levels and lance position levels to prevent dynamic oxygen distribution actions from triggering splash risk or exceeding safe operating thresholds.
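The constrained action set of [0114] and [0115] can be sketched as a Cartesian product of the two level sets, filtered by the four constraints. Units (Nm3/min for flow, mm for lance height, minutes per control cycle), the limit names, and especially the single high-flow/low-lance rule standing in for the operational safety constraint are assumptions for illustration.

```python
from itertools import product


def constrained_action_set(flow_levels, lance_levels, prev_flow, prev_lance,
                           cum_o2, cycle_minutes, limits):
    """Hypothetical sketch: keep only (flow, lance) pairs that satisfy every constraint."""
    actions = []
    for flow, lance in product(flow_levels, lance_levels):
        # Oxygen supply intensity: absolute range and change per control cycle.
        if not limits["flow_min"] <= flow <= limits["flow_max"]:
            continue
        if abs(flow - prev_flow) > limits["max_flow_step"]:
            continue
        # Cumulative oxygen supply after this cycle must stay within the threshold.
        if cum_o2 + flow * cycle_minutes > limits["cum_o2_max"]:
            continue
        # Lance position: absolute height range and change per control cycle.
        if not limits["lance_min"] <= lance <= limits["lance_max"]:
            continue
        if abs(lance - prev_lance) > limits["max_lance_step"]:
            continue
        # Operational safety (assumed rule): no high flow with a low lance.
        if flow >= limits["high_flow"] and lance <= limits["low_lance"]:
            continue
        actions.append((flow, lance))
    return actions


LIMITS = {"flow_min": 800, "flow_max": 1100, "max_flow_step": 100, "cum_o2_max": 5000,
          "lance_min": 1400, "lance_max": 1800, "max_lance_step": 200,
          "high_flow": 1000, "low_lance": 1400}
allowed = constrained_action_set([800, 900, 1000, 1100], [1400, 1600, 1800],
                                 prev_flow=900, prev_lance=1600, cum_o2=4000,
                                 cycle_minutes=1.0, limits=LIMITS)
```

Filtering before the search, rather than after it, is what lets the constraints shape which branches the tree search can ever explore.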
[0116] The initial hidden state is used as the root node hidden state of the constrained tree search network. The initial hidden state is input into the constrained tree search network to complete the initialization of the tree search structure, and the dynamic oxygenation action set is loaded into the corresponding search action space.
[0117] In the constrained tree search network, starting from the hidden state of the root node, a dynamic oxygenation action that satisfies the constraints is selected, and the dynamic oxygenation action is combined with the hidden state corresponding to the current search node. The combination is then input into the implicit state transition network to perform the hidden state recursion operation, forming a set of hidden states of child nodes.
[0118] In the constrained tree search network, the tree search that generates dynamic oxygen distribution actions from the root node hidden state proceeds as follows. The root node hidden state is input into the constrained tree search network to generate the root search node; a node identifier value is assigned to it, and the corresponding hidden state value is recorded as the current hidden state of the search node. Based on this hidden state, the oxygen supply flow rate adjustment level set and the oxygen lance position adjustment level set are read, and their Cartesian product is taken to generate the dynamic oxygen distribution action set, each element of which is one combination of a flow rate adjustment level value and a lance position adjustment level value. For each action in the set, a hidden state-action input vector is constructed and input into the implicit state transition network to perform the hidden state recursion, generating the next-layer hidden state values. Each next-layer hidden state is assigned a new search node identifier value, and the hierarchical relationship between the new identifier and the root search node identifier is recorded in the tree search node structure, forming the child node hidden state set of the root search node. This set is written into the tree search node structure, so that the network forms search extension branches under the root search node for child node value evaluation and node statistics updates.
[0119] During the hidden state recursion, for the selected dynamic oxygen distribution action, the corresponding oxygen supply flow rate adjustment level value and oxygen lance position adjustment level value are read and mapped into an action numerical representation vector, which is normalized to produce an action embedding vector. The action embedding vector and the hidden state of the current search node are input into the implicit state transition network, and the hidden state recursion produces the hidden state at the next time step, thereby yielding the child node hidden state set.
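The action embedding and hidden state recursion of [0118] and [0119] might look like the sketch below. Min-max scaling of the level values to [0, 1], the single tanh layer used as the transition network, and all names are assumptions; the patent's implicit state transition network is a trained multi-layer network.

```python
import math
import random


def action_embedding(flow, lance, flow_range, lance_range):
    """Map a (flow, lance) level pair into [0, 1]^2 (min-max scaling assumed)."""
    (f_lo, f_hi), (l_lo, l_hi) = flow_range, lance_range
    return [(flow - f_lo) / (f_hi - f_lo), (lance - l_lo) / (l_hi - l_lo)]


class TransitionNet:
    """Toy one-layer transition g(s, a) -> s' with tanh; a stand-in, not the patent's network."""

    def __init__(self, hidden_dim, action_dim=2, seed=0):
        rng = random.Random(seed)
        n_in = hidden_dim + action_dim
        self.w = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
                  for _ in range(hidden_dim)]

    def __call__(self, hidden, action_emb):
        x = list(hidden) + list(action_emb)     # hidden state-action input vector
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.w]


def expand_children(hidden, actions, net, flow_range, lance_range):
    """Hidden state recursion for every candidate action: the child hidden state set."""
    return {a: net(hidden, action_embedding(*a, flow_range, lance_range)) for a in actions}


children = expand_children([0.1, 0.2, 0.3, 0.4], [(800, 1400), (1000, 1800)],
                           TransitionNet(4), (800, 1100), (1400, 1800))
```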
[0120] Each hidden state of a child node in the set of hidden states of the child nodes is input into the value evaluation network, the value evaluation operation is performed, the corresponding node value estimate is generated, and each node value estimate is bound to the corresponding dynamic oxygenation action and the hidden state of the child node to generate a set of child node evaluation results. The value evaluation network uses a multilayer perceptron.
[0121] Write the set of child node evaluation results into the tree search node structure, and update the node access count and cumulative node value of the root search node to form the updated node statistics.
[0122] Node access count is the cumulative number of times a corresponding search node is selected and performs search expansion operations during the tree search expansion process in the constrained tree search network. It is used to characterize the access frequency of the search node in the current search process. Node cumulative value is the cumulative value formed by aggregating and updating the node value estimates of all its generated child nodes during the tree search expansion process in the constrained tree search network. It is used to characterize the overall value level of the dynamic oxygenation action path corresponding to the search node.
[0123] From the set of hidden states of child nodes, determine the search node of the next search level based on the updated node statistics, and use the determined hidden state of the child node as the hidden state of the current search node of the next search level.
[0124] Repeat the process of generating dynamic oxygenation actions, generating hidden states, estimating node values, and updating node statistics until the preset number of search levels is reached, generating a multi-level dynamic oxygenation action search tree structure starting from the hidden state of the root node.
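The repeated expand, evaluate, and update cycle of [0117] to [0124] is sketched below as a simplified, depth-limited search in the spirit of MuZero's Monte Carlo tree search. The transition and value functions are passed in as placeholders, the action list is assumed to be already filtered by the constraints, intermediate rewards and discounting are ignored, and the UCB-style selection rule (with constant c) is an assumption, since the patent only says selection is based on node statistics.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    hidden: list
    visits: int = 0          # node access count
    value_sum: float = 0.0   # cumulative node value
    children: dict = field(default_factory=dict)

    def mean_value(self):
        return self.value_sum / self.visits if self.visits else 0.0


def select_child(node, c=1.25):
    """UCB-style choice among children (assumed rule)."""
    total = sum(ch.visits for ch in node.children.values()) + 1

    def score(item):
        _, ch = item
        return ch.mean_value() + c * math.sqrt(total) / (1 + ch.visits)

    return max(node.children.items(), key=score)


def constrained_search(root_hidden, actions, transition, value_fn, depth, simulations):
    """Hypothetical sketch: depth-limited search over pre-filtered actions."""
    root = Node(root_hidden)
    for _ in range(simulations):
        node, path = root, [root]
        for _level in range(depth):
            if not node.children:
                # Expansion: one child per allowed action via the transition function.
                for a in actions:
                    node.children[a] = Node(transition(node.hidden, a))
            _, node = select_child(node)
            path.append(node)
        leaf_value = value_fn(node.hidden)
        # Backup: add the end-node value to every node on the search path.
        for n in path:
            n.visits += 1
            n.value_sum += leaf_value
    return root


root = constrained_search([], [0, 1, 2], lambda h, a: h + [a], lambda h: float(sum(h)),
                          depth=2, simulations=30)
```

Because every simulation passes through exactly one child of the root, the root children's access counts always sum to the number of simulations.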
[0125] Aggregate the multiple search paths formed by the multi-level dynamic oxygen distribution action search tree, and obtain the node statistics of the end search node of each search path.
[0126] Determine the optimal search path based on the node statistics of the end search nodes of the search paths. Along the optimal path, extract the corresponding oxygen supply flow rate adjustment sequence and oxygen lance position adjustment sequence to generate a multi-step dynamic oxygen distribution action sequence and the corresponding node statistics sequence.
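Path extraction in [0125] and [0126] can be sketched over a small nested-dictionary tree. Ranking paths by the mean value (cumulative value divided by access count) of their end node is an assumed reading of "based on the node statistics"; the tree layout, in which each action maps to (access count, cumulative value, children), and the function name are hypothetical.

```python
def best_path(tree):
    """Return flow sequence, lance sequence and node statistics of the best path."""
    paths = []

    def walk(subtree, actions, stats):
        for action, (visits, value_sum, children) in subtree.items():
            seq = actions + [action]
            st = stats + [(visits, value_sum)]
            if children:
                walk(children, seq, st)
            elif visits > 0:
                # End search node: rank by its mean value (assumed criterion).
                paths.append((value_sum / visits, seq, st))

    walk(tree, [], [])
    if not paths:
        raise ValueError("tree has no visited end nodes")
    _, actions, stats = max(paths, key=lambda p: p[0])
    flows = [a[0] for a in actions]
    lances = [a[1] for a in actions]
    return flows, lances, stats


tree = {(900, 1600): (5, 10.0, {(900, 1600): (3, 9.0, {}), (1000, 1600): (2, 2.0, {})}),
        (1000, 1800): (4, 6.0, {(1000, 1700): (4, 6.0, {})})}
flows, lances, stats = best_path(tree)
```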
[0127] In this embodiment, the instruction generation module includes:
[0128] Based on the node statistics sequence, determine the target dynamic oxygen distribution action for the current control cycle.
[0129] After the constrained tree search network completes the multi-level dynamic oxygen distribution action search, a first-step action selection is performed over the first-order candidate child nodes of the root search node, based on the node statistics backup results formed during the multi-level search, to generate the target dynamic oxygen distribution action for the current control cycle. The node statistics backup results are formed by propagating the value evaluation result of the end node of each search path back along that path, level by level, and aggregating it during the multi-level search, so that the node access count and cumulative node value of each candidate child node adjacent to the root search node incorporate the evaluation information of the subsequent multi-level search paths.
[0130] In the first-step action selection, the node access count and cumulative value of each candidate child node adjacent to the root search node are read from the node statistics sequence, and a candidate action statistics alignment set is generated from them. For each candidate child node in this set, an action score is calculated as a normalized weighted aggregation of its cumulative value and node access count. The candidate child node with the largest action score is selected as the target child node, and its oxygen supply flow rate adjustment level and oxygen lance position adjustment level are read as the target dynamic oxygen distribution action for the current control cycle.
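The first-step action score of [0130] is sketched below. Min-max normalization of each statistic across the root's candidate children and an equal weighting (alpha = 0.5) are assumptions; the patent specifies only a normalized weighted aggregation of cumulative value and access count. The function name is hypothetical.

```python
def select_target_action(candidates, alpha=0.5):
    """candidates: {action: (access_count, cumulative_value)} for the root's children."""

    def minmax(values):
        lo, hi = min(values), max(values)
        return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

    actions = list(candidates)
    visits = minmax([candidates[a][0] for a in actions])
    values = minmax([candidates[a][1] for a in actions])
    # Normalized weighted aggregation of cumulative value and access count.
    scores = {a: alpha * v + (1 - alpha) * n for a, v, n in zip(actions, values, visits)}
    return max(scores, key=scores.get), scores


target, scores = select_target_action({(900, 1600): (20, 30.0),
                                       (1000, 1600): (8, 14.0),
                                       (800, 1800): (2, 1.0)})
```

Only this first action is executed; the deeper levels of the search inform its score but are never sent to the actuators directly.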
[0131] The node statistics backup result is the outcome of the mechanism by which node statistics are formed and updated during the multi-level search, and the node statistics sequence is the ordered form in which that result is stored in, and read from, the search node structure. The node access counts and cumulative node values in the node statistics sequence are therefore both derived from the node statistics backup result.
[0132] In a constrained tree search network, the purpose of multi-level dynamic oxygenation action search is to prospectively evaluate the evolution results of different initial dynamic oxygenation actions in subsequent control cycles, rather than generating multi-step control sequences that need to be directly executed. During the multi-level search process, the value evaluation results of the terminal nodes of each search path are transmitted back layer by layer along the search path, and the node access counts and cumulative node values corresponding to each level of the path are updated. This ensures that the node statistics corresponding to the first-order candidate child nodes adjacent to the root search node have comprehensively reflected the overall effect of using the first-order action as the initial step in the subsequent multi-level dynamic oxygenation control evolution. In the current control cycle execution decision stage, the initial action only needs to be selected based on the node access counts and cumulative node values corresponding to the first-order candidate child nodes adjacent to the root search node. This ensures the utilization of the multi-level search evaluation results while avoiding the direct use of subsequent control behaviors that have not yet occurred for the current control cycle execution, thereby meeting the real-time and safety requirements of dynamic control of industrial processes.
[0133] The target dynamic oxygen distribution action and the initial hidden state are input into the policy output network, which performs an action mapping calculation to generate a dynamic oxygen distribution control instruction set. The policy output network is a multi-layer feedforward neural network. The control instruction set includes the target oxygen supply flow rate set value and the target oxygen lance position set value.
[0134] The system obtains the hidden state vector corresponding to the initial hidden state; obtains the oxygen supply flow rate adjustment level value and oxygen lance position adjustment level value of the target dynamic oxygen distribution action and combines them into an action vector; aligns the numerical scale of the action vector to produce an action embedding vector; concatenates the action embedding vector with the hidden state vector to form a joint input vector; and inputs the joint input vector into the policy output network to generate the corresponding dynamic oxygen distribution control instruction set.
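The joint input construction of [0134] and a toy policy output head are sketched below. The min-max scale alignment, the random two-layer network, the sigmoid squashing of the outputs into the allowed setpoint ranges, and all names are assumptions; a deployed policy output network would be trained rather than randomly initialized.

```python
import math
import random


def joint_input(hidden, flow_level, lance_level, flow_range, lance_range):
    """Scale-align the action vector to [0, 1] and concatenate it with the hidden state."""
    action_emb = [(flow_level - flow_range[0]) / (flow_range[1] - flow_range[0]),
                  (lance_level - lance_range[0]) / (lance_range[1] - lance_range[0])]
    return action_emb + list(hidden)


class PolicyOutputNet:
    """Toy two-layer feedforward head; outputs are squashed into the setpoint ranges."""

    def __init__(self, in_dim, mid_dim=8, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 0.3) for _ in range(in_dim)] for _ in range(mid_dim)]
        self.w2 = [[rng.gauss(0.0, 0.3) for _ in range(mid_dim)] for _ in range(2)]

    def __call__(self, x, flow_range, lance_range):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.w1]
        out = [1.0 / (1.0 + math.exp(-sum(w * hi for w, hi in zip(row, h))))
               for row in self.w2]
        flow_set = flow_range[0] + out[0] * (flow_range[1] - flow_range[0])
        lance_set = lance_range[0] + out[1] * (lance_range[1] - lance_range[0])
        return {"target_oxygen_flow": flow_set, "target_lance_height": lance_set}


x = joint_input([0.2, 0.5, 0.1], 950, 1600, (800, 1100), (1400, 1800))
command = PolicyOutputNet(len(x))(x, (800, 1100), (1400, 1800))
```

Squashing the outputs into the allowed ranges is one way to guarantee that a setpoint can never leave the limits already enforced during the search.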
[0135] In this embodiment, the reward calculation module includes:
[0136] Obtain the dynamic oxygen distribution control instruction set and parse the target oxygen supply flow rate set value and the target oxygen lance position set value to generate an oxygen supply command parameter value and a lance position command parameter value.
[0137] Write the oxygen supply command parameter value into the oxygen supply regulation interface of the converter oxygen supply control system, triggering the control system to generate an oxygen supply execution signal from that value and forming the oxygen supply execution signal value.
[0138] Write the lance position command parameter value into the lance position control interface of the oxygen lance actuator, triggering the actuator to generate a lance position execution signal from that value and forming the lance position execution signal value.
[0139] Driven by the oxygen supply execution signal value and the lance position execution signal value, the oxygen supply flow rate and oxygen lance position are adjusted, and the execution is timed per control cycle to obtain the control cycle time value.
[0140] When the control cycle timer reaches the end of the control cycle, the execution result data are collected, including the oxygen supply flow rate, cumulative oxygen supply, oxygen lance height, carbon monoxide and carbon dioxide volume fractions in the furnace-mouth flue gas, molten steel temperature, slag FeO mass fraction, furnace-mouth flame image representation, and molten steel carbon content, forming an execution result data set.
[0141] Generate the dynamic oxygen distribution reward evaluation parameter set for the current control cycle from the execution result data set.
[0142] Use this parameter set as the reward calculation parameter combination.
[0143] Generate the reward value for the current control cycle from the reward calculation parameter combination.
[0144] To generate the reward value, the reward calculation parameter combination of the current control cycle is read. It consists of the steel temperature deviation penalty parameter, steel carbon content deviation penalty parameter, flue gas carbon monoxide volume fraction deviation penalty parameter, cumulative oxygen supply penalty parameter, slag FeO deviation penalty parameter, and splash risk penalty parameter. The penalty parameters are evaluated in turn to form a penalty parameter value sequence, and an aggregation operation over this sequence produces the current control cycle reward value as a single number.
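The aggregation in [0144] is sketched as a negative weighted sum, so that larger penalties give a lower reward. Unit weights by default and the dictionary keys are assumptions; the patent states only that the penalty value sequence is aggregated into a single number.

```python
PENALTY_KEYS = ("temperature", "carbon", "co_fraction", "cumulative_o2", "slag_feo", "splash")


def cycle_reward(penalties, weights=None):
    """Aggregate the six penalty parameters into one reward (negative weighted sum, assumed)."""
    weights = weights or {k: 1.0 for k in PENALTY_KEYS}
    values = [penalties[k] for k in PENALTY_KEYS]   # penalty parameter value sequence
    return -sum(weights[k] * v for k, v in zip(PENALTY_KEYS, values))


example = {"temperature": 1.0, "carbon": 0.5, "co_fraction": 0.0,
           "cumulative_o2": 0.0, "slag_feo": 0.25, "splash": 1.5}
reward = cycle_reward(example)
```

Per-term weights let the operator trade endpoint accuracy against safety; doubling the splash weight, for example, makes the learned policy more cautious near splash conditions.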
[0145] In this embodiment, the control recording module includes:
[0146] Read the reward value of the current control cycle, obtain the state representation information and dynamic oxygen distribution control behavior for that cycle, and use the state representation information, control behavior, and reward value as training samples for the improved MuZero model.
[0147] The state representation information is the converter long-term state representation vector, serving as the state vector representation of the converter operating state in the current control cycle. The dynamic oxygen distribution control behavior is the action vector representation of the dynamic oxygen distribution action actually executed in the current control cycle.
[0148] Based on the training samples, parameter update operations are performed on the representation network, implicit state transition network, value evaluation network, and policy output network.
[0149] Training uses backpropagation on randomly sampled mini-batches, with an adaptive gradient optimization method jointly updating the representation network, the implicit state transition network, the value evaluation network, and the policy output network. The visit counts that the constrained tree search network records at the candidate child nodes of the root search node are normalized to obtain the target policy distribution. The reward values are arranged into a reward value sequence, which is discounted and accumulated to produce the value target. Two losses are then computed: the cross-entropy loss between the target policy distribution and the output distribution of the policy output network, and the mean squared error between the output of the value evaluation network and the value target. Their weighted sum forms the total loss, which is backpropagated to update the network parameters.
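The loss construction in [0149] can be illustrated with a small NumPy sketch. It is a hedged illustration, not the patent's implementation: the discount factor `gamma`, the loss weights `w_policy` and `w_value`, the function names, and the tuple layout of a sample are assumptions, and only the mini-batch loss is computed. In practice an automatic-differentiation framework would backpropagate this total loss and an adaptive optimizer (Adam, for instance) would apply the update; neither step is shown.

```python
import numpy as np


def target_policy(visit_counts) -> np.ndarray:
    """Normalize the root's child-node visit counts into a target policy distribution."""
    counts = np.asarray(visit_counts, dtype=float)
    return counts / counts.sum()


def value_target(rewards, gamma: float = 0.97) -> float:
    """Discounted accumulation of a reward value sequence: sum_k gamma**k * r_k."""
    r = np.asarray(rewards, dtype=float)
    return float(np.dot(gamma ** np.arange(len(r)), r))


def log_softmax(logits) -> np.ndarray:
    """Numerically stable log-softmax of the policy output network's logits."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max()
    return z - np.log(np.exp(z).sum())


def sample_loss(policy_logits, visit_counts, value_pred, rewards,
                gamma=0.97, w_policy=1.0, w_value=0.25) -> float:
    """Weighted sum of policy cross-entropy and value squared error for one sample."""
    policy_ce = -float(np.dot(target_policy(visit_counts), log_softmax(policy_logits)))
    value_se = (float(value_pred) - value_target(rewards, gamma)) ** 2
    return w_policy * policy_ce + w_value * value_se


def minibatch_loss(samples, batch_size, rng, **kw) -> float:
    """Average the total loss over a randomly sampled mini-batch (without replacement).

    Averaging the per-sample squared errors gives the mean squared error term.
    """
    idx = rng.choice(len(samples), size=batch_size, replace=False)
    return float(np.mean([sample_loss(*samples[i], **kw) for i in idx]))
```

Each sample here is assumed to be `(policy_logits, visit_counts, value_pred, rewards)`; with a real model the logits and value prediction would be produced by the policy output and value evaluation networks during the forward pass.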
[0150] After completing the parameter update, the next control cycle begins, and the process of acquiring converter operation data, generating dynamic oxygen distribution control commands, calculating reward values, and updating model parameters is repeated until the converter blowing end point is reached.
[0151] During cyclic execution, the dynamic oxygen distribution control instructions, reward value, and execution result data of each control cycle are recorded in control-cycle order to form a dynamic oxygen distribution control record set.
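The cycle described in [0150] and [0151] amounts to a simple control loop. The sketch below is hypothetical: the callables `acquire`, `decide`, `execute`, `reward_of`, `update`, and `at_endpoint` stand in for the acquisition, planning, execution, reward, and training stages, and `ControlRecord` is an assumed layout for one entry of the control record set.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class ControlRecord:
    """One entry of the dynamic oxygen distribution control record set (assumed layout)."""
    cycle: int
    instructions: Any  # control instructions issued in this cycle
    reward: float      # reward value of this cycle
    result: Any        # execution result data


def run_heat(acquire: Callable[[], Any],
             decide: Callable[[Any], Any],
             execute: Callable[[Any], Any],
             reward_of: Callable[[Any], float],
             update: Callable[[Any, Any, float], None],
             at_endpoint: Callable[[Any], bool]) -> List[ControlRecord]:
    """Repeat acquire -> decide -> execute -> reward -> update until the blowing endpoint."""
    records: List[ControlRecord] = []
    cycle = 0
    while True:
        state = acquire()                    # converter operation data for this cycle
        instructions = decide(state)         # planning and instruction generation
        result = execute(instructions)       # issue commands and wait out the control cycle
        reward = reward_of(result)           # reward from the execution result data
        update(state, instructions, reward)  # parameter update of the model
        records.append(ControlRecord(cycle, instructions, reward, result))  # kept in cycle order
        if at_endpoint(result):
            return records
        cycle += 1
```

Appending one record per iteration keeps the record set in control-cycle order without any extra sorting.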
[0152] Example 1: To verify the feasibility of this invention in practice, it was applied to a large-scale converter steelmaking production line in East China, where the oxygen supply flow rate and oxygen lance height were adjusted online during actual blowing. The line's original control method relied mainly on empirical curves and stage-wise corrections. In the later stage of blowing, this frequently led to large fluctuations in molten steel temperature, an unstable endpoint carbon content hit rate, and difficulty in predicting the risk of splashing at the furnace mouth. In particular, when the furnace condition moved from one stage to the next, operators often made rapid adjustments based on experience, which could easily cause abrupt changes in oxygen supply intensity or deviations in cumulative oxygen supply. This increased the probability of post-blowing and reblowing and disrupted both the production rhythm and operational safety.
[0153] In actual operation, converter operation data is fed into the data acquisition module of this invention in real time. The oxygen supply flow rate, cumulative oxygen supply, oxygen lance position height, furnace mouth flue gas composition, molten steel temperature, slag composition, and furnace mouth flame image representation undergo unified preprocessing and sliding segmentation. The Informer network then constructs a long-term state representation vector, enabling the system to identify how furnace conditions evolve over multiple control cycles. On this basis, the improved MuZero model uses the long-term state representation vector as an initial-state prior and applies prior-weighted modulation to the current state during the representation stage, so that dynamic oxygen distribution decisions account for both the immediate furnace condition and the constraints of its historical evolution. During action generation, the constrained tree search network embeds oxygen supply intensity limits, cumulative oxygen supply constraints, and lance position change amplitude constraints, so that adjustment schemes that violate process conditions never enter a search branch; this reduces the risk of abnormal operation at the source. Combined with the splash risk assessment mechanism, the system automatically adjusts the oxygen supply rhythm when it detects abnormal flame changes or fluctuating trends in flue gas composition, avoiding splashing caused by lagging adjustments.
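Embedding the constraints at the action generation stage can be pictured as filtering candidate actions before any of them becomes a search branch. The sketch below assumes the action set is the Cartesian product of oxygen flow adjustment levels and lance position adjustment levels, as in claim 6; the limit values, field names, and the function `feasible_actions` are hypothetical, and operational safety constraints beyond these checks are not modeled.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class ProcessLimits:
    """Hypothetical process limits; real values would come from plant specifications."""
    flow_min: float        # lower bound on oxygen supply flow rate (supply intensity)
    flow_max: float        # upper bound on oxygen supply flow rate (supply intensity)
    cum_oxygen_max: float  # cap on cumulative oxygen supply for the heat
    lance_step_max: float  # largest allowed lance height change in one control cycle
    lance_min: float       # lowest allowed lance height
    lance_max: float       # highest allowed lance height


def feasible_actions(flow: float, lance: float, cum_oxygen: float, cycle_duration: float,
                     flow_steps: Iterable[float], lance_steps: Iterable[float],
                     limits: ProcessLimits) -> List[Tuple[float, float]]:
    """Return the (flow change, lance change) pairs allowed into the search tree."""
    allowed = []
    for d_flow, d_lance in product(flow_steps, lance_steps):
        new_flow = flow + d_flow
        new_lance = lance + d_lance
        # Oxygen supply intensity constraint.
        if not limits.flow_min <= new_flow <= limits.flow_max:
            continue
        # Cumulative oxygen supply constraint, projected over the next control cycle.
        if cum_oxygen + new_flow * cycle_duration > limits.cum_oxygen_max:
            continue
        # Lance position change amplitude constraint.
        if abs(d_lance) > limits.lance_step_max:
            continue
        # Lance position range constraint.
        if not limits.lance_min <= new_lance <= limits.lance_max:
            continue
        allowed.append((d_flow, d_lance))
    return allowed
```

Because infeasible pairs are dropped before expansion, visit counts and value estimates are only ever accumulated for actions the process can actually execute.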
[0154] To verify the performance of the invention in practice, comparative experiments were conducted.
[0155] Table 1. Comparison of Dynamic Oxygen Distribution Control Effects in Converter Steelmaking (Statistical Table)
[0156]
[0157] As shown in Table 1, for endpoint control accuracy, the system of this invention raised the endpoint temperature hit rate from 89.4% to 94.8% and the endpoint carbon content hit rate from 87.9% to 92.6%, gains of 5.4 and 4.7 percentage points respectively. This improvement indicates that, through long-term state modeling and prior-weighted modulation, dynamic oxygen distribution decisions no longer rely only on the instantaneous state but integrate the furnace condition evolution trend across multiple control cycles, yielding more stable hit control at the endpoint stage. Traditional methods are prone to lagging judgment when the furnace condition transitions, whereas this invention builds a long-term evolution prior with the Informer network, allowing the model to recognize changes in temperature and decarburization trends in advance and thereby reduce endpoint deviation.
[0158] For process stability, the average number of reblows decreased from 0.62 to 0.28, a reduction of about 55%. The standard deviation of the blowing cycle duration fell from 1.85 minutes to 0.92 minutes, indicating a more uniform and stable control rhythm. Fewer reblows mean more accurate endpoint judgment and a lower probability of rework, and the narrower spread of cycle durations reflects a more continuous oxygen supply regulation process. This is primarily because the constrained tree search network embeds constraints on oxygen supply intensity, cumulative oxygen supply, and lance position change during the decision-making stage, so high-risk and abrupt actions are eliminated as the control strategy is generated, avoiding the rhythm oscillations that experience-based adjustments cause in traditional methods.
[0159] For operational safety and resource utilization, the splash alarm incidence rate decreased from 6.5% to 2.1%, and the oxygen consumption deviation rate decreased from 4.8% to 2.3%. The marked drop in splash alarms indicates that the splash risk assessment mechanism, by jointly analyzing changes in the flame image, fluctuations in flue gas composition, and changes in oxygen supply flow rate, identifies and suppresses abnormal reaction trends early. The lower oxygen consumption deviation rate shows that the system supplies oxygen more precisely while maintaining final quality, reducing the energy wasted by excess oxygen supply.
[0160] The above are merely preferred embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitutions or modifications made by those skilled in the art within the scope of the technology disclosed in the present invention, based on the technical solution and inventive concept of the present invention, should be covered within the scope of protection of the present invention.
Claims
1. A machine learning based dynamic oxygen distribution control system for a converter steelmaking process, characterized in that it comprises the following modules. The data acquisition module is used to collect converter operation data during the converter blowing process. The preprocessing module is used to preprocess the converter operation data to generate a standardized converter operation data set, and to perform sliding segmentation to generate a set of converter long-term operation segments. The time-series modeling module is used to construct a set of converter process state vectors from the set of converter long-term operation segments and input it into the Informer network for time-series modeling, generating a converter long-term state representation vector. The evaluation parameter construction module is used to construct a set of dynamic oxygen distribution reward evaluation parameters. The implicit state initialization module is used to build the improved MuZero model; it generates the planning root node state vector from the converter long-term state representation vector and inputs it into the implicit state transition network to generate the initial hidden state. The tree search planning module is used to construct a dynamic oxygen distribution action set; it takes the initial hidden state as the root node of the tree search, performs multi-step planning over the dynamic oxygen distribution action set, and generates a node statistics sequence. The instruction generation module is used to determine the target dynamic oxygen distribution action from the node statistics sequence and generate a set of dynamic oxygen distribution control instructions. The reward calculation module is used to issue the set of dynamic oxygen distribution control instructions, execute the adjustment operations, and calculate the corresponding reward value from the execution result data according to the set of dynamic oxygen distribution reward evaluation parameters.
The control recording module is used to update the parameters of the improved MuZero model and to execute cyclically until the blowing endpoint, generating a dynamic oxygen distribution control record set. The implicit state initialization module includes: Constructing the improved MuZero model, which comprises a representation network, an implicit state transition network, a value evaluation network, a policy output network, and a constrained tree search network. In the representation network, the converter long-term state representation vector is obtained and linearly mapped to generate a long-term evolution prior feature vector, which is normalized to obtain the long-term evolution prior embedding vector. Within the current control cycle, the current state feature vector is obtained from the standardized converter operation data set and is linearly mapped and normalized to generate the current state embedding vector. The long-term evolution prior embedding vector is aligned with the current state embedding vector, and prior-weighted modulation is applied to each dimension of the current state embedding vector to generate a modulated state embedding vector constrained by the long-term evolution trend. Nonlinear transformation and dimensionality compression are applied to the modulated state embedding vector to generate the planning root node state vector. The planning root node state vector is input into the implicit state transition network, and an implicit state initialization operation is performed to generate the initial hidden state, which serves as the starting hidden state of the dynamic oxygen distribution control planning process.
The reward calculation module includes: Obtaining the set of dynamic oxygen distribution control instructions and parsing the target oxygen supply flow rate setpoint and the target oxygen lance position setpoint to generate an oxygen supply command parameter value and a lance position command parameter value; Writing the oxygen supply command parameter value into the oxygen supply regulation interface of the converter oxygen supply control system, triggering the oxygen supply control system to generate an oxygen supply execution signal according to that value and forming an oxygen supply execution signal value; Writing the lance position command parameter value into the lance position control interface of the oxygen lance actuator, triggering the actuator to generate a lance position execution signal according to that value and forming a lance position execution signal value; Under the action of the oxygen supply execution signal value and the lance position execution signal value, performing the oxygen supply flow rate adjustment and oxygen lance position adjustment operations while timing the execution process to obtain the control cycle time value; When the control cycle timer reaches the end of the control cycle, collecting the execution result data to form an execution result data set; Generating a set of dynamic oxygen distribution reward evaluation parameters for the current control cycle from the execution result data set; Using this set as the reward calculation parameter combination; Generating the reward value for the current control cycle from the reward calculation parameter combination.
2. The machine learning based dynamic oxygen distribution control system for a converter steelmaking process as claimed in claim 1, wherein the converter operation data includes oxygen supply flow rate, cumulative oxygen supply, oxygen lance position height, carbon monoxide volume fraction in the furnace mouth flue gas, carbon dioxide volume fraction in the furnace mouth flue gas, molten steel temperature, FeO mass fraction in slag, furnace mouth flame image representation, and molten steel carbon content.
3. The machine learning based dynamic oxygen distribution control system for a converter steelmaking process as claimed in claim 1, wherein the preprocessing includes time synchronization, outlier removal, missing data imputation, and normalization.
4. The machine learning based dynamic oxygen distribution control system for a converter steelmaking process as claimed in claim 1, characterized in that the time-series modeling module includes: Based on the set of converter long-term operation segments, performing time-series feature construction on the oxygen supply flow rate, cumulative oxygen supply, oxygen lance position height, carbon monoxide volume fraction in the furnace mouth flue gas, carbon dioxide volume fraction in the furnace mouth flue gas, molten steel temperature, slag FeO mass fraction, furnace mouth flame image representation, and molten steel carbon content in each converter long-term operation segment, generating multiple corresponding feature vectors; Concatenating these feature vectors to form the converter process state vector of that long-term operation segment, the converter process state vectors together constituting the converter process state vector set; Inputting the converter process state vector set into the Informer network for time-series modeling, generating a converter long-term state representation vector that characterizes the long-term evolution trend of the converter condition.
5. The machine learning based dynamic oxygen distribution control system for a converter steelmaking process as claimed in claim 1, characterized in that the evaluation parameter construction module includes: Setting a target molten steel temperature according to the endpoint control requirements of converter steelmaking, obtaining the molten steel temperature within the corresponding control cycle during blowing, calculating the difference between the molten steel temperature and the target temperature, and linearly mapping the difference to generate the molten steel temperature deviation penalty parameter; Setting a target molten steel carbon content according to the converter endpoint carbon content control index, obtaining the molten steel carbon content within the corresponding control cycle, calculating its deviation from the target carbon content, and linearly mapping the deviation to generate the molten steel carbon content deviation penalty parameter; Setting a target range for the carbon monoxide volume fraction in the furnace mouth flue gas according to its control range during blowing, obtaining the carbon monoxide volume fraction in the furnace mouth flue gas over the corresponding time period, calculating its degree of deviation from the target range, and linearly mapping the degree of deviation to generate the flue gas carbon monoxide volume fraction deviation penalty parameter; Computing the cumulative oxygen supply in real time during blowing, comparing the current cumulative oxygen supply with a preset cumulative oxygen supply control threshold, and linearly mapping the comparison result to generate the cumulative oxygen supply penalty parameter;
Setting a target range for the slag FeO mass fraction according to the slag oxidation control requirements, collecting the slag FeO mass fraction within the corresponding control cycle, calculating its degree of deviation from the target range, and linearly mapping the degree of deviation to generate the slag FeO deviation penalty parameter; Calculating a splash risk assessment value according to the splash risk control requirements of converter blowing, based on the change characteristics of the flame image representation, the flue gas composition, and the oxygen supply flow rate, and generating the splash risk penalty parameter from the splash risk assessment value; Combining the molten steel temperature deviation, molten steel carbon content deviation, flue gas carbon monoxide volume fraction deviation, cumulative oxygen supply, slag FeO deviation, and splash risk penalty parameters to form the set of dynamic oxygen distribution reward evaluation parameters.
6. The machine learning based dynamic oxygen distribution control system for a converter steelmaking process as claimed in claim 1, characterized in that the tree search planning module includes: Constructing a dynamic oxygen distribution action set according to the process control requirements of converter steelmaking, the action set comprising a set of oxygen supply flow rate adjustment levels and a set of oxygen lance position adjustment levels and being associated with oxygen supply intensity constraints, cumulative oxygen supply constraints, lance position adjustment range constraints, and operational safety constraints to form a constrained dynamic oxygen distribution action set; Taking the initial hidden state as the root node hidden state of the constrained tree search network, inputting it into the constrained tree search network to initialize the tree search structure, and loading the dynamic oxygen distribution action set into the corresponding search action space; In the constrained tree search network, starting from the root node hidden state, selecting a dynamic oxygen distribution action that satisfies the constraints, combining it with the hidden state of the current search node, and inputting the combination into the implicit state transition network to perform hidden state recursion, forming a set of child node hidden states; Inputting each child node hidden state into the value evaluation network to perform value evaluation and generate the corresponding node value estimate, and binding each node value estimate with its dynamic oxygen distribution action and child node hidden state to generate a set of child node evaluation results;
Writing the set of child node evaluation results into the tree search node structure and updating the node visit count and cumulative node value of the root search node to form updated node statistics; Selecting the search node of the next search level from the set of child node hidden states according to the updated node statistics, and using the selected child node hidden state as the hidden state of the current search node at the next level; Repeating action generation, hidden state generation, node value estimation, and node statistics updating until the preset number of search levels is reached, generating a multi-level dynamic oxygen distribution action search tree rooted at the root node hidden state; Summarizing the search paths formed by the multi-level search tree and obtaining the node statistics of the end search node of each search path; Determining the optimal search path from the node statistics of the end search nodes; Extracting the oxygen supply flow rate adjustment sequence and oxygen lance position adjustment sequence along the optimal search path to generate a multi-step dynamic oxygen distribution action sequence and the corresponding node statistics sequence.
7. The machine learning based dynamic oxygen distribution control system for a converter steelmaking process as claimed in claim 1, characterized in that the instruction generation module includes: Determining the target dynamic oxygen distribution action for the current control cycle from the node statistics sequence; Inputting the target dynamic oxygen distribution action and the initial hidden state into the policy output network, which performs action mapping calculations to generate the set of dynamic oxygen distribution control instructions.
8. The machine learning based dynamic oxygen distribution control system for a converter steelmaking process as claimed in claim 1, characterized in that the control recording module includes: Reading the reward value for the current control cycle, obtaining the state representation information and dynamic oxygen distribution control action for that cycle, and using the state representation information, the control action, and the reward value as training samples for the improved MuZero model; Updating the parameters of the representation network, implicit state transition network, value evaluation network, and policy output network based on the training samples; After the parameter update, starting the next control cycle and repeating the acquisition of converter operation data, generation of dynamic oxygen distribution control instructions, calculation of reward values, and updating of model parameters until the converter blowing endpoint is reached; During cyclic execution, recording the dynamic oxygen distribution control instructions, reward value, and execution result data of each control cycle in control-cycle order to form a dynamic oxygen distribution control record set.
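The representation step of the implicit state initialization module in claim 1 (linear mapping and normalization of both inputs, per-dimension prior-weighted modulation, then nonlinear compression) can be sketched in NumPy. The claim does not fix the normalization, the form of the modulation, or the nonlinearity, so layer-style normalization, a sigmoid gate, and `tanh` are assumptions here, as are the weight matrices `W_prior`, `W_cur`, and `W_out` and all dimensions. Alignment is modeled simply by projecting both vectors to the same dimension.

```python
import numpy as np


def normalize(x: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Layer-norm style normalization to zero mean and unit variance (assumed)."""
    return (x - x.mean()) / np.sqrt(x.var() + eps)


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def planning_root_state(h_long, s_now, W_prior, W_cur, W_out) -> np.ndarray:
    """Map the long-term representation and current state to a planning root node state vector.

    W_prior: (d, n_long) and W_cur: (d, n_now) project both inputs to a shared
    dimension d, which aligns the two embeddings; W_out: (k, d) with k < d
    compresses the modulated embedding.
    """
    prior_emb = normalize(W_prior @ np.asarray(h_long, dtype=float))  # long-term evolution prior embedding
    cur_emb = normalize(W_cur @ np.asarray(s_now, dtype=float))        # current state embedding
    gate = sigmoid(prior_emb)          # per-dimension weights in (0, 1) derived from the prior
    modulated = gate * cur_emb         # prior-weighted modulation, dimension by dimension
    return np.tanh(W_out @ modulated)  # nonlinear transformation and dimensionality compression


rng = np.random.default_rng(0)
root = planning_root_state(rng.normal(size=32), rng.normal(size=12),
                           rng.normal(size=(16, 32)), rng.normal(size=(16, 12)),
                           rng.normal(size=(8, 16)))
```

A multiplicative gate lets the long-term trend amplify or damp individual dimensions of the current state without replacing it; an additive bias would be an equally plausible reading of "prior-weighted modulation".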