Energy-saving scheduling method for the entire aluminum electrolysis process integrating deep reinforcement learning algorithms

By constructing a spatiotemporally correlated state tensor and introducing a gradient step size decay operator, the parameter oscillation problem of the computing system under strong magnetic field fluctuations was solved, energy-saving scheduling of the entire aluminum electrolysis process was realized, and the control accuracy and energy efficiency of the system in non-stationary environments were improved.

CN122303972A · Pending · Publication Date: 2026-06-30 · HUNAN LIDER INTELLIGENT TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
HUNAN LIDER INTELLIGENT TECH CO LTD
Filing Date
2026-06-02
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Existing computing systems cannot adaptively adjust the reverse gradient step size when processing noisy, high-dimensional operating conditions, resulting in control command discrepancies and increased energy consumption. Furthermore, conventional computing topologies cannot break the rigid binding between logical timing and field sampling, leading to energy loss under strong magnetic field fluctuations.

Method used

By constructing a spatiotemporal correlated state tensor, the spatiotemporal covariance matrix within the sliding window is used to identify out-of-limit noise components. Random impulses are eliminated through spatial orthogonal projection operators, and a spatiotemporal filtered state tensor is generated. The action window span is adjusted by combining the variance of action feature distribution and a gradient step size decay operator is introduced to dynamically smooth the gradient update magnitude and correct the parameters of the policy network and the evaluation network.

Benefits of technology

It improves parameter convergence stability in non-stationary environments, prevents high-frequency disturbances from spreading into the calculation model, improves system processing efficiency, reduces response latency, and enhances the control accuracy and energy efficiency of computer systems under complex working conditions.


Smart Images

  • Figure CN122303972A_ABST

Abstract

This invention relates to the field of computer systems based on specific computational models, and discloses an energy-saving scheduling method for the entire aluminum electrolysis process that integrates deep reinforcement learning algorithms. The method includes: acquiring a data stream from a data buffer and constructing a spatiotemporal correlated state tensor; eliminating noise through a sliding window spatiotemporal covariance matrix to generate a filtered state tensor; inputting the data into a policy network, and outputting control commands to a register via forward inference; calculating the variance of the action feature distribution based on discrete control quantities through an endogenous verification program and returning it in situ as a time window trigger condition; calculating the differential reward value based on the global energy efficiency slope and introducing a gradient step size decay operator to correct network parameters. This invention breaks the rigid binding between computation timing and sampling, avoids parameter divergence caused by high-frequency noise, and improves resource scheduling efficiency and convergence stability.

Description

Technical Field

[0001] This invention relates to an energy-saving scheduling method for the entire aluminum electrolysis process that integrates deep reinforcement learning algorithms, belonging to the field of computer system technology based on a specific computational model.

Background Technology

[0002] In current industrial production status monitoring and high-frequency energy efficiency scheduling, computer systems based on specific computational models employ a reinforcement learning architecture in which a policy network and an evaluation network operate cyclically. The system receives high-dimensional operating condition sampling data streams and converts them into state tensors. Control instructions are calculated through forward inference and written to registers. Over long monitoring time scales, environmental feedback is used to correct and update parameters, achieving optimal allocation of the energy efficiency space. This asynchronous, time-series collaborative serial computation method effectively regulates the coupling of multi-channel variables under steady-state conditions, thereby controlling energy consumption. However, the computational system is not only limited by the core computing throughput of the underlying processor hardware, but also suffers from shortcomings in the top-level control algorithm. For example, Chinese invention patent application CN111155149A discloses a smart optimization control platform for aluminum electrolysis based on a digital electrolytic cell. This platform relies on the linear separability of operating conditions under time-domain filtering and uses fixed long and short sliding time windows and static empirical thresholds to track local alumina concentration and predict anode effects. When it encounters non-stationary operating conditions caused by strong magnetic field fluctuations, however, the input data stream is mixed with high-amplitude pulse noise. Rigid time window division and first- or second-order inertial filtering introduce phase lag when filtering out strong random pulses, distorting the extracted current slope and cumulative slope features. This causes frequent misjudged or missed local effect predictions, leading to discrete drift of control commands and accumulation of abnormal control quantities. The inability to adaptively adjust the backward gradient step size exacerbates the spread of local hot spots and increases the overall energy consumption of the system.

[0003] Moreover, when non-stationary operating conditions caused by strong magnetic field fluctuations are encountered, the computational architecture exposes scheduling defects. The input data stream is mixed with a large amount of high-amplitude impulse noise. If the decision cycle is shortened to capture high-frequency transient anomalies, the policy network performs high-frequency optimization within a millisecond-level action window, so spike noise directly interferes with feature extraction, increases the discrete variance of the inference output, and causes abnormal control quantities to accumulate at the register level, exacerbating the spread of local hot spots. Conversely, if a long monitoring window is maintained to ensure the confidence of environmental information, forward inference incurs control delay, which accumulates and propagates along the computational evaluation chain, so the system misses the moment to respond to current distortion and power consumption rises systemically. Simply expanding hardware storage or increasing processor frequency does not solve this problem: it causes a surge in threads and conflicts in the computational pathway, and still fails to smooth out gradient oscillations. Adding a moving average filter introduces phase lag, disrupting the real-time alignment of instruction timing with physical variables. The root cause is that conventional computational topologies cannot break the rigid binding between inference timing and external sampling. There is no adaptive data flow loop between forward inference feature evolution and the backward weight adjustment layers, forcing the system to sacrifice short-cycle inference stability to achieve overall convergence of model training.

[0004] Therefore, how to break the rigid binding between logical timing and field sampling when the computing system processes noisy high-dimensional operating conditions, and how to adaptively adjust the backward gradient update step size to smooth out parameter oscillations while ensuring high real-time feedback of forward inference, has become the technical problem to be solved by this invention.

Summary of the Invention

[0005] To address the problems mentioned in the background art, the technical solution of this invention is as follows: A method for energy-saving scheduling of the entire aluminum electrolysis process integrating deep reinforcement learning algorithms, comprising the following steps:

[0006] Step S101: Obtain the state monitoring data stream from the data buffer, combine the distributed temporal feature sequence with the spatial mapping topology relationship and the temporal topology order to construct the spatiotemporal correlated state tensor, use the spatiotemporal covariance matrix in the sliding window to identify the out-of-limit noise component, and eliminate random pulses through the spatial orthogonal projection operator to generate the spatiotemporal filtered state tensor.

[0007] Step S102: Input the spatiotemporal filtered state tensor into the policy network, and through forward inference within the ms-level action window calculate and output the multi-channel anode height control commands to the control command register, where they are temporarily stored.

[0008] Step S103: Read the discrete control quantity of the multi-channel anode height control command in the control command register through the endogenous verification program, calculate the variance of the action feature distribution to characterize the stability of the current forward inference trajectory, and return the variance of the action feature distribution to the ms-level action window in place as the trigger boundary condition for the ms-level action window span adjustment in the next cycle.

[0009] Step S104: Within the s-level monitoring window, the evaluation network calculates the differential reward value based on the global energy efficiency slope constructed by weighting the current distribution deviation and tank voltage fluctuation data. In the backpropagation parameter update path, a gradient step size decay operator that is negatively correlated with the variance of the action feature distribution is introduced to dynamically smooth and compensate the gradient update magnitude, thereby correcting the network parameters of the policy network and the evaluation network.

[0010] Preferably, the step S103, which returns the variance of the action feature distribution to the ms-level action window in situ as the trigger boundary condition for adjusting the span of the ms-level action window in the next cycle, includes: if the variance of the action feature distribution exceeds the first stability threshold, then the span of the ms-level action window is reduced to 10ms; if the variance of the action feature distribution is less than or equal to the first stability threshold, then the span of the ms-level action window is expanded to 50ms.
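The window adjustment rule above can be sketched in a few lines of Python. This is a minimal illustration, not the applicant's implementation: the function names are hypothetical, the variance is assumed to be the population variance of the channel commands, and the default threshold of 0.10 is the preferred value reported later in the embodiment.

```python
from statistics import pvariance


def action_feature_variance(commands):
    """Population variance of the multi-channel control commands (assumed form)."""
    return pvariance(commands)


def next_window_span_ms(action_variance, first_threshold=0.10):
    """Span of the ms-level action window for the next cycle."""
    # Variance above the first stability threshold: shrink the window to 10 ms.
    if action_variance > first_threshold:
        return 10
    # Otherwise (less than or equal to the threshold): expand it to 50 ms.
    return 50
```

A caller would compute the variance once per cycle from the register contents and pass it to `next_window_span_ms` before scheduling the next inference.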

[0011] Preferably, step S101, which involves acquiring the state monitoring data stream from the data buffer and constructing the spatiotemporal correlated state tensor, includes the following sub-steps: Step S1011, reading the distributed temporal feature sequence of the multi-channel physical state monitoring nodes from the data buffer; Step S1012, combining the distributed temporal feature sequence according to the spatial mapping topology and temporal topology to generate a high-dimensional state matrix; Step S1013, calibrating the high-dimensional state matrix through spatiotemporal alignment to transform and synthesize the high-dimensional state matrix into a spatiotemporal correlated state tensor with a unified timestamp and spatial coordinates.

[0012] Preferably, step S104, which involves calculating the differential reward value using the global energy efficiency slope constructed by weighting the current distribution deviation and tank voltage fluctuation data through the evaluation network, includes the following sub-steps: obtaining the current distribution data and tank voltage fluctuation data of the current scheduling cycle in real time from the controlled object data interface; calculating the current distribution deviation based on the current distribution data, and calculating the global energy efficiency slope by differential comparison with the tank voltage fluctuation data and the preset energy efficiency benchmark; multiplying the current distribution deviation and the global energy efficiency slope by constant weight coefficients respectively and summing them to generate the differential reward value.
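The weighted-sum structure of the differential reward can be illustrated as follows. This is a hedged sketch: the source does not give the form of the deviation or the sign of the weights, so the standard-deviation form, the difference-quotient slope, the negative default weights, and all function names are assumptions made for illustration.

```python
from statistics import pstdev


def current_distribution_deviation(currents):
    """Assumed form: population standard deviation of the channel currents."""
    return pstdev(currents)


def global_efficiency_slope(energy_now, energy_prev, dt_s):
    """Assumed form: change of the energy estimate per second over the interval."""
    return (energy_now - energy_prev) / dt_s


def differential_reward(deviation, slope, w_dev=-1.0, w_slope=-1.0):
    """Weighted sum with constant coefficients.

    The negative defaults are an assumption: they reward a smaller current
    deviation and a falling energy consumption estimate.
    """
    return w_dev * deviation + w_slope * slope
```

With these conventions, a balanced current distribution and a falling consumption estimate yield a positive reward.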

[0013] Preferably, step S104, which introduces a gradient step size decay operator negatively correlated with the variance of the action feature distribution in the backpropagation parameter update path to compensate for the gradient update magnitude, includes the following sub-steps: inputting the variance of the action feature distribution into the update change monitoring unit, determining whether the variance of the action feature distribution exceeds the second stability threshold; when the variance of the action feature distribution exceeds the second stability threshold, calculating the ratio of the variance of the action feature distribution to the second stability threshold, and constructing the gradient step size decay operator using the reciprocal of the ratio; in the backpropagation parameter update calculation, multiplying the original update gradient step size of the network parameters by the gradient step size decay operator to reduce the update step size.
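The decay operator described above is the reciprocal of the ratio between the variance and the second stability threshold. A minimal sketch follows; the function names are hypothetical, the default threshold of 0.20 is the value reported in the embodiment, and returning 1.0 at or below the threshold is an assumption, since the source only specifies the case where the threshold is exceeded.

```python
def gradient_step_decay(action_variance, second_threshold=0.20):
    """Decay operator: reciprocal of (variance / threshold) when exceeded."""
    if action_variance > second_threshold:
        return second_threshold / action_variance
    # Assumption: leave the step size unchanged when the threshold is not exceeded.
    return 1.0


def decayed_step(base_step, action_variance, second_threshold=0.20):
    """Multiply the original update step size by the decay operator."""
    return base_step * gradient_step_decay(action_variance, second_threshold)
```

Because the operator never exceeds 1.0, the update step can only shrink, and it shrinks more the further the variance rises above the threshold.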

[0014] Preferably, in step S104, the evaluation network calculates the differential reward value based on the global energy efficiency slope constructed by weighting the current distribution deviation and tank voltage fluctuation data. The evaluation network is updated within an s-level monitoring window whose time span is on the order of seconds, and the time span of the s-level monitoring window is greater than the time span of the ms-level action window.

[0015] Preferably, after the step of outputting the multi-channel anode height control command to the control command register in step S102, the method further includes: outputting the variance of the action feature distribution calculated based on the multi-channel anode height control command as a feature observation data stream to the monitoring terminal.

[0016] Preferably, step S101, which involves identifying the out-of-limit noise component using the spatiotemporal covariance matrix within a sliding window, eliminating random pulses using a spatial orthogonal projection operator, and generating a spatiotemporal filtered state tensor, includes the following sub-steps: Step S1014: Calculate the spatiotemporal covariance matrix of the spatiotemporal correlated state tensor within a sliding window formed by historical sampling periods; Step S1015: Extract the eigenvalues of the spatiotemporal covariance matrix, and determine the out-of-limit noise component when the eigenvalues exceed the variance boundary threshold; Step S1016: Construct a spatial orthogonal projection operator orthogonal to the out-of-limit noise component, and project the spatiotemporal correlated state tensor onto the subspace orthogonal to the out-of-limit noise component.
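Sub-steps S1014 to S1016 can be sketched with NumPy as below. This is an illustrative reading rather than the patented implementation: the function name, the (samples, channels) layout of the window, and the 1/N normalization of the covariance matrix are assumptions.

```python
import numpy as np


def spatiotemporal_filter(window, variance_threshold):
    """Project out directions whose variance exceeds the boundary threshold.

    window: array of shape (N samples, d channels) from the sliding window.
    Returns the filtered window and the projection operator.
    """
    centered = window - window.mean(axis=0)
    # S1014: covariance matrix over the sliding window (1/N normalization assumed).
    cov = centered.T @ centered / window.shape[0]
    # S1015: eigen-decomposition; eigh is appropriate for a symmetric matrix.
    eigvals, eigvecs = np.linalg.eigh(cov)
    noise = eigvecs[:, eigvals > variance_threshold]  # shape (d, k)
    # S1016: projector onto the subspace orthogonal to the noise directions.
    projector = np.eye(window.shape[1]) - noise @ noise.T
    return window @ projector, projector
```

Since the eigenvectors returned by `np.linalg.eigh` are orthonormal, the projector is symmetric and idempotent, so applying it twice gives the same result as applying it once.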

[0017] Preferably, the full-process energy-saving scheduling method runs in a distributed bus network environment, deploying data buffers, policy networks, control command registers, and evaluation networks in distributed computing nodes, and transmitting status monitoring data streams and multiple anode height control commands through data transfer between distributed computing nodes.

[0018] Compared with the prior art, the beneficial effects of the present invention are:

[0019] 1. In the energy-saving scheduling of the entire aluminum electrolysis process, the policy network extracts the deep value estimation operator within the millisecond-level action window so that the forward inference path is retained separately. It reads the instantaneous current distribution characteristics in the state tensor and calculates the lower-level adjustment quantities, thereby restructuring the instruction distribution timing of the computer system. It further uses the real-time variance statistics of the multi-channel control quantities in the command register to regulate, in reverse, the compression and expansion boundaries of the action window in subsequent cycles. This establishes a two-way, closed-loop data dependency between the originally separate time scheduling steps and the inference trajectory inside the network. At the computing architecture level, it avoids the computing power allocation conflicts caused by high-dimensional feature inputs, improves system processing efficiency, and reduces model inference response latency.

[0020] 2. The evaluation network calculates the differential reward value based on the estimated DC power consumption per ton of aluminum and the average current value of multiple channels within a second-level monitoring window. In the backpropagation path that uses this reward value to update the network model parameters and correct the policy, a gradient step size decay operator that is negatively correlated with the variance of the action feature distribution transmitted in real time from the previous step is introduced to dynamically compensate the gradient update magnitude. When the variance increases abnormally due to sudden random impulse noise, the update step size is automatically reduced to block the diffusion of high-frequency disturbances into the network parameter evolution layer. This resolves the discrete drift of control commands under complex heterogeneous operating conditions and improves the parameter convergence stability of the deep reinforcement learning model under non-stationary environmental noise.

[0021] 3. Multiple control commands are temporarily stored and distributed through the control command register, and a verification program is started at the same time to transmit the action feature distribution variance data stream to the subsequent stage in real time. This not only provides an immediate and transparent information link for high-dimensional data processing, but also provides a digital observation basis for on-site operation and maintenance control. It gives the computational optimization trajectory of the deep learning model traceable white-box self-checking characteristics, thereby improving human-machine collaboration between the industrial edge computing center and the distributed physical state monitoring nodes. The underlying data of the preceding and following steps interlock to form a time-series closed loop, demonstrating the engineering feasibility and industrial application value of this computing solution in complex strong magnetic field environments.

Attached Figure Description

[0022] Figure 1 is a flowchart of the steps of the energy-saving scheduling method for the entire aluminum electrolysis process that integrates deep reinforcement learning algorithms, as described in this invention.

[0023] Figure 2 is a structural diagram of the energy-saving scheduling system for the entire aluminum electrolysis process that integrates deep reinforcement learning algorithms, as described in this invention.

[0024] The objectives, features, and advantages of this invention will be further explained in conjunction with the embodiments and with reference to the accompanying drawings.

Detailed Implementation

[0025] The technical solutions of the embodiments of this application will be clearly described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, not all embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of this application are within the scope of protection of this application.

[0026] A method for energy-saving scheduling of the entire aluminum electrolysis process integrating deep reinforcement learning algorithms includes the following steps:

[0027] Step S101: Obtain the state monitoring data stream from the data buffer, combine the distributed temporal feature sequence with the spatial mapping topology relationship and the temporal topology order to construct the spatiotemporal correlated state tensor, use the spatiotemporal covariance matrix in the sliding window to identify the out-of-limit noise component, and eliminate random pulses through the spatial orthogonal projection operator to generate the spatiotemporal filtered state tensor.

[0028] Step S102: Input the spatiotemporal filtered state tensor into the policy network, and through forward inference within the ms-level action window calculate and output the multi-channel anode height control commands to the control command register, where they are temporarily stored.

[0029] Step S103: Read the discrete control quantity of the multi-channel anode height control command in the control command register through the endogenous verification program, calculate the variance of the action feature distribution to characterize the stability of the current forward inference trajectory, and return the variance of the action feature distribution to the ms-level action window in place as the trigger boundary condition for the ms-level action window span adjustment in the next cycle.

[0030] Step S104: Within the s-level monitoring window, the evaluation network calculates the differential reward value based on the global energy efficiency slope constructed by weighting the current distribution deviation and tank voltage fluctuation data. In the backpropagation parameter update path, a gradient step size decay operator that is negatively correlated with the variance of the action feature distribution is introduced to dynamically smooth and compensate the gradient update magnitude, thereby correcting the network parameters of the policy network and the evaluation network.

[0031] Preferably, the step S103, which returns the variance of the action feature distribution to the ms-level action window in situ as the trigger boundary condition for adjusting the span of the ms-level action window in the next cycle, includes: if the variance of the action feature distribution exceeds the first stability threshold, then the span of the ms-level action window is reduced to 10ms; if the variance of the action feature distribution is less than or equal to the first stability threshold, then the span of the ms-level action window is expanded to 50ms.

[0032] Preferably, the step S101 of obtaining the state monitoring data stream from the data buffer and constructing the spatiotemporal correlated state tensor includes the following sub-steps: Step S1011: Read the distributed temporal feature sequence of the multi-channel physical state monitoring nodes from the data buffer; Step S1012: Combine the distributed temporal feature sequence according to the spatial mapping topology and temporal topology order to generate a high-dimensional state matrix; Step S1013: Calibrate the high-dimensional state matrix through spatiotemporal alignment to transform and synthesize it into a spatiotemporal correlated state tensor with a unified timestamp and spatial coordinates.

[0033] Preferably, step S104, which involves calculating the differential reward value using the global energy efficiency slope constructed by weighting the current distribution deviation and tank voltage fluctuation data through the evaluation network, includes the following sub-steps: obtaining the current distribution data and tank voltage fluctuation data of the current scheduling cycle in real time from the controlled object data interface; calculating the current distribution deviation based on the current distribution data, and calculating the global energy efficiency slope by differential comparison with the tank voltage fluctuation data and the preset energy efficiency benchmark; multiplying the current distribution deviation and the global energy efficiency slope by constant weight coefficients respectively and summing them to generate the differential reward value.

[0034] Preferably, step S104, which introduces a gradient step size decay operator negatively correlated with the variance of the action feature distribution in the backpropagation parameter update path to compensate for the gradient update magnitude, includes the following sub-steps: inputting the variance of the action feature distribution into the update change monitoring unit, determining whether the variance of the action feature distribution exceeds the second stability threshold; when the variance of the action feature distribution exceeds the second stability threshold, calculating the ratio of the variance of the action feature distribution to the second stability threshold, and constructing the gradient step size decay operator using the reciprocal of the ratio; in the backpropagation parameter update calculation, multiplying the original update gradient step size of the network parameters by the gradient step size decay operator to reduce the update step size.

[0035] Preferably, in step S104, the evaluation network calculates the differential reward value based on the global energy efficiency slope constructed by weighting the current distribution deviation and tank voltage fluctuation data. The evaluation network is updated within an s-level monitoring window whose time span is on the order of seconds, and the time span of the s-level monitoring window is greater than the time span of the ms-level action window.

[0036] Preferably, after the step of outputting the multi-channel anode height control command to the control command register in step S102, the method further includes: outputting the variance of the action feature distribution calculated based on the multi-channel anode height control command as a feature observation data stream to the monitoring terminal.

[0037] Preferably, step S101, which involves identifying the out-of-limit noise component using the spatiotemporal covariance matrix within a sliding window, eliminating random pulses using a spatial orthogonal projection operator, and generating a spatiotemporal filtered state tensor, includes the following sub-steps: Step S1014: Calculate the spatiotemporal covariance matrix of the spatiotemporal correlated state tensor within a sliding window formed by historical sampling periods; Step S1015: Extract the eigenvalues of the spatiotemporal covariance matrix, and determine the out-of-limit noise component when the eigenvalues exceed the variance boundary threshold; Step S1016: Construct a spatial orthogonal projection operator orthogonal to the out-of-limit noise component, and project the spatiotemporal correlated state tensor onto the subspace orthogonal to the out-of-limit noise component.

[0038] Preferably, the full-process energy-saving scheduling method runs in a distributed bus network environment, deploying data buffers, policy networks, control command registers, and evaluation networks in distributed computing nodes, and transmitting status monitoring data streams and multiple anode height control commands through data transfer between distributed computing nodes.

[0039] Example 1: The electrolytic cell serves as the main production unit. High-precision Hall sensors deployed on each branch of the busbar collect 24 channels of anode current sequence data. During continuous system operation, the acquisition unit samples current data at 1000 Hz while simultaneously monitoring the liquid level. The system converts the original analog signals to digital signals and constructs a raw matrix containing 24 timing channels in background memory. When constructing the spatiotemporal correlated state tensor, the high-frequency anode current monitoring stream and the low-frequency tank operating environment data are inconsistent on the acquisition time axis. To resolve this, the preprocessing unit performs forward zero-order hold interpolation under a unified global clock stamp to align the discrete data streams to a unified sampling beat. Furthermore, a homogeneous transformation matrix based on the tank's physical coordinate system is introduced to normalize the spatial coordinates of the multi-channel nodes, thereby calibrating the raw matrix into a correlated state tensor with unified spatiotemporal coordinates. To eliminate electromagnetic pulse noise caused by strong magnetic fields, the system calculates the spatiotemporal covariance matrix $C$ of the channel currents within the sliding window, defined as

$$C = \frac{1}{N}\sum_{t=1}^{N}\left(x_t - \bar{x}\right)\left(x_t - \bar{x}\right)^{\mathsf T},$$

where $x_t$ is the spatiotemporal state vector at time $t$, $N$ is the number of sampling points within the sliding window, and $\bar{x}$ is the mean vector of the state vectors. The computing unit extracts the eigenvalues $\lambda_i$ of $C$. When $\lambda_i$ exceeds the preset variance boundary threshold $\lambda_{\max}$, the corresponding spatial component is determined to be out-of-limit noise, and the computing unit uses the spatial orthogonal projection operator $P$ to map the data stream onto the subspace orthogonal to the noise, generating the spatiotemporal filtered state tensor:

$$\tilde{x}_t = P\,x_t, \qquad P = I - u\,u^{\mathsf T},$$

where $u$ is the unit eigenvector corresponding to the noise component and $I$ is the identity matrix.
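The forward zero-order hold alignment mentioned in this example can be sketched as follows. This is a minimal illustration under stated assumptions: the function name is hypothetical, timestamps are assumed to be sorted and expressed on the shared global clock, and fast samples that precede the first slow sample reuse the first slow value.

```python
import numpy as np


def zero_order_hold_align(fast_t, slow_t, slow_values):
    """Hold each low-frequency value until the next one arrives.

    fast_t: timestamps of the high-frequency stream (for example 1000 Hz current).
    slow_t, slow_values: sorted timestamps and values of the low-frequency stream.
    Returns one slow value per fast timestamp.
    """
    # Index of the last slow sample taken at or before each fast timestamp.
    idx = np.searchsorted(slow_t, fast_t, side="right") - 1
    # Assumption: before the first slow sample, reuse the first slow value.
    idx = np.clip(idx, 0, len(slow_t) - 1)
    return slow_values[idx]
```

For instance, with fast timestamps 0 to 4 ms and slow samples at 0 ms and 2 ms, the first slow value is held for the first two fast samples and the second value for the remaining three.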

[0040] The computing unit dynamically defines the asynchronous computing window based on the stability of the current operating conditions. Within the millisecond-level action window $T_{ms}$, it directly invokes the policy network to perform forward inference and outputs the multi-channel anode height control commands to the control command register. Although forward inference generates discrete control commands at high frequency within the millisecond-level action window, the control command register internally uses a sliding first-in-first-out (FIFO) queue as a digital buffer interface to convert them into a time-smooth data sequence, which is then passed at the data distribution boundary to the subsequent first-order low-pass digital filtering for rate reduction and modulation. From an architectural perspective, this bridges the time-scale gap between the high-frequency algorithm calculation step and the second-level, large-inertia physical actuator, avoiding frequent mechanical oscillation of the physical mechanism. The control command register connects to an external programmable logic controller. The anode lifting mechanism of the industrial aluminum electrolysis cell exhibits mechanical inertia, so there is a time-scale difference between the action calculation rate and the physical response frequency of the lower-level controller. Based on the signal smoothing modulation principle in automatic control, the control command register converts the temporarily stored millisecond-level discrete control quantities into a continuous voltage control signal with a second-level period through a first-order low-pass digital filter. The programmable logic controller receives the continuous voltage control signal and converts it into drive pulses, which are sent to the servo motor driver in the anode lifting mechanism to drive the servo motor to rotate and change the physical displacement of the anode guide rod. The time constant of the first-order low-pass digital filter is set to 1.5 s to 2.5 s to match the mechanical start hysteresis of the anode lifting mechanism. To achieve closed-loop monitoring of the inference trajectory, the system forcibly executes the endogenous verification program, reads the discrete control quantities $a_i(t)$ in the register, and calculates the variance of the action feature distribution $\sigma_a^2$:

$$\sigma_a^2(t) = \frac{1}{24}\sum_{i=1}^{24}\left(a_i(t) - \bar{a}(t)\right)^2,$$

where $\bar{a}(t)$ is the algebraic mean of the anode height control commands of the 24 channels at time $t$. When $\sigma_a^2$ exceeds the first stability threshold $\theta_1$, the computing unit compresses the next cycle's $T_{ms}$ to 10 ms to increase the instruction update frequency; when $\sigma_a^2$ is less than or equal to $\theta_1$, $T_{ms}$ is extended to 50 ms to reduce the computational load. The value of the first stability threshold $\theta_1$ affects the sensitivity of the action window adjustment; in this invention its working range is 0.05 to 0.15.
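The FIFO buffering and first-order low-pass filtering performed by the control command register can be sketched as below. This is an illustrative model only: the class and method names are hypothetical, the discretization (a backward-Euler smoothing factor dt / (tau + dt)) is one common choice the source does not specify, and the default time constant of 2.0 s is simply a value inside the stated 1.5 s to 2.5 s range.

```python
from collections import deque


class CommandRegister:
    """FIFO buffer followed by a first-order low-pass digital filter."""

    def __init__(self, time_constant_s=2.0, maxlen=1024):
        self.fifo = deque(maxlen=maxlen)  # sliding FIFO of discrete commands
        self.tau = time_constant_s
        self.output = 0.0                 # smoothed control signal

    def push(self, command):
        """Store one millisecond-level discrete control quantity."""
        self.fifo.append(command)

    def drain(self, dt_s):
        """Filter the queued commands oldest first; dt_s is their spacing in seconds."""
        alpha = dt_s / (self.tau + dt_s)  # assumed backward-Euler smoothing factor
        while self.fifo:
            self.output += alpha * (self.fifo.popleft() - self.output)
        return self.output
```

With a 10 ms command spacing and a 2 s time constant, a single step command moves the output by only about 0.5% of the step, which is how the filter shields the slow actuator from millisecond-level command jitter.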

[0041] In this embodiment, the influence of different threshold values on cell stability was examined in order to determine specific values. Experimental results show that when the first stability threshold $\theta_1$ is set to 0.05, command perturbations trigger window compression too frequently, exhausting processor computing resources; when $\theta_1$ is set to 0.15, window adjustment lags and fails to suppress the current deviation trend in time; when $\theta_1$ is set to 0.10, the switching frequency of the action window reaches a dynamic balance with the magnetic field fluctuation period of the aluminum electrolysis cell, and the anode current fluctuation is smallest. Therefore, 0.10 is the preferred value in this embodiment. Similarly, the working range of the second stability threshold $\theta_2$ is 0.10 to 0.30. Experimental data show that when $\theta_2$ is set to 0.20, the gradient update step size during backpropagation decays smoothly, effectively preventing the network parameters from diverging under high-frequency noise interference.

Within the second-level monitoring window $T_s$, the evaluation network acquires current distribution data through the controlled object data interface and, combined with the cell voltage fluctuation data, calculates the global energy efficiency slope from the estimated DC power consumption per ton of aluminum at time $t$ and the time interval of the monitoring window. The current distribution deviation is calculated from the currents of the 24 channels and their average value at time $t$. The system finally generates the differential reward value. During backpropagation, the computing unit introduces the gradient step size decay operator

$$\eta = \frac{\theta_2}{\sigma_a^2}$$

where $\theta_2$ is the second stability threshold and $\sigma_a^2$ is the variance of the action feature distribution. When $\sigma_a^2$ exceeds $\theta_2$, the original update gradient step size of the policy network and the evaluation network is multiplied by $\eta$, which maintains parameter convergence stability under non-stationary electromagnetic noise and keeps the standard deviation of the anode current in each channel below 1.5% over the long term. To ensure the dimensional consistency of network parameter updates in the backpropagation path, the physical dimension of $\theta_2$ is set to square millimeters (mm²), the same displacement-squared dimension as $\sigma_a^2$. The dimensions cancel in the ratio, so $\eta$ is a dimensionless scalar coefficient that can be multiplied directly with the original update gradient step size of the network parameters, achieving dimensional consistency between the physical displacement control quantities and the purely numerical update level of the algorithm.

[0042] Example 2: To verify the actual effectiveness of the method of the present invention in energy-saving scheduling of the entire aluminum electrolysis process, a set of comparative experiments was designed to compare the sample group of the present invention with the control sample group under different operating conditions. The experimental platform adopts a distributed industrial computing architecture comprising a 24-channel synchronous anode current data acquisition module, a real-time electrolytic cell energy efficiency monitoring terminal, and an edge computing scheduling server. The preset physical boundary conditions are as follows: the rated operating current of a single electrolytic cell is 350 kA, the cell voltage fluctuates between 3.9 V and 4.2 V, the aluminum liquid level is maintained at 20 cm to 25 cm, and the electrolyte superheat is controlled at 10 ℃ to 15 ℃. The noise source introduced in this experiment is Gaussian white noise superimposed on the electrolytic current, with a noise power density of -80 dBm/Hz, together with simulated 50 Hz power frequency harmonic interference. The control sample group uses traditional PID control logic with a fixed scheduling period of 100 ms. Under the above interference conditions, because the sampling frequency is bound in timing to the feedback loop, the PID control logic causes frequent and ineffective mechanical oscillation of the anode lifting mechanism when facing electromagnetic pulse noise, and the anode current range fluctuation reaches 35 kA. The sample group of this invention uses asynchronous dual-time-window calculation logic: within the millisecond-level action window $T_a$, the computing unit performs direct inference on the filtered state tensor and outputs multiple anode height control commands.

To verify stability under boundary conditions, an extreme anode current input value is set in the experiment; when an abnormal disturbance trend is detected, the system automatically triggers the calculation window compression logic. The experimental data are recorded as follows: when the variance of the action feature distribution $\sigma_a^2$ rises from 0.02 to 0.15, the system automatically adjusts $T_a$ from 50 ms to 10 ms, effectively avoiding adjustment lag. In the performance gradient verification phase, the energy efficiency control capability under different current distribution deviations is tested. At deviations of 0.02 (low), 0.08 (medium) and 0.15 (high), the DC power consumption per ton of aluminum of the sample group of this invention is maintained at 12.5 MWh/t, 12.8 MWh/t and 13.2 MWh/t respectively, compared with 13.2 MWh/t, 13.9 MWh/t and 14.8 MWh/t for the control group, showing improved energy utilization efficiency.

[0043] To verify the validity of the boundary values, a three-point test group was set up. With the gradient step size decay operator $\eta$ at 0.05 (lower limit), the convergence speed of the current distribution deviation decreases and the system overshoots when facing sudden operating conditions, causing the DC power consumption per ton of aluminum to rise back to 14.2 MWh/t. With $\eta$ at 0.50 (normal median), the system parameters update smoothly under noise fluctuations and the DC power consumption per ton of aluminum remains stable at 12.8 MWh/t. With $\eta$ at 0.95 (upper limit), parameter updates are overly aggressive, the gradient of the network output oscillates under electromagnetic pulse noise interference, energy efficiency control diverges, and the DC power consumption per ton of aluminum fluctuates up to 15.1 MWh/t. In this performance gradient and boundary stability verification experiment, in order to quantitatively test the limiting influence of a fixed attenuation intensity on the overall network convergence trend, $\eta$ was detached from its inherent adaptive dynamic calculation logic and manually hard-coded to the constant static values 0.05, 0.50 and 0.95 as independent control variable groups. These serve as performance benchmarks under extreme conditions and demonstrate the engineering necessity of adjusting the step size dynamically with an adaptive formula to suppress parameter divergence in noisy environments. The above experiments show that setting $\eta$ within the range of 0.1 to 0.8 suppresses noise disturbances while achieving smooth convergence of the reverse gradient. The experimental results confirm that using the variance of the action feature distribution $\sigma_a^2$ to control the asynchronous computation window duration and the backpropagation gradient update step size in real time decouples the rigid timing of physical sampling and scheduling execution at the level of the computational architecture, thereby enabling adjustment of the current distribution in the aluminum electrolysis cell.

[0044] Example 3: In an aluminum electrolysis production scheduling scenario, the system faces abnormal fluctuations in anode current data caused by complex electromagnetic interference. The distributed acquisition terminal of the electrolytic cell synchronously acquires the current signal at a sampling frequency of 2000 Hz and sends it to the preprocessing unit for noise reduction. The preprocessing unit calculates the first-order difference of the current signal, $\Delta I(k) = I(k) - I(k-1)$. When $|\Delta I(k)|$ exceeds the preset current slope limit $\Delta I_{\max}$, the sampling point is identified as electromagnetic pulse noise and is replaced by linear interpolation between the current values before and after it, ensuring the numerical stability of the state tensor input to the policy network. Based on the constructed filtered state tensor, the policy network performs forward inference within the millisecond-level action window $T_a$ and maps the state tensor to a sequence of anode height control commands. To address the logical black-box problem of the control commands, the system introduces the variance of the action feature distribution as the criterion for judging the stability of the strategy, calculated as follows:

$$\sigma_a^2(t) = \frac{1}{24}\sum_{i=1}^{24}\bigl(a_i(t) - \bar{a}(t)\bigr)^2$$

where $a_i(t)$ is the discrete control quantity of the $i$-th channel's anode height control command at time $t$, and $\bar{a}(t)$ is the algebraic average of the anode height control commands of the 24 channels at time $t$. When $\sigma_a^2$ exceeds the first stability threshold $\theta_1$, the system determines that the current strategy output trajectory is logically dispersed and triggers the action window reduction logic, which shortens $T_a$ from 50 ms to 10 ms, increasing the closed-loop command frequency and correcting control deviations.
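The pulse-noise preprocessing step of this example can be sketched as follows. This is an illustrative variant rather than the patented procedure: the function name is hypothetical, the slope limit is left as a caller-supplied parameter because the description gives no value for it, and each sample is compared with the last accepted sample (which equals the first-order difference whenever the previous sample is clean) so that the sample following a spike is not flagged as well.

```python
def remove_pulse_noise(samples, slope_limit):
    """Replace pulse-noise samples by linear interpolation between clean neighbours.

    Assumptions: the first sample is trusted, and a flagged run at the very end
    of the buffer (no clean right neighbour yet) holds the last clean value.
    """
    if not samples:
        return []

    # Pass 1: flag samples that jump too far from the last accepted sample.
    good = [True] * len(samples)
    last_good = samples[0]
    for k in range(1, len(samples)):
        if abs(samples[k] - last_good) > slope_limit:
            good[k] = False
        else:
            last_good = samples[k]

    # Pass 2: fill each flagged run from its clean neighbours.
    cleaned = list(samples)
    k = 1
    while k < len(samples):
        if good[k]:
            k += 1
            continue
        start = k - 1  # last clean index before the run
        end = k
        while end < len(samples) and not good[end]:
            end += 1   # first clean index after the run, or len(samples)
        for j in range(k, end):
            if end < len(samples):
                frac = (j - start) / (end - start)
                cleaned[j] = samples[start] + frac * (samples[end] - samples[start])
            else:
                cleaned[j] = samples[start]
        k = end
    return cleaned


if __name__ == "__main__":
    print(remove_pulse_noise([100.0, 101.0, 180.0, 102.0, 103.0], slope_limit=10.0))
```

A genuine step change in the current would also be flagged by this rule, so a production version would need an additional criterion (for example a maximum run length) to tell a sustained shift from an impulse.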

[0045] In the parameter update chain, to avoid gradient divergence during the backpropagation phase, the computing unit dynamically calculates the gradient step size decay operator $\eta$ from the variance of the action feature distribution $\sigma_a^2$. The operator is calculated as follows:

$$\eta = \frac{\theta_2}{\sigma_a^2 + \varepsilon}$$

where $\theta_2$ is the second stability threshold and $\varepsilon$ is a constant set to 1e-6 to prevent division by zero. When $\sigma_a^2$ increases, $\eta$ decreases and forces the update gradient to be compressed. This gradient hedging strategy based on inference trajectory stability can keep the standard deviation of the anode current in each channel within 0.8 kA in high-interference environments. The experimental design includes a control group to verify the above boundary conditions. When $\eta$ is 0.05 (lower limit), the parameter convergence speed decreases, the system overshoots when facing sudden operating conditions, and the DC power consumption per ton of aluminum rises back to 14.2 MWh/t. When $\eta$ is 1.20 (upper limit), the gradient overshoots under disturbance and the anode current range fluctuation increases to 22 kA. The data confirm that limiting $\eta$ to between 0.08 and 1.00 achieves both suppression of noise disturbances and smooth convergence of the reverse gradient. The above process shows that, through the current slope limit $\Delta I_{\max}$, the first stability threshold $\theta_1$ and the second stability threshold $\theta_2$, the system constructs an energy-saving scheduling logic that tightly binds the deep reinforcement learning algorithm to the physical characteristics of the industrial site. Through the data processing path formed by the millisecond-level action window $T_a$ and the second-level monitoring window $T_s$, the abstract energy-saving strategy is transformed into specific anode displacement commands and parameter weight corrections, realizing closed-loop control between the inference logic of the computational model and the physical production process of electrolytic aluminum.
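As a hedged illustration of the gradient step size decay operator, the sketch below computes the ratio of the second stability threshold to the action variance and clamps it to the 0.08 to 1.00 range reported above. Reading that range as a hard clamp, and the function and parameter names, are assumptions; the threshold of 0.20 and the constant 1e-6 come from the description.

```python
def gradient_decay_operator(variance, second_threshold=0.20, eps=1e-6,
                            lower=0.08, upper=1.00):
    """Dimensionless factor applied to the base update step size.

    The raw ratio threshold / (variance + eps) falls as the variance grows.
    Assumption: the result is clamped to [lower, upper], so the step is never
    enlarged when the variance is at or below the threshold.
    """
    ratio = second_threshold / (variance + eps)
    return min(upper, max(lower, ratio))


def scaled_step(base_step, variance, second_threshold=0.20):
    """Base update step size multiplied by the decay operator."""
    return base_step * gradient_decay_operator(variance, second_threshold)


if __name__ == "__main__":
    for v in (0.10, 0.40, 10.0):
        print(v, gradient_decay_operator(v))
```

In a training loop the same factor would multiply the learning rate of both the policy network and the evaluation network before each parameter update.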

[0046] Example 4: In the offline deployment and parameter optimization stage of the aluminum electrolysis cell, to ensure the stability of the scheduling logic of the deep reinforcement learning algorithm under different cell conditions, raw material composition fluctuations, and electrolyte composition evolution, the system executes a preliminary parameter self-calibration and benchmark reference matrix construction procedure. The system collects data streams under steady-state operation of the electrolysis cell, including key physical parameters such as cell temperature, aluminum liquid level, anode current, electrode distance fluctuation, cell voltage, and alumina concentration. The system calculates the average value $\mu$ and the standard deviation $\sigma$ of each parameter over a continuous operating cycle and constructs a benchmark feature matrix that reflects the basic operating status of the electrolysis cell, providing a deterministic reference for subsequent real-time evaluation of parameter deviations. Before the scheduling model is put into operation, the system executes a parameter adaptive mapping process for unsteady-state conditions. When the electrolysis cell is in an unsteady state such as electrode switching, aluminum tapping, or feeding, the system suspends direct inference by the deep reinforcement learning policy network and forcibly switches to the preset physical rule execution mode, ensuring that the action output stays within the physically feasible domain. During this process, the system synchronously collects the input state vector and the physical response data of the actuator, and feeds this data stream into the evaluation network to update its value estimation function for the transitional operating condition. The iterative update formula for the value function is as follows:

$$Q_{\text{new}} = Q_{\text{old}} + \alpha\,\Delta E$$

where $Q_{\text{new}}$ is the updated action value estimate, $Q_{\text{old}}$ is the action value estimate at the previous moment, $\alpha$ is the model learning rate, ranging from 0.01 to 0.05, and $\Delta E$ is the actually observed energy consumption optimization difference.
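The iterative value update can be written as a one-line rule. The sketch below assumes the additive form in which the previous estimate is shifted by the learning rate times the observed energy consumption optimization difference; that form, the function name, and the default learning rate of 0.03 are assumptions, while the 0.01 to 0.05 range comes from the description.

```python
def update_value_estimate(previous_value, energy_difference, learning_rate=0.03):
    """Shift the previous action value estimate by learning_rate * energy_difference.

    Assumption: additive update; learning_rate must lie in the stated 0.01 to 0.05 range.
    """
    if not 0.01 <= learning_rate <= 0.05:
        raise ValueError("learning_rate is outside the range 0.01 to 0.05")
    return previous_value + learning_rate * energy_difference


if __name__ == "__main__":
    value = 10.0
    for observed_difference in (2.0, 1.0, -0.5):
        value = update_value_estimate(value, observed_difference)
        print(value)
```

A small learning rate means each transitional episode (electrode switching, tapping, feeding) moves the estimate only slightly, which suits data gathered while the policy network is suspended.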

[0047] After completing the model parameter iteration, the system performs a boundary stability assessment, testing the adjustment stability of the model under operating conditions with deviation characteristic values of 0.1, 0.5, and 1.2. Under the low-deviation condition, the system maintains the reinforcement learning scheduling strategy, and the standard deviation of the anode current of each channel remains within 1.5%. Under the intermediate-deviation condition, the system executes weight compensation logic, reducing the nonlinear activation weights during policy network inference according to preset mapping rules and increasing the control share of the physical rule constraints, so that system output fluctuations remain within production safety limits. Under the high-deviation condition, the scheduling system triggers the safety protection mechanism, automatically locking the electrolysis cell operation mode to constant current control and issuing a parameter recalibration warning to the terminal. The above engineering specifications define the complete causal chain from electromagnetic noise preprocessing and asynchronous task scheduling to smooth convergence of the backward gradient, use the dynamic calibration logic of the gradient decay operator to suppress parameter divergence under non-stationary operating conditions, and verify the physical implementation basis of the computational model in energy-saving scheduling of the entire aluminum electrolysis process.
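The three-way behaviour in this paragraph can be read as a supervisory mode switch. The sketch below is only one possible reading: the description does not state the switching limits, so `compensation_limit` and `safety_limit` are hypothetical parameters that the caller must supply, and the mode names are illustrative.

```python
from enum import Enum


class Mode(Enum):
    RL_SCHEDULING = "reinforcement learning scheduling"
    WEIGHT_COMPENSATION = "weight compensation"
    CONSTANT_CURRENT_LOCK = "constant current lock"


def select_mode(deviation, compensation_limit, safety_limit):
    """Choose the operating mode from the deviation characteristic value.

    Both limits are hypothetical; the description gives no values for them.
    """
    if compensation_limit >= safety_limit:
        raise ValueError("compensation_limit must be below safety_limit")
    if deviation > safety_limit:
        return Mode.CONSTANT_CURRENT_LOCK
    if deviation > compensation_limit:
        return Mode.WEIGHT_COMPENSATION
    return Mode.RL_SCHEDULING


if __name__ == "__main__":
    # Limits of 0.3 and 1.0 are placeholders chosen so that the three tested
    # deviation values fall into three different modes.
    for deviation in (0.1, 0.5, 1.2):
        print(deviation, select_mode(deviation, 0.3, 1.0).value)
```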

[0048] Example 5: In the offline deployment and parameter optimization stage of the aluminum electrolysis cell, to ensure the stability of the scheduling logic of the deep reinforcement learning algorithm under different cell conditions, raw material composition fluctuations, and electrolyte composition evolution, the system executes a preliminary parameter self-calibration and benchmark reference matrix construction procedure. The system collects data streams under steady-state operation of the electrolysis cell, including key physical parameters such as cell temperature, aluminum liquid level, anode current, electrode distance fluctuation, cell voltage, and alumina concentration. The system calculates the average value $\mu$ and the standard deviation $\sigma$ of each parameter over a continuous operating cycle and constructs a benchmark feature matrix that reflects the basic operating status of the electrolysis cell, providing a deterministic reference for subsequent real-time evaluation of parameter deviations. Before the scheduling model is put into operation, the system executes a parameter adaptive mapping process for unsteady-state conditions. When the electrolysis cell is in an unsteady state such as electrode switching, aluminum tapping, or feeding, the system suspends direct inference by the deep reinforcement learning policy network and forcibly switches to the preset physical rule execution mode, ensuring that the action output stays within the physically feasible domain. During this process, the system synchronously collects the input state vector and the physical response data of the actuator, and feeds this data stream into the evaluation network to update its value estimation function for the transitional operating condition. The iterative update formula for the value function is as follows:

$$Q_{\text{new}} = Q_{\text{old}} + \alpha\,\Delta E$$

where $Q_{\text{new}}$ is the updated action value estimate, $Q_{\text{old}}$ is the action value estimate at the previous moment, $\alpha$ is the model learning rate, ranging from 0.01 to 0.05, and $\Delta E$ is the actually observed energy consumption optimization difference.

[0049] Building on the above embodiments, this embodiment further considers the slow drift of the cell condition baseline during continuous aluminum electrolysis operation. To ensure the long-term collaborative stability of the components in large-capacity distributed computing nodes, the system introduces a time evolution update mechanism. Based on the theory of time-weighted decay in statistics, the system applies a time decay factor to the data within the sliding window formed by historical sampling periods. The time decay factor ranges from 0.95 to 0.98 and reduces the weight of early historical feature data so that outdated operating condition data is phased out. When the actually observed energy consumption optimization difference output by the evaluation network stays below the preset benefit threshold for 24 consecutive hours, the system issues a reconstruction command to re-collect steady-state operating condition data and refresh the benchmark feature matrix. When electrolysis cells are deployed in clusters, the central control unit adopts a round-robin allocation method and allocates computing overhead in descending order of the variance of the action feature distribution reported by each distributed computing node, scheduling nodes with excessive variance first and thereby achieving adaptive, balanced allocation of edge computing resources.
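The time-weighted decay and the variance-ordered scheduling can be sketched in a few lines. The sketch assumes geometric weights, with the newest sample weighted 1 and each older sample multiplied once more by the decay factor; the function names, the node identifiers, and the default factor of 0.97 (inside the stated 0.95 to 0.98 range) are illustrative.

```python
def time_decayed_mean(window, decay=0.97):
    """Weighted mean of a sliding window ordered from oldest to newest.

    Assumption: geometric weights decay ** age, where the newest sample has age 0.
    """
    if not window:
        raise ValueError("window must not be empty")
    n = len(window)
    weights = [decay ** (n - 1 - k) for k in range(n)]
    return sum(w * x for w, x in zip(weights, window)) / sum(weights)


def scheduling_order(node_variances):
    """Node identifiers sorted by descending action variance (highest first)."""
    return sorted(node_variances, key=node_variances.get, reverse=True)


if __name__ == "__main__":
    print(time_decayed_mean([4.00, 4.05, 4.10, 4.20]))
    print(scheduling_order({"cell-1": 0.02, "cell-2": 0.15, "cell-3": 0.08}))
```

Because recent samples carry more weight, the decayed mean follows a slowly drifting baseline more closely than a plain average over the same window.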

[0050] It will be apparent to those skilled in the art that the present invention is not limited to the details of the exemplary embodiments described above, and that the present invention can be implemented in other specific forms without departing from the spirit or essential characteristics of the present invention.

[0051] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and are not intended to limit it. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions can be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims

1. A method for energy-saving scheduling of the entire aluminum electrolysis process integrating deep reinforcement learning algorithms, characterized in that it includes the following steps:
Step S101: obtain the state monitoring data stream from the data buffer, combine the distributed temporal feature sequence with the spatial mapping topology relationship and the temporal topology order to construct the spatiotemporal correlated state tensor, use the spatiotemporal covariance matrix within the sliding window to identify out-of-limit noise components, and eliminate random pulses through the spatial orthogonal projection operator to generate the spatiotemporal filtered state tensor;
Step S102: input the spatiotemporal filtered state tensor into the policy network, and, through forward inference within the ms-level action window, calculate and output multiple anode height control commands to the control command register, so as to temporarily store the multiple anode height control commands;
Step S103: read the discrete control quantities of the multi-channel anode height control commands in the control command register through the endogenous verification program, calculate the variance of the action feature distribution to characterize the stability of the current forward inference trajectory, and return the variance of the action feature distribution to the ms-level action window in place as the trigger boundary condition for adjusting the span of the ms-level action window in the next cycle;
Step S104: within the s-level monitoring window, calculate the differential reward value through the evaluation network based on the global energy efficiency slope constructed by weighting the current distribution deviation and the cell voltage fluctuation data, and, in the backpropagation parameter update path, introduce a gradient step size decay operator that is negatively correlated with the variance of the action feature distribution to dynamically smooth and compensate the gradient update magnitude, thereby correcting the network parameters of the policy network and the evaluation network.

2. The energy-saving scheduling method for the entire aluminum electrolysis process integrating deep reinforcement learning algorithms according to claim 1, characterized in that, in step S103, returning the variance of the action feature distribution to the ms-level action window in place as the trigger boundary condition for adjusting the span of the ms-level action window in the next cycle includes: if the variance of the action feature distribution exceeds the first stability threshold, the span of the ms-level action window is reduced to 10 ms; if the variance of the action feature distribution is less than or equal to the first stability threshold, the span of the ms-level action window is expanded to 50 ms.

3. The energy-saving scheduling method for the entire aluminum electrolysis process integrating deep reinforcement learning algorithms according to claim 1, characterized in that, The step S101, which involves acquiring the status monitoring data stream from the data buffer and constructing the spatiotemporal correlated state tensor, includes the following sub-steps: Step S1011, reading the distributed temporal feature sequence of the multi-channel physical status monitoring nodes from the data buffer; Step S1012, combining the distributed temporal feature sequence according to the spatial mapping topology and temporal topology to generate a high-dimensional state matrix; Step S1013, calibrating the high-dimensional state matrix through spatiotemporal alignment to transform and synthesize the high-dimensional state matrix into a spatiotemporal correlated state tensor with a unified timestamp and spatial coordinates.

4. The energy-saving scheduling method for the entire aluminum electrolysis process integrating deep reinforcement learning algorithms according to claim 1, characterized in that, in step S104, calculating the differential reward value through the evaluation network based on the global energy efficiency slope constructed by weighting the current distribution deviation and the cell voltage fluctuation data includes the following sub-steps: acquiring, in real time, the current distribution data and the cell voltage fluctuation data for the current scheduling period from the controlled object data interface; calculating the current distribution deviation based on the current distribution data, and calculating the global energy efficiency slope by differential comparison of the cell voltage fluctuation data with the preset energy efficiency benchmark; and multiplying the current distribution deviation and the global energy efficiency slope by constant weight coefficients and summing them to generate the differential reward value.

5. The energy-saving scheduling method for the entire aluminum electrolysis process integrating deep reinforcement learning algorithms according to claim 1, characterized in that, in step S104, introducing into the backpropagation parameter update path a gradient step size decay operator negatively correlated with the variance of the action feature distribution to compensate the gradient update magnitude includes the following sub-steps: inputting the variance of the action feature distribution into the update change monitoring unit and determining whether the variance of the action feature distribution exceeds the second stability threshold; when the variance of the action feature distribution exceeds the second stability threshold, calculating the ratio of the variance of the action feature distribution to the second stability threshold, and constructing the gradient step size decay operator as the reciprocal of the ratio; and, in the backpropagation parameter update calculation, multiplying the original update gradient step size of the network parameters by the gradient step size decay operator to reduce the update step size.

6. The energy-saving scheduling method for the entire aluminum electrolysis process integrating deep reinforcement learning algorithms according to claim 1, characterized in that, in step S104, in which the evaluation network calculates the differential reward value based on the global energy efficiency slope constructed by weighting the current distribution deviation and the cell voltage fluctuation data, the evaluation network is updated within the s-level monitoring window, and the time span of the s-level monitoring window is greater than the time span of the ms-level action window.

7. The energy-saving scheduling method for the entire aluminum electrolysis process integrating deep reinforcement learning algorithms according to claim 1, characterized in that, After step S102, which involves outputting multiple anode height control commands to the control command register, the method further includes: outputting the variance of the action feature distribution calculated based on the multiple anode height control commands as a feature observation data stream to the monitoring terminal.

8. The energy-saving scheduling method for the entire aluminum electrolysis process integrating deep reinforcement learning algorithms according to claim 1, characterized in that, in step S101, identifying out-of-limit noise components using the spatiotemporal covariance matrix within a sliding window, eliminating random pulses using a spatial orthogonal projection operator, and generating a spatiotemporal filtered state tensor includes the following sub-steps: Step S1014: calculate the spatiotemporal covariance matrix of the spatiotemporal correlated state tensor within a sliding window formed by historical sampling periods; Step S1015: extract eigenvalues from the spatiotemporal covariance matrix and determine out-of-limit noise components when the eigenvalues exceed the variance boundary threshold; Step S1016: construct a spatial orthogonal projection operator orthogonal to the out-of-limit noise components and project the spatiotemporal correlated state tensor onto the subspace orthogonal to the out-of-limit noise components.

9. The energy-saving scheduling method for the entire aluminum electrolysis process integrating deep reinforcement learning algorithms according to claim 1, characterized in that, The end-to-end energy-saving scheduling method operates in a distributed bus network environment, deploying data buffers, policy networks, control command registers, and evaluation networks in distributed computing nodes. It transmits status monitoring data streams and multiple anode height control commands through data flow between distributed computing nodes.