AI enterprise energy consumption optimization management system and method based on data mining
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SUZHOU GONGYING INTERNET TECH CO LTD
- Filing Date
- 2026-05-19
- Publication Date
- 2026-06-19
Figure CN122242875A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the fields of artificial intelligence, data mining, and enterprise energy management, specifically to an AI-based enterprise energy consumption optimization management system and method based on data mining.
Background Technology
[0002] In enterprise production parks, especially in continuous-production manufacturing scenarios, multiple energy-consuming units such as cold stations, air compressor stations, clean air conditioning, energy storage equipment, and key process loads are coupled with one another. Their operating states fluctuate over time and are linked to each other, with rates of change that exceed a preset fluctuation threshold. Fine-grained regulation of the enterprise energy system is therefore an important foundation for ensuring stable energy supply and reducing overall energy consumption.
[0003] Existing energy management methods have several problems. They often rely on scheduling based on fixed rules, single load thresholds, or static empirical parameters. Although such methods can handle the start-up, shutdown, and load allocation of conventional equipment, they struggle to reflect the real operating boundaries of the enterprise energy system under different operating conditions, because sampling rhythms differ across nodes, manual takeover behavior is hard to quantify, and resource consumption status is coupled with instability risk in complex ways. In particular, when external load is limited, load fluctuations intensify, or critical equipment runs close to high load, optimization and safety are likely to fall out of balance, and global energy consumption optimization cannot be effectively reconciled with the stability of critical energy supply.
Summary of the Invention
[0004] To address the aforementioned technical problems, this invention provides an AI-based enterprise energy consumption optimization management system and method based on data mining. Specifically, the technical solution of this invention is as follows:
[0005] AI-based enterprise energy consumption optimization management method based on data mining includes: acquiring the status time-series data of heterogeneous nodes and node control takeover event records according to a preset collection cycle;
[0006] Using a pre-set data mining model, energy consumption statistics are extracted from the time-series data of heterogeneous node status to generate physical resource consumption characteristics, and intervention probability indicators are generated by combining node control takeover event records.
[0007] The intervention probability index and physical resource consumption characteristics are input into the resource depletion failure boundary model pre-trained based on historical failure samples to predict the resource depletion probability, and the resource depletion probability is used as the risk contribution rate.
[0008] In response to the risk contribution rate being lower than the preset danger threshold, based on the physical resource consumption characteristics and with the optimization objective of minimizing the global comprehensive energy consumption index, the first resource allocation strategy is calculated through a multi-agent deep reinforcement learning model.
[0009] In response to the risk contribution rate being higher than or equal to the preset danger threshold, a second resource allocation strategy is calculated by a nonlinear model predictive control model based on the physical resource consumption characteristics.
[0010] The first or second resource allocation strategy is output as the target control command. The target control command is sent to the corresponding heterogeneous nodes to perform start-up, shutdown and load scheduling. The node status data reported by the heterogeneous nodes in real time after the target control command is sent is obtained as node feedback data. The reward value is calculated based on the node feedback data, and the parameters of the multi-agent deep reinforcement learning model are updated.
[0011] Preferably, using a preset data mining model to extract energy consumption statistics from the heterogeneous node status time-series data to generate physical resource consumption characteristics, and combining them with node control takeover event records to generate an intervention probability index, includes: extracting the heterogeneous node status time-series data within a preset time window before the occurrence of a node control takeover event as deviation samples; using a preset isolation forest algorithm to calculate the degree of anomaly of the deviation samples and generate a strategy deviation score; and inputting the strategy deviation score into a logistic regression model pre-trained on historical takeover-event labeled data, which outputs an intervention probability index with a value range of 0 to 1.
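The two-stage mapping described above (deviation samples to a strategy deviation score, then to an intervention probability index) can be illustrated with a minimal sketch. It is not the patented implementation: a mean absolute z-score stands in for the isolation forest anomaly score so that the example needs only the standard library, and the names `deviation_score` and `intervention_probability`, the `weight` and `bias` values, and all sample data are hypothetical.

```python
import math
from statistics import mean, pstdev


def deviation_score(window, baseline):
    """Mean absolute z-score of a pre-takeover window against a baseline.

    Stand-in for the isolation forest anomaly score (an assumption of this
    sketch, not the method described in the text).
    """
    mu = mean(baseline)
    sigma = pstdev(baseline) or 1.0  # guard against a flat baseline
    return mean([abs(x - mu) / sigma for x in window])


def intervention_probability(score, weight=1.2, bias=-2.0):
    """Logistic mapping of a deviation score to a value between 0 and 1.

    weight and bias are hypothetical; in the described method they would be
    fitted on historical takeover-event labeled data.
    """
    return 1.0 / (1.0 + math.exp(-(weight * score + bias)))


# Made-up power load readings in kW.
baseline = [100, 102, 98, 101, 99, 100]  # normal operation
calm = [100, 101, 99]                    # window with ordinary behavior
stressed = [120, 125, 130]               # window preceding a manual takeover

p_calm = intervention_probability(deviation_score(calm, baseline))
p_stressed = intervention_probability(deviation_score(stressed, baseline))
```

In this sketch a window that deviates more strongly from the baseline yields a higher index, which is the qualitative behavior the paragraph describes.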
[0012] Preferably, based on the physical resource consumption characteristics, with the optimization objective of minimizing the global comprehensive energy consumption index, the first resource allocation strategy is calculated through a multi-agent deep reinforcement learning model, including: mapping the physical resource consumption characteristics to the environmental state space of the multi-agent deep reinforcement learning model;
[0013] Obtain a preset energy conversion coefficient that represents the difference in external constraints on energy use across time periods, and calculate the global comprehensive energy consumption index as a weighted sum of the physical resource consumption characteristics and the preset energy conversion coefficients. Construct a reward function with the optimization objective of minimizing the global comprehensive energy consumption index, and use the reward function to search the action space over the environmental state space to generate candidate action sequences. Output the action with the largest reward value in the candidate action sequence as the first resource allocation strategy.
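As an illustration of the weighted summation and of selecting the action with the largest reward, a minimal sketch follows. It covers only the index, the reward, and the final selection step, not the multi-agent deep reinforcement learning model itself. The period names, coefficient values, candidate actions, and the choice of the negative index as the reward are all assumptions made for this example.

```python
def energy_index(consumption, coefficients):
    """Weighted sum of per-period consumption and energy conversion coefficients."""
    return sum(consumption[period] * coefficients[period] for period in consumption)


def select_action(candidates, coefficients):
    """Return the candidate with the largest reward.

    The reward is taken here as the negative index, so maximizing the reward
    minimizes the global comprehensive energy consumption index.
    """
    return max(candidates, key=lambda c: -energy_index(c["predicted"], coefficients))


# Hypothetical coefficients: energy drawn in the peak period is weighted more heavily.
coefficients = {"peak": 1.5, "valley": 0.6}

# Hypothetical candidate actions with predicted consumption in kWh per period.
candidates = [
    {"name": "keep_schedule", "predicted": {"peak": 500.0, "valley": 300.0}},
    {"name": "shift_chiller_to_valley", "predicted": {"peak": 300.0, "valley": 500.0}},
]

best = select_action(candidates, coefficients)
```

Both candidates consume the same total energy, but shifting load to the lower-weighted period gives the smaller index, so that candidate is selected.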
[0014] Preferably, based on the physical resource consumption characteristics, a second resource allocation strategy is calculated using a nonlinear model predictive control model, including: inputting the physical resource consumption characteristics as the initial state into the nonlinear model predictive control model, constructing a nonlinear state evolution equation with the resource consumption amount represented by the physical resource consumption characteristics as the state variable; obtaining preset resource redundancy constraints; performing rolling optimization on the nonlinear state evolution equation under the resource redundancy constraints to generate a control sequence; and outputting the current control quantity of the control sequence as the second resource allocation strategy.
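The rolling optimization under a resource redundancy constraint can be sketched as a small receding-horizon search. This is an illustrative toy, not the controller of the invention: the state is a single energy storage state of charge, the nonlinear evolution equation in `step`, the reserve level, the discrete control levels, and the quadratic cost are all hypothetical, and exhaustive enumeration replaces a real nonlinear solver.

```python
from itertools import product


def step(soc, u):
    """Hypothetical nonlinear evolution: discharge losses grow as charge falls."""
    return soc - u * (1.0 + 0.5 * (1.0 - soc))


def mpc_control(soc, demand, reserve=0.3, levels=(0.0, 0.05, 0.1)):
    """Return the first control of the cheapest feasible control sequence.

    demand lists the requested discharge for each future control cycle; its
    length is the prediction horizon. A sequence is feasible only if the
    state of charge never drops below the reserve (redundancy constraint).
    """
    best_cost, best_seq = float("inf"), None
    for seq in product(levels, repeat=len(demand)):
        x, cost, feasible = soc, 0.0, True
        for u, d in zip(seq, demand):
            x = step(x, u)
            if x < reserve:
                feasible = False
                break
            cost += (d - u) ** 2  # penalize unmet demand
        if feasible and cost < best_cost:
            best_cost, best_seq = cost, seq
    # Fall back to no discharge when no sequence is feasible.
    return best_seq[0] if best_seq is not None else 0.0


u_high = mpc_control(0.9, [0.1, 0.1, 0.1])   # ample charge
u_low = mpc_control(0.35, [0.1, 0.1, 0.1])   # charge close to the reserve
```

Only the first control of the optimized sequence is applied; at the next cycle the search is repeated from the newly measured state, which is what makes the optimization "rolling".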
[0015] Preferably, after outputting the second resource allocation strategy as the target control instruction, the method further includes: acquiring the heterogeneous node status time series data for the next acquisition cycle and recalculating the risk contribution rate; in response to the recalculated risk contribution rate falling below the preset danger threshold, acquiring the historical state trajectory data output by the nonlinear model predictive control model.
[0016] Historical state trajectory data is used as the initial state observations of the multi-agent deep reinforcement learning model, and the first resource allocation strategy is switched as the target control command. In response to the recalculated risk contribution rate being higher than or equal to the preset danger threshold, the second resource allocation strategy is maintained as the target control command.
[0017] Preferably, the heterogeneous node status time-series data is the power load data reported by the IoT devices on the enterprise production line; the node control takeover event record is the instruction log of the operation and maintenance personnel manually cutting off the automatic control mode; the first resource allocation strategy is the equipment start-up and shutdown scheduling instruction with the lowest global comprehensive energy consumption index; and the second resource allocation strategy is the redundant power supply scheduling instruction to maintain power stability.
[0018] The AI-based enterprise energy consumption optimization management system based on data mining includes a data acquisition module, a feature mining module, a risk quantification module, an adaptive decision-making module, and a feedback execution module.
[0019] The data acquisition module is used to acquire time-series data of heterogeneous node status and records of node control takeover events;
[0020] The feature mining module is used to extract energy consumption statistics from heterogeneous node state time series data using a preset data mining model to generate physical resource consumption features, and combine them with node control takeover event records to generate intervention probability indicators.
[0021] The risk quantification module is used to input the intervention probability index and physical resource consumption characteristics into a resource depletion failure boundary model pre-trained based on historical failure samples, predict the resource depletion probability, and use the resource depletion probability as the risk contribution rate.
[0022] The adaptive decision-making module is used to calculate the first resource allocation strategy by using a multi-agent deep reinforcement learning model when the risk contribution rate is lower than the preset danger threshold, based on the physical resource consumption characteristics and with the optimization objective of minimizing the global comprehensive energy consumption index. When the risk contribution rate is higher than or equal to the preset danger threshold, it calculates the second resource allocation strategy by using a nonlinear model predictive control model based on the physical resource consumption characteristics.
[0023] The feedback execution module is used to output the first resource allocation strategy or the second resource allocation strategy as the target control command, obtain the node status data reported in real time by heterogeneous nodes after the target control command is issued as the node feedback data, calculate the reward value based on the node feedback data, and update the parameters of the multi-agent deep reinforcement learning model.
[0024] Compared with the prior art, the present invention has the following beneficial effects:
[0025] 1. This invention acquires heterogeneous node status time-series data and node control takeover event records, and performs timestamp alignment, anomaly marking, and low-confidence processing on data from different sampling rhythms. Combined with a data mining model, it extracts energy consumption statistics to form physical resource consumption characteristics. This can more realistically reflect the coupled resource consumption situation between chiller stations, air compressor stations, energy storage clusters, clean air conditioning, and key process loads, and improve the problem that existing fixed rules and single load threshold methods are difficult to fully characterize the real operating boundary of the park.
[0026] 2. This invention extracts deviation samples from a preset time window before a takeover event, calculates the strategy deviation score using the isolation forest algorithm, and outputs an intervention probability index through a logistic regression model trained on historical takeover labeled data. This turns hard-to-quantify changes in how far operation and maintenance personnel accept automatic control, such as manual disconnection of automatic control mode, manual forced power supply, and manual locking of equipment, into calculable inputs. It thereby makes up for the shortcomings of existing technologies, in which manual takeover behavior is difficult to quantify and changes in control tolerance are difficult to predict and identify.
[0027] 3. This invention sets a buffer zone near the risk contribution rate threshold, confirms the switch of control path only after multiple consecutive sampling cycles, and recalculates the risk contribution rate in the next acquisition cycle after executing the second resource allocation strategy. After the risk falls back, it extracts the historical state trajectory data output by the nonlinear model predictive control model as the initial state observation value of the multi-agent deep reinforcement learning model, and then switches back to the first resource allocation strategy. This achieves orderly switching from the optimization mode to the conservative mode and back to the optimization mode, avoiding more than a preset number of state switches near the threshold and avoiding strategy mismatch at the moment of switching back.
[0028] 4. This invention collects real-time feedback data from heterogeneous nodes after the target control command is issued, calculates reward values based on the results such as total incoming line fluctuations, critical clean area temperature, energy storage recovery, and whether manual intervention has been added, and performs online updates or bypass correction based on high-risk samples for the multi-agent deep reinforcement learning model. This enables the model to continuously learn which states should actively tend towards conservatism and which action combinations can reduce ineffective energy fluctuations without triggering manual intervention, thereby improving the adaptability of subsequent scheduling.
[0029] 5. This invention addresses scenarios such as missing data, communication delays, incorrect timestamps, missing takeover records, manual equipment locking, maximum start / stop times, and minimum startup duration by implementing mechanisms for data completion, weight reduction, conservative estimation, and constraint filtering. It also constructs a deployable system architecture comprising a data acquisition module, a feature mining module, a risk quantification module, an adaptive decision-making module, and a feedback execution module. This enhances the stability, rollback capability, and traceability of industrial field operations, forming a closed-loop control chain of data acquisition, mining, risk quantification, dual-path decision-making, and feedback updates. This solves the problem of existing technologies struggling to balance global energy consumption optimization with critical power supply stability.
Attached Figure Description
[0030] The present invention will be further explained below with reference to the accompanying drawings and embodiments:
[0031] Figure 1 is a flowchart of the AI-based enterprise energy consumption optimization management method based on data mining provided in an embodiment of this application;
[0032] Figure 2 is a schematic diagram of the modules of the AI-based enterprise energy consumption optimization management system based on data mining in an embodiment of this application.
Detailed Implementation
[0033] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to specific embodiments.
[0034] The AI-based enterprise energy consumption optimization management system and method based on data mining includes: acquiring the status time-series data of heterogeneous nodes and node control takeover event records according to a preset collection cycle;
[0035] Using a pre-set data mining model, energy consumption statistics are extracted from the time-series data of heterogeneous node states to generate physical resource consumption characteristics, and intervention probability indicators are generated by combining node control takeover event records.
[0036] The intervention probability index and physical resource consumption characteristics are input into the resource depletion failure boundary model pre-trained based on historical failure samples to predict the resource depletion probability, and the resource depletion probability is used as the risk contribution rate.
[0037] In response to the risk contribution rate being lower than the preset danger threshold, based on the physical resource consumption characteristics and with the optimization objective of minimizing the global comprehensive energy consumption index, the first resource allocation strategy is calculated through a multi-agent deep reinforcement learning model.
[0038] In response to the risk contribution rate being higher than or equal to the preset danger threshold, a second resource allocation strategy is calculated by a nonlinear model predictive control model based on the physical resource consumption characteristics.
[0039] The first or second resource allocation strategy is output as the target control command. The target control command is sent to the corresponding heterogeneous nodes to perform start-up, shutdown and load scheduling. The node status data reported by the heterogeneous nodes in real time after the target control command is sent is obtained as node feedback data. The reward value is calculated based on the node feedback data, and the parameters of the multi-agent deep reinforcement learning model are updated.
[0040] This embodiment provides an AI-based enterprise energy consumption optimization management mechanism based on data mining, as shown in Figure 1. Specifically, this embodiment takes a high-end wafer manufacturing park as a continuous main scenario. The park simultaneously has exposure units, clean air conditioning units, refrigeration stations, air compressor stations, circulating water pump stations, energy storage battery clusters, and distributed rooftop photovoltaics. All equipment is uniformly coordinated by the energy dispatch center. In this scenario, the park's normal standby capacity is compressed to a preset lower limit capacity range to reduce the idle rate of energy equipment. Therefore, the dispatch system must not only pay attention to total energy consumption and carbon emissions, but also continuously prevent power supply and temperature control instability in key process lines.
[0041] This embodiment focuses on energy consumption intensity, equipment load distribution, power supply continuity, and operational safety boundaries in engineering implementation. Minimizing the overall global energy consumption index is the control objective described in this embodiment. In the corresponding implementation level of the specification, this can be specifically understood as: under the premise of satisfying the constraints of continuous operation of key processes, power supply stability, temperature control stability, and resource redundancy, keeping the overall energy consumption, ineffective power fluctuations, and unnecessary resource occupation at a low level within a unit control cycle. Its core remains a resource allocation optimization problem in industrial control, rather than a business analysis problem.
[0042] The specific processing procedure is as follows: the time-series data of heterogeneous node status can come from the continuous reporting information of different types of equipment, such as the instantaneous power of the chiller, the start-stop status of the air compressor, the state of charge of the energy storage cluster, the air supply load of the cleanroom air conditioning, the production scheduling intensity of the process section, the output power of the photovoltaic inverter, and the time-of-use load data of the main incoming line meter. Since these nodes have different sampling rhythms, the system can first align the timestamps to form a time-series segment under a unified sampling period.
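One possible way to align nodes with different sampling rhythms onto a unified sampling period is a last-value hold, sketched below. The alignment rule, the function name `align`, and the sample readings are assumptions for illustration; the text does not specify which alignment method is used.

```python
from bisect import bisect_right


def align(samples, start, end, period):
    """Resample (timestamp, value) pairs onto a uniform grid by last-value hold.

    Grid points that precede the first sample are filled with None.
    Timestamps are integer seconds.
    """
    samples = sorted(samples)
    times = [t for t, _ in samples]
    grid = []
    t = start
    while t <= end:
        i = bisect_right(times, t)  # number of samples at or before t
        grid.append((t, samples[i - 1][1] if i else None))
        t += period
    return grid


# Hypothetical readings: chiller power in kW and air compressor on/off status.
chiller_kw = [(0, 410.0), (7, 415.0), (19, 430.0)]
compressor_on = [(5, 1), (15, 0)]

chiller_grid = align(chiller_kw, start=0, end=20, period=10)
compressor_grid = align(compressor_on, start=0, end=20, period=10)
```

After alignment both series share the grid 0 s, 10 s, 20 s, so they can be combined into one time-series segment; the None entry marks a grid point for which the node had not yet reported.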
[0043] The node control takeover event logs come from the operation and maintenance platform and the workshop control room, reflecting operation logs such as manual disconnection of automatic scheduling, manual forced power supply, and manual locking of the operation mode of a certain equipment. In engineering, these logs represent the actual behavior of automatic control being manually stopped, and therefore can serve as an important observation basis for subsequent identification of changes in control tolerance.
[0044] Furthermore, the data mining model extracts energy consumption statistics from the aforementioned time-series data to form physical resource consumption characteristics. These physical resource consumption characteristics are not abstract parameter stacks, but rather a set of resource status characteristics that correspond one-to-one with the actual engineering conditions. For example, a certain cold station exhibits a continuous high load in the past three collection cycles, indicating that its heat exchange redundancy is decreasing; a certain energy storage cluster fails to fully recharge during low-load periods, indicating that the available electrical energy for buffering during subsequent disturbance periods is decreasing and below the preset capacity threshold; a certain air compressor station frequently starts and stops, indicating increased equipment wear and instantaneous inrush current, leading to decreased subsequent stability.
[0045] The system then combines these physical resource consumption characteristics with takeover event records to generate an intervention probability index. This index reflects the possibility that the current automatic scheduling strategy will trigger manual takeover on the operation and maintenance side. From an engineering perspective, its essence is the prior quantification of manual judgment of scheduling anomalies and switching to conservative mode.
[0046] Based on this, the system inputs the intervention probability index and physical resource consumption characteristics into the resource depletion failure boundary model. The model is trained from historical failure samples, which may include situations such as the energy storage device's state of charge being lower than the preset discharge limit, power outage of critical loads, insufficient cooling causing the temperature of the clean area to exceed the limit, and instantaneous power surge after manual global switching.
[0047] Each historical failure sample contains feature input and label. The feature input is the physical resource consumption feature sequence before the occurrence of the historical failure and the associated intervention probability index. The label is a binary classification label indicating whether operational instability or resource depletion occurred at the corresponding time. Specifically, the resource depletion failure boundary model is a multilayer perceptron classification model or a gradient boosting tree classification model. The training sample labeling method is as follows: historical physical feature sequences that have experienced operational instability or resource depletion are classified as positive samples and their label values are assigned to 1. Historical physical feature sequences that have been running smoothly without any abnormalities are classified as negative samples and their label values are assigned to 0.
[0048] The resource depletion failure boundary model outputs the resource depletion probability and uses it as the risk contribution rate. Here, the risk contribution rate represents how close the current energy state is to the unacceptable operating boundary of the park. If the risk contribution rate is lower than the preset danger threshold, it means that the system is still in the range that can be actively optimized and it is suitable to continue to use the multi-agent deep reinforcement learning model with resource allocation optimization as the main focus.
[0049] If the risk contribution rate is higher than or equal to the preset danger threshold, it indicates that the park is approaching the instability boundary. At this time, the system should switch to the nonlinear model predictive control path, which maintains the continuity of energy supply by raising the resource redundancy constraint setting and reducing the output amplitude of the control command. The preset danger threshold is either an empirical constant extracted from statistical data on the instability boundary of the park's historical energy system, or a percentage critical value dynamically calculated from the minimum energy redundancy of the park's current key process equipment.
[0050] In the reinforcement learning path, each agent can be divided according to equipment clusters or regions. For example, the chiller agent is responsible for the combination of cooling capacity and host start-up and shutdown, the energy storage agent is responsible for the charging and discharging sequence, the air compressor agent is responsible for load switching, and the clean air conditioning agent is responsible for the coupling and regulation of fresh air and return air. Through collaborative search, the system provides the first resource allocation strategy, which in engineering terms is usually an equipment scheduling instruction that makes the overall energy distribution of the entire park more balanced while ensuring the continuous supply of the process line.
[0051] In the predictive control path, the system uses the current resource state as the initial condition and performs rolling solutions for several future control cycles to obtain the second resource allocation strategy. Its engineering performance is usually a conservative control scheme that prioritizes key processes, suppresses power spikes, and retains redundancy. The system sends the first or second resource allocation strategy as the target control command to the equipment control layer and collects node feedback data after the command is executed.
[0052] The reward value is calculated based on node feedback data. This value is used to reflect the effects of the current control in reducing ineffective energy fluctuations, maintaining power and temperature stability, and avoiding triggering manual intervention. The result is then fed into the multi-agent deep reinforcement learning model for subsequent parameter updates.
[0053] Specifically, the reward value is calculated based on the node feedback data, including: assigning preset weights to the comprehensive energy consumption, the amplitude of invalid power fluctuations, and the number of times manual intervention is triggered in the node feedback data, and calculating the negative value or reciprocal of the weighted sum as the current comprehensive evaluation value, and using the comprehensive evaluation value as the reward value to guide the model to converge towards low energy consumption, low fluctuations, and low human intervention.
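The weighted-sum reward described above can be sketched as follows. The field names, the weight values, and the small epsilon added in the reciprocal form are assumptions for illustration only.

```python
def reward(feedback, weights, mode="negative"):
    """Reward from node feedback as the negative or reciprocal of a weighted sum.

    feedback and weights share the same keys. Lower energy consumption,
    smaller fluctuations, and fewer manual interventions give a larger reward.
    """
    cost = sum(weights[key] * feedback[key] for key in weights)
    if mode == "negative":
        return -cost
    # Reciprocal form; the epsilon avoiding division by zero is an assumption.
    return 1.0 / (cost + 1e-6)


# Hypothetical weights and two hypothetical control cycles.
weights = {"energy_kwh": 0.001, "fluctuation_kw": 0.01, "interventions": 0.5}
good = {"energy_kwh": 800.0, "fluctuation_kw": 20.0, "interventions": 0}
bad = {"energy_kwh": 900.0, "fluctuation_kw": 60.0, "interventions": 2}

r_good = reward(good, weights)
r_bad = reward(bad, weights)
```

Either form ranks the cycle with lower consumption, lower fluctuation, and no manual intervention above the other; the negative form is linear in the cost, while the reciprocal form keeps the reward positive.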
[0054] To further clarify, when the target control instruction comes from the first resource allocation strategy, the reward value can be directly used as feedback for the result of the reinforcement learning action; when the target control instruction comes from the second resource allocation strategy, the reward value does not indicate that the reinforcement learning model dominates the execution control in this cycle, but rather writes the state of this cycle, the conservative control result, and the subsequent feedback as high-risk samples into the experience cache or sample queue, which is used to constrain the reinforcement learning model to reduce aggressive behavior preferences in similar high-risk states in the future.
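The distinction between direct feedback and high-risk samples can be sketched with a simple experience cache. The class and field names and the `source` labels `"rl"` and `"mpc"` are hypothetical; how the learner later uses the flagged samples is left open here, as it is in the text.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Transition:
    state: tuple
    action: str
    reward: float
    high_risk: bool


class ExperienceBuffer:
    """Bounded experience cache that flags cycles controlled by the second strategy."""

    def __init__(self, capacity=1000):
        self.items = deque(maxlen=capacity)  # oldest entries are dropped first

    def record(self, state, action, reward, source):
        """source is 'rl' for the first strategy or 'mpc' for the second."""
        self.items.append(Transition(state, action, reward, high_risk=(source == "mpc")))

    def high_risk_samples(self):
        """Samples written during conservative control, for bypass correction."""
        return [t for t in self.items if t.high_risk]


buffer = ExperienceBuffer(capacity=3)
buffer.record((0.42, 0.80), "shift_load", -1.0, source="rl")
buffer.record((0.71, 0.35), "hold_reserve", -0.6, source="mpc")
```

Keeping the flag on each sample lets the reinforcement learning update treat conservative-control cycles as observations of what happened in a high-risk state rather than as outcomes of its own actions.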
[0055] In other words, parameter updates under predictive control paths involve using actual operational feedback to perform offline or bypass corrections on the reinforcement learning model. The goal of these updates is to enable the model to learn to proactively adopt conservative control strategies under specific conditions, rather than equating the predictive controller itself with the output source of reinforcement learning actions, thereby avoiding confusion in control logic.
[0056] Furthermore, if some heterogeneous nodes experience data loss, communication delays, or timestamp errors during the data collection period, the system can first mark the node as a low-reliability node and temporarily supplement it using the most recent stable period data, the average of similar equipment groups, or the process scheduling baseline. However, the supplemented node is not directly used as a high-weight decision basis to avoid incorrect scheduling due to abnormal data. If the control takeover event record is missing, the system can temporarily estimate the risk contribution rate based solely on the physical resource consumption characteristics and place the intervention probability index in a conservative range to make the scheduling strategy more stable.
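The marking and temporary supplementing of low-reliability nodes can be sketched as below. The function name, the fallback order (most recent stable value first, then the group average), and the reduced weight of 0.2 are assumptions for illustration.

```python
def fill_node(latest, last_stable, group_mean, low_weight=0.2):
    """Return (value, decision_weight, reliable) for one node in one cycle.

    A node that reported normally keeps full weight. A node with a missing
    reading is marked unreliable, supplemented from the most recent stable
    value or, failing that, the average of similar equipment, and given a
    reduced weight so it does not dominate scheduling decisions.
    """
    if latest is not None:
        return latest, 1.0, True
    fallback = last_stable if last_stable is not None else group_mean
    return fallback, low_weight, False


reported = fill_node(latest=412.0, last_stable=405.0, group_mean=400.0)
dropped = fill_node(latest=None, last_stable=405.0, group_mean=400.0)
never_seen = fill_node(latest=None, last_stable=None, group_mean=400.0)
```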
[0057] If the risk contribution rate given by the resource depletion failure boundary model is close to the danger threshold, the system can set a buffer zone. After the risk contribution rate shows the same trend in multiple consecutive sampling periods, the control path will be switched to avoid frequent switching near the threshold. The buffer zone is a numerical range formed by extending a preset fluctuation ratio above and below the preset danger threshold. When the risk contribution rate falls into this numerical range, the system will not trigger the switching of the control path until the risk contribution rate crosses the upper and lower limits of the numerical range in one direction within a set number of consecutive sampling periods, at which point the corresponding control path switching will be executed.
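The buffer zone with confirmation over consecutive sampling periods behaves like a hysteresis switch, which can be sketched as follows. The class name, the path labels, the threshold of 0.6, the fluctuation ratio of 0.1, and the confirmation count of 3 are hypothetical values chosen for the example.

```python
class PathSwitch:
    """Switch control paths only after the risk leaves the buffer zone
    in one direction for several consecutive sampling periods."""

    def __init__(self, threshold=0.6, ratio=0.1, confirm=3):
        self.low = threshold * (1 - ratio)   # lower limit of the buffer zone
        self.high = threshold * (1 + ratio)  # upper limit of the buffer zone
        self.confirm = confirm
        self.path = "rl"                     # start on the optimization path
        self.streak = 0

    def update(self, risk):
        """Feed one risk contribution rate and return the active path."""
        if self.path == "rl":
            crossing = risk > self.high      # candidate switch to conservative path
        else:
            crossing = risk < self.low       # candidate switch back to optimization
        self.streak = self.streak + 1 if crossing else 0
        if self.streak >= self.confirm:
            self.path = "mpc" if self.path == "rl" else "rl"
            self.streak = 0
        return self.path


switch = PathSwitch()
risks = [0.58, 0.64, 0.67, 0.62, 0.68, 0.70, 0.71, 0.60, 0.50, 0.52, 0.53]
paths = [switch.update(r) for r in risks]
```

In this trace the single excursion to 0.67 does not trigger a switch because the next sample falls back into the buffer zone; the switch to the conservative path happens only after three consecutive samples above the upper limit, and the return happens only after three consecutive samples below the lower limit.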
[0058] On a hot summer afternoon, the wafer campus received a temporary peak-shaving instruction from the power grid, and the total incoming load constraint tightened in a short period of time. At this time, the photovoltaic output fluctuated significantly due to cloud cover, the energy storage cluster's charge status had dropped to a low range, and the refrigeration station was close to full load due to the increased heat load of the cleanroom. The energy dispatch center obtained status time-series data from the chiller, air compressor, energy storage cabinet, cleanroom air conditioner, and process line metering, and at the same time read the manual locking logs of the two auxiliary air conditioners by the maintenance personnel in the past ten minutes.
[0059] Data mining revealed that, in the previous period, the park had performed flexible load switching at a frequency exceeding the preset threshold in order to reduce the peak of the total incoming line. This gave some maintenance personnel the impression that the system was unstable and increased the probability of intervention.
[0060] Based on this, the resource depletion failure boundary model judged that the park was close to the risk boundary. Therefore, the system no longer issued aggressive optimization strategies, but instead gave conservative resource allocation schemes through nonlinear model predictive control. For example, it prioritized the exposure process, froze some non-critical load switching actions, and required energy storage to be used only to suppress peaks and not to continue deep discharge.
[0061] After execution, the system continues to read node feedback. If the total incoming line fluctuation decreases, the temperature of the critical clean area returns to stability, and the number of manual takeovers stops increasing, the results of this stage are reflected as a higher stability reward, which is used to adjust the reinforcement learning model's strategy preference for high-risk periods in the future.
[0062] The purpose of this step is to build a complete closed-loop control link of data collection, mining, risk quantification, dual-path decision-making, and feedback update, so that the system has resource optimization capabilities under normal operating conditions and rapid degradation capabilities when approaching the instability boundary, thereby achieving a coordinated balance between the operation scheduling objectives and operation safety objectives in the enterprise's energy system.
[0063] Here, we would like to further clarify the correspondence of terms throughout the text: The resource depletion probability output by the resource depletion failure boundary model is used as the risk contribution rate after entering the control decision stage. Numerically, the two are the same model output, and we only emphasize their failure prediction attribute and decision evaluation attribute, respectively, without corresponding to other probability variables; The terms "near the danger threshold" and "near the risk boundary" mentioned in the text are all described around the same preset danger threshold and its adjacent decision interval, and do not constitute new threshold names.
[0064] The conservative control scheme, conservative control result, and redundant power supply scheduling instructions mentioned in the text are all manifestations of the second resource allocation strategy at the engineering execution level; the optimized actions of the equipment scheduling instructions mentioned are all manifestations of the first resource allocation strategy at the engineering execution level. The aforementioned statements do not change the existing definitions of the two types of resource allocation strategies.
[0065] Furthermore, by using a pre-set data mining model, energy consumption statistics are extracted from the time series data of heterogeneous node states to generate physical resource consumption characteristics, and intervention probability indicators are generated by combining node control takeover event records, including: extracting heterogeneous node state time series data within a pre-set time window before the occurrence of node control takeover event records as deviation samples;
[0066] A preset isolation forest algorithm is used to calculate the anomaly degree of the deviation samples and generate a strategy deviation score.
[0067] The strategy deviation score is input into a logistic regression model that has been pre-trained using historical takeover event labeled data, and the output is an intervention probability index with a value range of 0 to 1.
[0068] This embodiment provides a mechanism for generating intervention probability indicators. Specifically, in the aforementioned wafer manufacturing park scenario, while equipment energy consumption characteristics can reflect whether physical resources are strained, they are insufficient to explain the reasons for human intervention triggered by specific automatic scheduling. This is because whether maintenance personnel take over often depends not only on whether the equipment itself is under high load, but also on whether the scheduling actions exhibit characteristics that significantly deviate from empirical rules, such as frequent start-stop, cross-regional load balancing, and short-term continuous adjustments. Therefore, this embodiment further introduces anomaly identification of the time sequence segment before takeover to quantify the degree of deviation of the scheduling strategy in maintenance perception.
[0069] The specific processing procedure is as follows: the system extracts a preset time window as a deviation sample for each node control takeover event; the duration of this time window must be limited to a preset time span, but should be sufficient to cover the main scheduling behaviors before the takeover; for example, a multi-node status segment within 5 to 15 minutes before the takeover can be extracted, which includes information such as total load changes, equipment start-up and shutdown frequency, number of energy storage charging and discharging direction switching times, cooling capacity adjustment range, and process area load rotation rhythm; this information physically corresponds to whether the system has performed abnormal energy-saving actions;
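As a non-limiting illustration, the extraction of a deviation sample from the window preceding a takeover event can be sketched as follows. The `NodeSample` record, the function name `extract_deviation_sample`, and the 10-minute default window are assumptions made for this sketch only; the embodiment merely requires a preset window, for example 5 to 15 minutes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeSample:
    t: float      # timestamp in seconds (hypothetical field)
    node: str     # node name, e.g. "storage" or "chiller"
    value: float  # one reading, e.g. a load rate


def extract_deviation_sample(samples: List[NodeSample],
                             takeover_t: float,
                             window_s: float = 600.0) -> List[NodeSample]:
    """Return the samples inside [takeover_t - window_s, takeover_t)."""
    start = takeover_t - window_s
    return [s for s in samples if start <= s.t < takeover_t]


# Readings at 0 s, 300 s, 700 s, 899 s and 900 s; takeover logged at 900 s.
readings = [NodeSample(t, "storage", 0.5) for t in (0.0, 300.0, 700.0, 899.0, 900.0)]
window = extract_deviation_sample(readings, takeover_t=900.0, window_s=600.0)
window_times = [s.t for s in window]
```

The interval is half-open so that readings taken at or after the takeover, which already reflect manual control, do not enter the deviation sample.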
[0070] The isolation forest algorithm is used to calculate the anomaly degree of the deviation samples. In this scenario, the algorithm's role is not simply to find outliers, but to identify state combinations that have occurred rarely in history and are likely to trigger maintenance intervention. A specific application example illustrates the data flow. Assume that three sample segments are formed within the window before the takeover. The first sample segment shows stable operation of the chiller, slow discharge of energy storage, and a smooth decrease in total load. The second sample segment shows two short-term switching operations of the auxiliary air conditioner. The third sample segment shows energy storage switching rapidly from discharging to charging while the air compressor station unloads at the same time.
[0071] Compared with the historical samples, the first sample segment is close to normal operating conditions, the second sample segment deviates to some extent, and the third sample segment corresponds to a combination that has appeared only rarely. After processing by the isolation forest, anomaly results of different relative magnitudes are obtained: for example, the first sample segment receives a low anomaly degree, the second a medium one, and the third a high one. The system then maps the current window as a whole to a strategy deviation score. In engineering terms, this score represents the deviation between the current automatic scheduling and past stable operating experience.
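The anomaly calculation described above can be illustrated with a minimal isolation forest written against the Python standard library. This is a sketch under stated assumptions, not the implementation of the embodiment: the two features (storage direction reversals and compressor start-stops per window), the synthetic history, and all function names are hypothetical, and a production system would more likely rely on an existing library.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, ...]


def c_factor(n: int) -> float:
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(points: List[Point], depth: int, max_depth: int, rng: random.Random):
    """Recursively split on a random feature at a random value."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(tree, p: Point, depth: int = 0) -> float:
    """Depth at which p reaches a leaf, corrected for the points left in that leaf."""
    if tree[0] == "leaf":
        return depth + c_factor(tree[1])
    _, dim, split, left, right = tree
    return path_length(left if p[dim] < split else right, p, depth + 1)


def fit_forest(data: List[Point], n_trees: int = 100, sample_size: int = 64, seed: int = 0):
    rng = random.Random(seed)
    size = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(size))
    trees = [build_tree(rng.sample(data, size), 0, max_depth, rng) for _ in range(n_trees)]
    return trees, size


def anomaly_degree(forest, p: Point) -> float:
    """Score in (0, 1); rarer state combinations are isolated sooner and score higher."""
    trees, size = forest
    mean_len = sum(path_length(t, p) for t in trees) / len(trees)
    return 2.0 ** (-mean_len / c_factor(size))


# Synthetic history: (storage direction reversals, compressor start-stops) per window.
gen = random.Random(1)
history = [(gen.gauss(2.0, 0.5), gen.gauss(1.0, 0.3)) for _ in range(200)]
forest = fit_forest(history)
calm = anomaly_degree(forest, (2.0, 1.0))   # close to usual operation
rare = anomaly_degree(forest, (9.0, 6.0))   # frequent reversals plus many start-stops
```

In this sketch the rare combination receives the higher anomaly degree, which is the relative ordering the three sample segments are meant to show.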
[0072] After obtaining the strategy deviation score, the system feeds it into a logistic regression model trained on historical takeover event labeled data. The labels in the training samples can be taken from real operation and maintenance records, such as whether equipment was manually locked within 10 minutes after the scheduling action, whether the automatic control mode was cut off, or whether the power supply was forcibly switched to the manual power-saving scheme. The logistic regression model outputs an intervention probability index between 0 and 1. The higher the index, the more likely the current scheduling state is to trigger manual takeover in the future. In practical use, this index is not equivalent to the failure probability; it reflects how much automatic scheduling the operation and maintenance staff will tolerate, and can therefore be regarded as a quantitative expression of control tolerance.
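By way of a hedged example, the mapping from strategy deviation score to intervention probability index can be sketched as a one-feature logistic regression fitted by plain gradient descent. The training scores, the takeover labels, the learning rate, and the function names are all invented for illustration and are not taken from the embodiment.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(scores: List[float], labels: List[int],
                 lr: float = 0.5, epochs: int = 2000) -> Tuple[float, float]:
    """Fit p = sigmoid(w * score + b) by batch gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(w * s + b) - y
            grad_w += err * s
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def intervention_probability(score: float, w: float, b: float) -> float:
    """Intervention probability index in the open interval (0, 1)."""
    return sigmoid(w * score + b)


# Hypothetical history: deviation score of a window and whether a takeover followed.
hist_scores = [0.40, 0.45, 0.50, 0.55, 0.60, 0.65, 0.70, 0.75]
hist_takeover = [0, 0, 0, 1, 0, 1, 1, 1]
w, b = fit_logistic(hist_scores, hist_takeover)

p_low = intervention_probability(0.42, w, b)
p_high = intervention_probability(0.72, w, b)
```

Because the fitted weight is positive for this data, a higher deviation score yields a higher index, which matches the interpretation of the index as a measure of control tolerance.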
[0073] As a further anomaly handling mechanism, if no clear control takeover event is detected within a certain time window, the system can still extract samples according to a fixed sliding window to estimate the probability of potential intervention online. The results generated at this time can be used as early warning values, rather than immediately as a basis for strong switching. If there are insufficient historical takeover labeled samples, the logistic regression model can be initialized with weak labels generated by operation and maintenance rules, and then gradually replaced after the system accumulates enough real events.
[0074] If a large amount of data is missing at key nodes in the current window, for example because of a communication interruption at the energy storage cabinet or abnormal status feedback from the air compressor station, the system can reduce the deviation confidence level of the window so that a single distorted segment does not inflate the predicted probability of human intervention. If the anomaly degree output by the isolation forest falls in the intermediate fuzzy range, the system can smooth the result using the trends of several recent windows rather than making a judgment based solely on a single window.
[0075] On the afternoon when the aforementioned park encountered a sharp tightening of the external power supply load limit, the dispatch system continuously applied rotational control to three types of flexible loads in order to reduce the total power load: it first reduced the air conditioning load in the office area, then switched the operating mode of some circulating pumps, and finally alternated short-term charging and discharging of the energy storage clusters. Although the maintenance personnel had not yet taken over, feedback had already emerged from the control room that the system was acting too frequently.
[0076] The system uses the most recent 10 minutes of time-series data to form deviation samples. The isolation forest identifies that the combination of frequent reversals in energy storage direction, synchronous fluctuations in cooling station load, and more frequent start-up and shutdown of air compressor stations is relatively rare in history, and therefore gives a high strategy deviation score. Based on this, the logistic regression model outputs a high intervention probability index, indicating that if the current automatic scheduling continues to advance aggressively, the probability of triggering manual intervention exceeds the preset threshold.
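The two handling rules above, lowering the confidence of a window with missing data and smoothing over recent windows when the result is ambiguous, can be sketched as follows. The fuzzy band of 0.45 to 0.65, the minimum confidence of 0.5, and the linear confidence rule are illustrative assumptions; the embodiment does not fix these values.

```python
from typing import List, Optional, Tuple


def fused_score(scores: List[float],
                missing_ratios: List[float],
                fuzzy_band: Tuple[float, float] = (0.45, 0.65),
                min_conf: float = 0.5) -> Optional[float]:
    """Return the score to act on for the latest window.

    scores and missing_ratios are ordered oldest to newest. A clear, trusted
    latest score is used directly; otherwise a confidence-weighted average
    over the recent windows is returned.
    """
    confs = [max(0.0, 1.0 - m) for m in missing_ratios]
    latest, latest_conf = scores[-1], confs[-1]
    in_fuzzy_band = fuzzy_band[0] <= latest <= fuzzy_band[1]
    if not in_fuzzy_band and latest_conf >= min_conf:
        return latest
    total = sum(confs)
    if total == 0.0:
        return None  # no trustworthy window at all
    return sum(s * c for s, c in zip(scores, confs)) / total


# Latest window is extreme but 80 % of its key readings are missing.
distorted = fused_score([0.52, 0.55, 0.90], [0.0, 0.0, 0.8])
# Latest window is complete but falls in the fuzzy band.
ambiguous = fused_score([0.30, 0.35, 0.58], [0.0, 0.0, 0.0])
# Latest window is complete and clearly high.
clear = fused_score([0.30, 0.30, 0.90], [0.0, 0.0, 0.0])
```

In the first case the distorted window contributes only a fifth of the weight of a complete window, so the fused value stays near the earlier readings instead of jumping to 0.90.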
[0077] The purpose of this step is to transform the tendency of human intervention, which is originally difficult to observe directly, into a calculable and traceable probability indicator of intervention, thereby enabling early identification of changes in control acceptance and providing input data that is closer to actual operation and maintenance behavior for subsequent risk quantification and strategy switching.
[0078] Furthermore, based on the physical resource consumption characteristics and with the minimization of the global comprehensive energy consumption index as the optimization objective, the first resource allocation strategy is calculated through a multi-agent deep reinforcement learning model, including: mapping the physical resource consumption characteristics to the environmental state space of the multi-agent deep reinforcement learning model;
[0079] Obtain a pre-configured preset energy conversion coefficient representing the difference in external constraints of energy call at different time periods, calculate the global comprehensive energy consumption index based on the weighted summation relationship between physical resource consumption characteristics and preset energy conversion coefficients, construct a reward function with the optimization objective of minimizing the global comprehensive energy consumption index, and use the reward function to search the action space of the environmental state space to generate candidate action sequences.
[0080] The action with the highest reward value in the candidate action sequence is output as the first resource allocation strategy.
[0081] This embodiment provides a solution mechanism for a first resource allocation strategy. Specifically, in the aforementioned main park scenario, when the risk contribution rate is still below the danger threshold, it indicates that the system still has room for optimization. If an overly conservative control method is still adopted at this time, although the safety margin is high, it will reduce the energy storage regulation capacity, flexible load coordination capacity, and equipment combination optimization capacity in the long term. Therefore, this embodiment introduces a multi-agent deep reinforcement learning model in a relatively safe operating range to search for a resource allocation scheme with low global energy consumption and stable operation.
[0082] It should be further explained that the optimization goal of minimizing the global comprehensive energy consumption index, as stated in the embodiment, is implemented in the control scheme of this embodiment as a joint evaluation of comprehensive energy consumption, load fluctuation intensity, equipment switching cost, and resource occupation level. The preset energy conversion coefficient characterizes the difference in external constraints on energy calls in different time periods; at the specification level, it is emphasized that this coefficient participates in action optimization as a scheduling weight parameter rather than as operational settlement data.
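A minimal sketch of the weighted summation and of a reward built from it is given below. The period names, the coefficient values, and the three penalty weights are hypothetical; the embodiment only states that the index is a weighted sum of consumption characteristics and preset energy conversion coefficients, evaluated jointly with fluctuation, switching cost, and resource occupation.

```python
from typing import Dict


def global_energy_index(consumption: Dict[str, float],
                        coeff: Dict[str, float]) -> float:
    """Weighted sum of consumption per time period and its conversion coefficient."""
    return sum(coeff[period] * value for period, value in consumption.items())


def reward(consumption: Dict[str, float], coeff: Dict[str, float],
           fluctuation: float, switch_count: int, occupancy: float,
           w_fluct: float = 0.5, w_switch: float = 10.0, w_occ: float = 100.0) -> float:
    """Negative joint cost, so that a lower index gives a higher reward."""
    index = global_energy_index(consumption, coeff)
    return -(index + w_fluct * fluctuation + w_switch * switch_count + w_occ * occupancy)


# Hypothetical scheduling weights for three time periods.
coeff = {"peak": 1.2, "flat": 1.0, "valley": 0.8}

keep = {"peak": 400.0, "flat": 250.0, "valley": 150.0}    # leave the schedule as is
shift = {"peak": 360.0, "flat": 250.0, "valley": 190.0}   # move 40 units out of the peak

index_keep = global_energy_index(keep, coeff)
index_shift = global_energy_index(shift, coeff)
r_keep = reward(keep, coeff, fluctuation=20.0, switch_count=0, occupancy=0.90)
r_shift = reward(shift, coeff, fluctuation=30.0, switch_count=1, occupancy=0.85)
```

Here shifting load lowers the index from about 850 to about 834, and the saving outweighs the added fluctuation and the single switching action, so the shifted schedule earns the higher reward.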
[0083] The specific processing procedure is as follows: The system first maps the physical resource consumption characteristics into an environmental state space. This state space can be understood as a structured expression of the energy status of the park, rather than a simple data patchwork. For example, the chiller station state subspace reflects the current chiller load rate, inlet and outlet water temperature difference, and available redundancy; the energy storage state subspace reflects the state of charge, the allowable charging and discharging range, and the expected subsequent recharging opportunities; the process area state subspace reflects the key process load level, the proportion of flexibly adjustable load, and the proportion of uninterruptible equipment; the electricity price state subspace reflects the current time-of-use energy weight level and the future short-term scheduling pressure trend; multiple intelligent agents perceive the states related to their functions and share necessary global constraint information.
[0084] In terms of action space, each intelligent agent can output control actions for its own region or equipment cluster; for example, the energy storage intelligent agent can choose from maintaining, slow charging, slow discharging, and limiting deep discharging; the chiller station intelligent agent can choose from maintaining the current unit combination, switching to the high-efficiency main unit, reducing auxiliary load, and delaying non-critical cooling supply; the air compressor station intelligent agent can choose from stabilizing air supply, switching to standby units, and reducing non-critical air load; the system searches for these action combinations and generates candidate action sequences;
[0085] The generation of candidate action sequences can be illustrated by a simplified example. Assume that the current cycle is coordinated by the first agent, the second agent, and the third agent. In the first candidate sequence, the first agent maintains its current state, the second agent discharges slowly, and the third agent reduces non-critical loads. In the second candidate sequence, the first agent switches to the high-efficiency host, the second agent discharges slowly, and the third agent stabilizes gas supply.
[0086] The third candidate sequence is: the first agent reduces the auxiliary load, the second agent maintains, and the third agent switches to the standby machine. The system evaluates these sequences based on the goal of minimizing the global comprehensive energy consumption index. The evaluation reflects not only the immediate energy dispatch level, but also the comprehensive resource consumption level caused by equipment efficiency, energy storage utilization, and load migration.
[0087] To further clarify, the candidate action sequence characterizes the collaborative control plan for the current moment and the following several steps. In the embodiment, outputting the action with the largest reward value in the candidate action sequence is usually implemented in engineering practice as follows: first, the candidate action sequence with the largest comprehensive reward value is selected, and then the first joint action of that sequence is output in the current control cycle as the first resource allocation strategy. Subsequent actions serve only as reference context for the renewed search in the next cycle. This approach preserves the forward-looking nature of sequence search with respect to multi-step changes in resource consumption, and remains consistent with real-time scheduling, in which only the current action is issued in the current cycle.
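The selection rule just described can be sketched in a few lines. The agent names, the action labels, and the reward values are placeholders chosen for this illustration; how the comprehensive reward of each sequence is computed is left outside the sketch.

```python
from typing import Dict, List, Tuple

JointAction = Dict[str, str]                      # agent name -> action label
Candidate = Tuple[List[JointAction], float]       # (action sequence, comprehensive reward)


def first_action_of_best(candidates: List[Candidate]) -> JointAction:
    """Pick the sequence with the largest reward and return only its first joint action."""
    best_sequence, _ = max(candidates, key=lambda c: c[1])
    return best_sequence[0]


candidates: List[Candidate] = [
    ([{"chiller": "hold", "storage": "slow_discharge", "compressor": "cut_noncritical"},
      {"chiller": "hold", "storage": "hold", "compressor": "steady"}], 0.62),
    ([{"chiller": "switch_high_eff", "storage": "slow_discharge", "compressor": "steady"},
      {"chiller": "hold", "storage": "slow_discharge", "compressor": "steady"}], 0.71),
    ([{"chiller": "cut_aux", "storage": "hold", "compressor": "switch_standby"},
      {"chiller": "hold", "storage": "hold", "compressor": "steady"}], 0.55),
]

strategy_now = first_action_of_best(candidates)
```

Only `strategy_now` is issued in the current cycle; the second step of the winning sequence is discarded or kept as context when the search runs again in the next cycle.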
[0088] From a physical mechanism perspective, this type of method is suitable for periods with lower risks because the park still has the capacity to conduct a certain amount of exploration and optimization. The multi-agent approach can enable different subsystems to work together under a unified goal. For example, during periods of high load, the cooling station can appropriately reduce the cooling supply to non-critical areas, while energy storage can simultaneously undertake peak shaving. The air compressor station can reduce the unit gas production energy consumption by optimizing the unit combination, thereby achieving an overall improvement in resource utilization, rather than a suboptimal result caused by the independent control of local equipment lacking coordination.
[0089] Furthermore, if a device cluster corresponding to a certain agent is in a manually locked state, then that agent will not participate in action search in the current cycle, and its action will be fixed to the manually limited value; other agents will continue to optimize under constraints; if some devices have an upper limit on the number of start-stops, a minimum power-on time, or process quality constraints, then these constraints will be included in the screening during the candidate action generation stage to avoid generating control sequences that are superior in terms of indicator evaluation but are not executable; if the current external scheduling weight prediction is unstable or there is an unclear load limit notification, then the system can reduce the proportion of future weight factors in the reward, so that the first resource allocation strategy pays more attention to the currently verifiable information and avoids reverse operation due to inaccurate future judgments;
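As an illustration of the screening step, the following sketch enumerates joint actions while pinning manually locked agents to their limited value and dropping actions that would exceed a start-stop budget. The action labels, the set of actions counted as start-stop operations, and the `starts_left` bookkeeping are assumptions of this sketch.

```python
from itertools import product
from typing import Dict, List

ACTIONS: Dict[str, List[str]] = {
    "chiller": ["hold", "switch_high_eff", "cut_aux"],
    "storage": ["hold", "slow_charge", "slow_discharge"],
    "compressor": ["steady", "switch_standby", "cut_noncritical"],
}
# Actions assumed to consume one start-stop operation of the equipment.
SWITCHING = {"switch_high_eff", "switch_standby"}


def feasible_joint_actions(actions: Dict[str, List[str]],
                           locked: Dict[str, str],
                           starts_left: Dict[str, int]) -> List[Dict[str, str]]:
    """Enumerate joint actions that respect manual locks and start-stop limits."""
    agents = list(actions)
    # A locked agent contributes exactly one option: its manually limited value.
    options = [[locked[a]] if a in locked else actions[a] for a in agents]
    feasible = []
    for combo in product(*options):
        joint = dict(zip(agents, combo))
        within_budget = all(
            starts_left.get(agent, 1) > 0
            for agent, action in joint.items()
            if action in SWITCHING and agent not in locked
        )
        if within_budget:
            feasible.append(joint)
    return feasible


# The chiller cluster is manually locked; the compressor has no start-stops left.
result = feasible_joint_actions(ACTIONS,
                                locked={"chiller": "hold"},
                                starts_left={"compressor": 0})
```

With the chiller fixed and compressor switching excluded, three storage options combine with two compressor options, leaving six executable joint actions for the reward search.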
[0090] During the morning hours when the park's load is stable, energy storage still has sufficient capacity, photovoltaic output is good, and the probability of manual intervention is low, the system judges that the risk contribution rate is below the danger threshold. At this time, the multi-agent deep reinforcement learning model maps the subsystems such as the chiller station, energy storage, air compressor station, and clean air conditioning into a collaborative environment. For the same period, the system generates multiple sets of candidate actions. For example, one set of schemes focuses on energy storage peak shaving, another set of schemes focuses on switching high-efficiency chillers, and the third set of schemes simultaneously reduces the air conditioning load in the auxiliary area.
[0091] After comprehensive evaluation, the system determines the candidate action sequence with the highest comprehensive reward, selects the joint action corresponding to the current control cycle as the first resource allocation strategy for the current period, and sends it to the control layer. For example, the joint action may maintain the stability of the critical process environment, reduce part of the load in the office and storage areas, discharge energy storage slowly to cover peaks, and switch the chiller station to high-efficiency units.
[0092] The purpose of this step is to fully unleash the synergistic optimization potential among various energy devices under controllable risk conditions, thereby achieving a lower level of global resource consumption, while providing actual operating samples for subsequent models to continuously learn optimal scheduling under different load and scheduling weight scenarios.
[0093] Furthermore, based on the physical resource consumption characteristics, a second resource allocation strategy is calculated through a nonlinear model predictive control model, including: inputting the physical resource consumption characteristics as the initial state into the nonlinear model predictive control model to construct a nonlinear state evolution equation with the resource consumption represented by the physical resource consumption characteristics as the state variable; obtaining preset resource redundancy constraints; performing rolling optimization to solve the nonlinear state evolution equation under the resource redundancy constraints to generate a control sequence; and outputting the current control quantity of the control sequence as the second resource allocation strategy.
[0094] This embodiment provides a solution mechanism for a second resource allocation strategy. Specifically, in the aforementioned main scenario of the industrial park, although the low-resource-consumption scheme obtained solely through reinforcement learning is suitable for conventional optimization, when approaching the risk boundary, the system needs a control mode with interpretable constraints, smooth actions, and clear protection against instability boundaries. Especially when energy storage capacity is low, external power rationing is imminent, the heat load of key processes is increasing, or the tendency for manual intervention is strengthening, if the first resource allocation strategy, which prioritizes energy consumption optimization and focuses on improving energy efficiency, is still adopted, it is easy to cause the park's energy supply index to fall below the preset lower limit in a short period of time. Therefore, this embodiment introduces a nonlinear model predictive control model to solve the second resource allocation strategy.
[0095] The specific processing procedure is as follows: the system takes the physical resource consumption characteristics as the initial state input model; the state variables here are not abstract mathematical quantities, but engineering states that directly correspond to the evolution of park resources, such as the available power of energy storage, the load ratio of the main power supply line, the remaining adjustment margin of the chiller station, the stable gas supply capacity of the air compressor station, and the minimum ventilation and cooling requirements required to maintain the clean environment.
[0096] The reason for adopting nonlinear state evolution is that enterprise energy systems inherently possess strong coupling relationships. For example, changes in chiller load alter the total incoming power, energy storage charging and discharging affect subsequent redundancy, and changes in cleanroom air conditioning supply feed back into cooling demand; the system is not a simple linear superposition. The nonlinear state evolution equation constructs dynamic coupling constraints between chiller load factor, energy storage state of charge, and total incoming power using polynomial or exponential functions. The equation also includes nonlinear terms characterizing discrete state transitions during equipment start-up and shutdown, as well as energy conversion efficiency degradation. Specifically, the nonlinear state evolution equation takes the following form: $x_{k+1} = A x_k + B u_k + f_{\mathrm{poly}}(x_k) + f_{\exp}(x_k)$, where $x_k$ denotes the state variables representing the physical resource consumption characteristics, $u_k$ is the current control input, $A$ and $B$ are the state and control coefficient matrices, $f_{\mathrm{poly}}(x_k)$ is the polynomial nonlinear term, and $f_{\exp}(x_k)$ is the exponential nonlinear term. The nonlinear model predictive control model minimizes the objective function by rolling solution over the prediction horizon, which yields the mathematical model for the second resource allocation strategy.
[0097] Preset resource redundancy constraints are used to enforce the minimum safety margin. For example, the power supply circuit of the critical process line must retain a certain power margin, the state of charge of the energy storage cluster must not fall below the predetermined minimum range, the chiller station must retain at least enough backup host capacity to be switched in quickly, and the air compressor system must stay within the minimum pressure stability range. The system performs rolling optimization over several future control cycles under these constraints.
[0098] The significance of rolling optimization is that after each current control variable is executed, the system immediately recalibrates the subsequent control sequence based on the newly acquired actual state, thereby coping with external disturbances and model errors. Ultimately, the system only outputs the control variable at the current moment as the second resource allocation strategy, and the remaining future control variables are used as references for the next cycle.
[0099] A specific application example illustrates how the control sequence is generated. Assume a horizon of three consecutive control cycles, and that at the start of the first control cycle the system detects that the energy storage margin is below the preset safety margin threshold, the chiller is approaching full load, and the total incoming line shows a peaking trend. A first, smooth control sequence is then generated: limit non-critical loads in the first control cycle, keep energy storage from deep discharge in the second control cycle, and reserve backup units for switching in the third control cycle. A second control sequence is also generated, as follows:
[0100] The system immediately discharges a large amount of energy storage in the first control cycle, reduces cooling station output in the second control cycle, and returns to normal in the third control cycle. Although this second control sequence reduces total electricity purchase in the short term, it quickly depletes energy storage and makes subsequent recovery more difficult, so it is not adopted under the redundancy constraints. The system finally outputs the current control quantity of the first, smooth control sequence in the first control cycle, such as limiting some non-critical temperature control loads, freezing some flexible load rotations, and keeping energy storage operating above the minimum range.
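The comparison between the smooth and the aggressive control sequence can be reproduced with a toy rolling-optimization sketch. Everything numeric here is an assumption for illustration: the quadratic storage loss stands in for the polynomial nonlinear term, exhaustive search over a small discrete control set stands in for a nonlinear programming solver, and the limits, demands, and weights are invented.

```python
from itertools import product
from typing import List, Optional, Tuple

Control = Tuple[int, int]      # (storage discharge, non-critical load shed), in kW

CONTROLS: List[Control] = [(d, s) for d in (0, 20, 120) for s in (0, 40)]
SOC_MIN = 200.0                # redundancy constraint: storage reserve, kWh
LINE_LIMIT = 960               # redundancy constraint: incoming line limit, kW
LOSS = 0.002                   # quadratic loss: deep discharge drains disproportionately
SHED_WEIGHT = 0.5              # cost of shedding one kW of non-critical load


def evaluate(soc: float, demand: List[int], seq: Tuple[Control, ...]) -> Tuple[bool, float]:
    """Simulate one control sequence; return (feasible, cost)."""
    feasible, cost = True, 0.0
    for load, (d, s) in zip(demand, seq):
        soc -= d + LOSS * d * d                  # nonlinear state evolution
        line = load - d - s                      # power drawn from the incoming line
        feasible = feasible and soc >= SOC_MIN and line <= LINE_LIMIT
        cost += line + SHED_WEIGHT * s
    return feasible, cost


def mpc_step(soc: float, demand: List[int]) -> Optional[Control]:
    """Return the first control of the cheapest feasible sequence, or None."""
    best = None
    for seq in product(CONTROLS, repeat=len(demand)):
        ok, cost = evaluate(soc, demand, seq)
        if ok and (best is None or cost < best[0]):
            best = (cost, seq)
    return None if best is None else best[1][0]


demand = [1000, 1010, 1000]                      # three-cycle demand forecast, kW
aggressive = ((120, 0), (0, 40), (0, 0))         # deep discharge first
smooth = ((20, 40), (20, 40), (20, 40))          # slow discharge plus load limiting

aggressive_ok, aggressive_cost = evaluate(300.0, demand, aggressive)
smooth_ok, smooth_cost = evaluate(300.0, demand, smooth)
control_now = mpc_step(300.0, demand)
no_solution = mpc_step(205.0, demand)
```

In this toy setting the aggressive sequence is cheaper but violates the storage reserve, so the smooth sequence wins and only its first control is issued. When the storage starts almost at the reserve, no sequence is feasible and the function returns `None`, which corresponds to the case handled by the emergency logic.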
[0101] As a further anomaly handling mechanism, if key redundancy constraint information is temporarily missing, such as if the effective capacity estimate of a certain energy storage cluster is distorted, the system can automatically raise the conservative boundary, regard the energy storage cluster as not being able to be deeply called, and prioritize external power purchase and auxiliary load reduction to maintain stability.
[0102] If the rolling prediction of the model finds no feasible control sequence, that is, no scheme can simultaneously satisfy the critical load and redundancy constraints, the system can trigger a higher level of emergency control logic, such as directly executing the predefined power protection list, cutting off non-critical loads by class, and sending a manual confirmation prompt to the operation and maintenance terminal. If the risk contribution rate has far exceeded the danger threshold, the system can shorten the rolling prediction horizon, so that the control favors rapid steady-state maintenance over long-horizon energy saving.
[0103] On the aforementioned hot afternoon, the park had received a short-term peak-shaving instruction, and the state of charge of the energy storage had dropped significantly. The exposure process line was in a continuous wafer processing stage and could not tolerate power supply fluctuations. The system judged that continuing the aggressive energy-saving strategy would damage the power supply stability of the critical process, so it switched to the nonlinear model predictive control path. The model uses the current chiller load, energy storage margin, power demand of the critical process, and the load that can be cut off in the auxiliary area as the initial state. While ensuring redundancy of the critical circuit, energy storage backup, and cooling continuity, it generates a control sequence that stabilizes the critical process first, then compresses non-critical loads, and avoids frequent switching within a short period. The system issues only the control quantities for the current moment, such as suspending the optimization of some office-area air conditioning, restricting further deep discharge of energy storage, and keeping the two main chillers in stable operation with their rotation temporarily suspended.
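The two fallback rules can be sketched as small decision functions. The horizon lengths, the factor of 1.5 used to decide that the risk has "far exceeded" the threshold, and the load priority classes are hypothetical values chosen for illustration.

```python
from typing import Dict, Optional, Tuple


def prediction_horizon(risk: float, threshold: float,
                       normal: int = 6, short: int = 2,
                       far_factor: float = 1.5) -> int:
    """Shorten the rolling horizon when risk is far above the danger threshold."""
    return short if risk >= far_factor * threshold else normal


def dispatch(control: Optional[Tuple[int, int]], load_class: Dict[str, int]) -> dict:
    """Issue the solver result, or fall back to emergency shedding.

    load_class maps a load name to a priority class: 0 is critical and is never
    shed; larger numbers are shed first.
    """
    if control is not None:
        return {"mode": "mpc", "control": control}
    shed_order = sorted((name for name, cls in load_class.items() if cls > 0),
                        key=lambda name: -load_class[name])
    return {"mode": "emergency", "shed_order": shed_order, "operator_confirmation": True}


loads = {"exposure_line": 0, "office_hvac": 2, "warehouse_lighting": 3, "aux_pumps": 1}

normal_case = dispatch((20, 40), loads)
emergency_case = dispatch(None, loads)
far_above = prediction_horizon(risk=0.9, threshold=0.5)
slightly_above = prediction_horizon(risk=0.55, threshold=0.5)
```

The critical exposure line never appears in the shedding order, and the operator confirmation flag models the manual confirmation prompt sent to the operation and maintenance terminal.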
[0104] The purpose of this step is to maintain the continuity of the company’s critical energy supply in a way that is well-defined, explainable and has low volatility when approaching the instability boundary, thereby proactively avoiding the resource depletion boundary.
[0105] It should be further explained here that the descriptions of redundancy constraints, critical loop redundancy, energy storage backup, and standby machine switching capability in this embodiment are all elaborations on the resource redundancy constraints in the embodiment. They are used to describe the specific implementation of the constraints on different energy objects, rather than introducing new independent constraint categories.
[0106] Accordingly, the smooth control sequence and the conservative control quantity mentioned in this text are both expressions of the solution results of the second resource allocation strategy in the rolling optimization process, and the final output is still the control quantity of the second resource allocation strategy at the current moment.
[0107] Furthermore, after outputting the second resource allocation strategy as the target control instruction, the method further includes: acquiring the heterogeneous node state time-series data for the next acquisition cycle and recalculating the risk contribution rate; in response to the recalculated risk contribution rate falling below a preset danger threshold, acquiring the historical state trajectory data output by the nonlinear model predictive control model; using the historical state trajectory data as the initial state observation value of the multi-agent deep reinforcement learning model and switching the output of the first resource allocation strategy as the target control instruction; and in response to the recalculated risk contribution rate being higher than or equal to the preset danger threshold, maintaining the output of the second resource allocation strategy as the target control instruction.
[0108] This embodiment provides a control path switch-back mechanism. Specifically, in the aforementioned main scenario of the park, if the system remains in conservative mode after switching from the reinforcement learning path to the predictive control path, short-term risk is reduced but long-term optimization capability declines. For example, the peak-valley regulation capability of energy storage decreases, the flexibility of non-critical loads sits idle, and the comprehensive energy consumption index rises. It is therefore not enough to degrade only when risk increases; the system must also be able to return smoothly to the optimization mode once the risk has been eliminated. This embodiment addresses this recovery problem.
[0109] The specific processing procedure is as follows: After executing the second resource allocation strategy, the system continues to acquire the time-series data of the heterogeneous node status in the next acquisition cycle and recalculates the risk contribution rate. If the risk contribution rate falls below the danger threshold, it indicates that the previous conservative control has moved the system away from the failure boundary. For example, the peak of the total incoming line is flattened, the energy storage backup range is restored, the cold station redundancy is increased, and the trend of manual takeover is weakened. At this time, the system does not directly allow the reinforcement learning model to perceive the current state from zero, but acquires the historical state trajectory data of the nonlinear model predictive control model output and uses it as the initial state observation value of the reinforcement learning model.
[0110] The engineering significance of this approach is that the reinforcement learning model can recognize that the system is recovering from a high-risk, controlled process rather than misjudging the current state as a normal, stable condition, which reduces policy mismatch at the moment of switch-back. The initial state observation value referred to here is preferably understood as encoding the historical state trajectory as the reinforcement learning model's contextual input, state fragments, or short-term memory initialization information at the time of switch-back. It supplements the background of the current observation and does not overwrite or reset the existing trained parameters of the reinforcement learning model. The switch-back therefore preserves the continuity of the reinforcement learning policy rather than resetting the model to a fresh training starting point.
[0111] Historical state trajectory data can include key resource change trajectories over several consecutive periods, such as the process of energy storage recovering from a low level to above the minimum range, the recovery process of key process loads after the peak-shaving order is lifted, the process of cooling plants returning from the edge of full load to the stable and efficient zone, and the process of auxiliary loads recovering from a reduced state. Through these trajectories, reinforcement learning models can obtain a more complete context and identify loads that have recently experienced forced suppression, resources that have not yet fully recovered, and equipment that is still not suitable for frequent adjustments.
[0112] A specific application example illustrates the switch-back process. Assume that the predictive control stage forms four continuous state segments, representing risk increase, implementation of conservative control, peak fading, and system stabilization, respectively. If the system finds at the system stabilization stage that the risk contribution rate has dropped below the danger threshold, the state trajectory from the implementation of conservative control to system stabilization is input into the multi-agent deep reinforcement learning model as its new initial observation background.
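The segment selection in this example can be read as a slice over labelled trajectory segments. The following Python sketch is illustrative only: the phase labels, the `build_initial_observation` helper, the dictionary layout, and the numeric states are all hypothetical and do not appear in the original disclosure.

```python
from typing import Dict, List, Sequence


def build_initial_observation(
    segments: Sequence[Dict],
    current_obs: List[float],
    start_phase: str = "conservative_control",
    end_phase: str = "stabilized",
) -> Dict[str, List]:
    """Collect the states from start_phase through end_phase as context.

    The result only supplements the current observation; it does not
    touch any trained model parameters.
    """
    labels = [seg["phase"] for seg in segments]
    start, end = labels.index(start_phase), labels.index(end_phase)
    history = [s for seg in segments[start:end + 1] for s in seg["states"]]
    return {"history": history, "current": current_obs}


# Made-up data: each state is [storage_level, total_load_ratio].
segments = [
    {"phase": "risk_increase", "states": [[0.30, 0.95]]},
    {"phase": "conservative_control", "states": [[0.28, 0.90], [0.32, 0.85]]},
    {"phase": "peak_fading", "states": [[0.40, 0.78]]},
    {"phase": "stabilized", "states": [[0.55, 0.70]]},
]
context = build_initial_observation(segments, current_obs=[0.55, 0.70])
```

In this sketch the risk-increase segment is deliberately left out of the context, matching the example's choice of the trajectory from conservative control to stabilization.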
[0113] Subsequently, the reinforcement learning model does not directly select aggressive, high-efficiency actions, but instead prioritizes searching from stable recovery candidate actions. For example, it first restores part of the auxiliary load, maintains slow charging of energy storage, and postpones the concentrated start-up of high-energy-consuming equipment. After several cycles of stabilization, it gradually returns to the normal optimization rhythm.
[0114] As a further anomaly handling mechanism, if the recalculated risk contribution rate declines only briefly and is not stable (e.g., it falls below the threshold for one cycle and rises again in the next), the system can set a switch-back confirmation condition that requires the rate to stay below the threshold for multiple consecutive cycles before a switch-back is executed. If there are breakpoints in the historical state trajectory data, such as communication interruptions at some nodes during predictive control, the reinforcement learning model can be initialized using the valid trajectory segments and current real-time observations. If there are too many breakpoints, maintaining the predictive control path is more prudent. If the tendency for manual takeover increases again shortly after a switch-back, the system can re-enter predictive control mode and record the failed switch-back sample for subsequent optimization of the switch-back criteria.
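The confirmation condition and the breakpoint check described above can be sketched as a small stateful gate. This is a minimal illustration under stated assumptions: the class name `SwitchBackGate`, the default of three confirmation cycles, and the `max_gap_ratio` limit are hypothetical values chosen for the example, not parameters given in the original text.

```python
from dataclasses import dataclass, field


@dataclass
class SwitchBackGate:
    """Allow a switch-back only after enough consecutive low-risk cycles."""

    danger_threshold: float        # preset danger threshold
    required_cycles: int = 3       # hypothetical confirmation length
    max_gap_ratio: float = 0.3     # hypothetical limit on trajectory breakpoints
    streak: int = field(default=0, init=False)

    def update(self, risk: float, gap_ratio: float) -> str:
        """Return the control path to use for the next cycle."""
        if risk >= self.danger_threshold:
            self.streak = 0                   # any rebound resets the confirmation
            return "predictive_control"
        self.streak += 1
        if self.streak < self.required_cycles:
            return "predictive_control"       # below threshold, not yet confirmed
        if gap_ratio > self.max_gap_ratio:
            return "predictive_control"       # trajectory too fragmented to use
        return "reinforcement_learning"


gate = SwitchBackGate(danger_threshold=0.6)
paths = [gate.update(r, gap_ratio=0.0) for r in [0.5, 0.7, 0.5, 0.5, 0.5]]
```

With the made-up risk sequence above, the rebound to 0.7 in the second cycle resets the counter, so the switch-back is only granted in the fifth cycle.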
[0115] On the aforementioned high-temperature afternoon, the park had previously entered predictive control mode because of the peak-shifting instruction and the risk of power outages under high load. After several data collection cycles, the grid restrictions eased, and the energy storage recovered to above the guaranteed range through off-peak replenishment. The temperature of the key clean area returned to stability, and the operation and maintenance personnel did not add any further manual locks.
[0116] After recalculating the risk contribution rate, the system confirmed that it had remained below the danger threshold. It therefore extracted the state trajectories of energy storage recovery, cooling station stabilization, and gradual release of auxiliary loads over the past few cycles as the initial observations of the reinforcement learning model, and then switched to outputting the first resource allocation strategy. For example, the system first restored air conditioning optimization only in part of the warehouse area, and then gradually resumed energy consumption optimization actions in subsequent cycles, rather than restoring all aggressive energy-saving operations at once.
[0117] The purpose of this step is to achieve an orderly recovery from the conservative power supply assurance mode to the energy consumption optimization mode, and to avoid new abrupt jumps and misjudgments after the control path is switched, so as to maintain the continuity of long-term system operation and learning ability.
[0118] Furthermore, the heterogeneous node status time-series data is the power load data reported by the IoT devices on the enterprise production line; the node control takeover event record is the instruction log of the operation and maintenance personnel manually cutting off the automatic control mode; the first resource allocation strategy is the equipment start-up and shutdown scheduling instruction with the lowest overall energy consumption index; the second resource allocation strategy is the redundant power supply scheduling instruction to maintain power stability.
[0119] This embodiment provides a specific implementation method for power consumption scenarios in production lines. Specifically, to facilitate engineering implementation, this embodiment further limits the aforementioned park to a wafer manufacturing production line with continuous production attributes, wherein the heterogeneous node status time series data mainly reflects the power load data reported by each production equipment and energy equipment. This limitation method is conducive to mapping the abstract data mining and strategy switching logic to actual collectable and executable industrial control objects.
[0120] The specific processing procedure is as follows: heterogeneous node status time-series data can be collected and reported by IoT devices on the production line, such as the power load data of lithography machines, etching machines, wet cleaning equipment, cleanroom air conditioners, circulating pumps, chillers, air compressors, vacuum pumps, exhaust gas treatment equipment, and energy storage converters. The heterogeneity here is mainly reflected in the different equipment types, rated power, sampling periods, and load change patterns. For example, the load of lithography machines is highly correlated with the wafer feeding cycle, the load of cleanroom air conditioners is related to changes in the thermal and humid environment of the workshop, chillers and circulating water systems have a clear coupling relationship, and energy storage devices have changes in charging and discharging directions. After these load data are uniformly incorporated into the time-series analysis, the system can accurately grasp the resource consumption status of the park from the power consumption side.
[0121] In this embodiment, the node control takeover event record takes the form of the instruction log generated when operation and maintenance personnel manually cut off the automatic control mode. The log may include the takeover time, the takeover object, remarks on the reason for takeover, and the manually set value. Its engineering significance is that a manual cut-off of the automatic mode indicates that the automatic control output has reached the boundary of what operation and maintenance experience accepts. Unlike general equipment fault logs, this type of log more directly reflects changes in control acceptance, and is therefore particularly suitable as a basis for training and correcting the intervention probability indicator.
[0122] In this embodiment, the first resource allocation strategy is specifically manifested as the equipment start-up and shutdown scheduling command with the lowest overall energy consumption index. For example, when the safety margin allows, the system can make some high-efficiency chillers replace low-efficiency units, arrange energy storage to be released slowly during high load periods and recharged during low load periods, reduce non-critical loads in storage areas and auxiliary areas, and coordinate the start-up and shutdown of air compressors and vacuum pump groups in the optimal combination to reduce the overall power consumption corresponding to unit output.
[0123] The second resource allocation strategy is specifically manifested as redundant power supply scheduling instructions to maintain power stability. For example, the system can prohibit deep use of energy storage, keep standby units online and on standby, increase the power supply margin of the circuit where the critical process line is located, suspend high-fluctuation start-up and shutdown actions, and implement orderly load limiting for non-critical equipment. The two strategies differ clearly in control style: the former focuses on efficiency and resource optimization, while the latter focuses on stability and redundancy.
[0124] As a further anomaly handling mechanism, if certain production equipment cannot be started or shut down under control because of process interlocking restrictions, it is not included in the adjustable objects of the first resource allocation strategy and only participates in the total power balance calculation as a fixed critical load. If the operation and maintenance instruction log does not clearly record the reason for takeover, the system can still infer from the changes in equipment status before and after takeover whether it was a power supply guarantee takeover, a quality assurance takeover, or an anomaly concern takeover, and assign different reference weights in subsequent model updates. If some power load data shows metering anomalies, the load curves of adjacent equipment in the same process section and the production schedule can be used for reasonable correction, but the corrected data is only used as a temporary control aid and does not replace formal metering archives.
[0125] During a batch of continuous wafer feeding, the total power load of the production line fluctuates continuously with the process cycle. The system collects power load data from various production equipment and energy stations, identifying that the etching area and clean air conditioning area are in a high-load coupling state. At this time, if the risk contribution rate is low, the system can issue the first resource allocation strategy, such as arranging two high-efficiency chillers to run in parallel, delaying the start-up of some dehumidifiers in the storage area, and slowing down the release of energy storage during high-price periods to reduce the comprehensive power consumption per unit output. If the risk contribution rate increases due to external power rationing and the tendency for manual takeover, the system switches to the second resource allocation strategy, such as requiring energy storage to only undertake the function of basic voltage stabilization, forcibly retaining a backup chiller for hot standby, and locking key process loops from participating in any rotation control to ensure power stability.
[0126] The purpose of this step is to explicitly map the aforementioned methods to the enterprise's production line IoT devices and operation and maintenance log system, so that the strategy output corresponds directly to executable industrial scheduling instructions, thereby closing the engineering loop from the algorithm layer to the control layer.
[0127] The AI-based enterprise energy consumption optimization management system based on data mining includes: a data acquisition module, used to acquire heterogeneous node status time-series data and node control takeover event records;
[0128] The feature mining module is used to extract energy consumption statistics from heterogeneous node state time series data using a preset data mining model to generate physical resource consumption features, and combine them with node control takeover event records to generate intervention probability indicators.
[0129] The risk quantification module is used to input the intervention probability index and physical resource consumption characteristics into a resource depletion failure boundary model pre-trained based on historical failure samples, predict the resource depletion probability, and use the resource depletion probability as the risk contribution rate.
[0130] The adaptive decision-making module is used to calculate the first resource allocation strategy by using a multi-agent deep reinforcement learning model when the risk contribution rate is lower than the preset danger threshold, based on the physical resource consumption characteristics and with the optimization objective of minimizing the global comprehensive energy consumption index; and to calculate the second resource allocation strategy by using a nonlinear model predictive control model when the risk contribution rate is higher than or equal to the preset danger threshold, based on the physical resource consumption characteristics.
[0131] The feedback execution module is used to output the first resource allocation strategy or the second resource allocation strategy as the target control command, obtain the node status data reported in real time by heterogeneous nodes after the target control command is issued as the node feedback data, calculate the reward value based on the node feedback data, and update the parameters of the multi-agent deep reinforcement learning model.
[0132] This embodiment provides a device-based implementation mechanism for an AI-driven enterprise energy consumption optimization management system based on data mining, as shown in Figure 2. Specifically, the system can be deployed on the energy dispatch center server of the aforementioned wafer manufacturing park, or deployed by combining edge computing gateways with the central control platform. The system completes data acquisition, feature mining, risk quantification, adaptive decision-making and feedback execution through a modular structure, thereby solidifying the aforementioned methods and processes into an industrial software system that can run for a long time.
[0133] Furthermore, the key technical focus of the system implementation in this embodiment is to establish a stable data closed loop and control closed loop, so that energy scheduling has a switchable, rollback-capable, and traceable execution mechanism across different risk ranges. Accordingly, the descriptions in the text of minimizing the global comprehensive energy consumption index and of write-back correspond, at the system implementation level, to comprehensive resource consumption evaluation, control parameter solving, and feedback sample updating, and do not carry any business analysis meaning.
[0134] The specific processing procedure is as follows: The data acquisition module is used to access various data sources from both the device side and the platform side. The device side includes smart meters, programmable logic controllers, distributed control systems, energy storage management systems, air compressor station controllers, chiller station group controllers, and process manufacturing execution system interfaces. The platform side includes a scheduling log server, an operation and maintenance work order system, and a manual takeover record terminal. In addition to collecting raw status time-series data, this module can also perform time alignment, device number mapping, outlier marking, and data collection status monitoring to ensure that subsequent processing is based on a unified data foundation.
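As an illustration of the time alignment step, the sketch below places samples reported at different periods onto one common time grid by carrying the last observed value forward. The function name `align_to_grid`, the carry-forward rule, and the sample values are assumptions made for this example; the original text does not specify an alignment method.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence, Tuple


def align_to_grid(
    samples: Sequence[Tuple[float, float]],
    grid: Sequence[float],
) -> List[Optional[float]]:
    """Map (timestamp, value) samples, sorted by time, onto grid timestamps.

    Each grid point takes the most recent sample at or before it, or None
    when no sample has been reported yet.
    """
    times = [t for t, _ in samples]
    aligned: List[Optional[float]] = []
    for g in grid:
        i = bisect_right(times, g)       # number of samples with time <= g
        aligned.append(samples[i - 1][1] if i > 0 else None)
    return aligned


# Made-up chiller load reported every 10 s, aligned to a 5 s grid.
chiller = [(0.0, 120.0), (10.0, 125.0), (20.0, 123.0)]
aligned = align_to_grid(chiller, grid=[0.0, 5.0, 10.0, 15.0, 25.0])
```

Returning `None` before the first report keeps missing data visible to later stages, so it can be marked rather than silently filled.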
[0135] The feature mining module is used to transform multi-source raw data into physical resource consumption features and intervention probability indicators with engineering significance. Internally, it can include a time-series window processing unit, an energy consumption statistics extraction unit, a deviation sample construction unit, and a takeover tendency estimation unit. The first two units are responsible for generating statistics such as load average, fluctuation amplitude, start-up and shutdown frequency, and the number of energy storage charge/discharge switches. The last two units are responsible for extracting the strategy deviation degree by combining takeover event records and for generating intervention probability indicators. In this way, the system can not only obtain the power consumption of the equipment, but also assess the probability that the current scheduling will trigger manual intervention.
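The statistics named above can be computed per time window roughly as follows. This is a sketch under assumptions: the function and key names are hypothetical, fluctuation amplitude is taken here as the max-min range, and a charge/discharge switch is counted as a sign change of storage power with idle samples skipped. The original text does not fix these definitions.

```python
from statistics import mean
from typing import Dict, Sequence


def window_statistics(
    load_kw: Sequence[float],
    running: Sequence[bool],
    storage_kw: Sequence[float],
) -> Dict[str, float]:
    """Summarize one time window of a node's reported data."""
    # Start/stop count: on/off transitions between adjacent samples.
    start_stops = sum(1 for a, b in zip(running, running[1:]) if a != b)
    # Charge/discharge switches: sign changes of storage power (0 = idle, skipped).
    signs = [p > 0 for p in storage_kw if p != 0]
    switches = sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    return {
        "load_mean": mean(load_kw),
        "fluctuation_amplitude": max(load_kw) - min(load_kw),
        "start_stop_count": start_stops,
        "charge_discharge_switches": switches,
    }


stats = window_statistics(
    load_kw=[10.0, 12.0, 8.0, 10.0],
    running=[True, True, False, True],
    storage_kw=[5.0, 3.0, 0.0, -2.0, -4.0, 6.0],
)
```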
[0136] The risk quantification module is used to call the resource depletion failure boundary model and output the risk contribution rate. This module can be run in conjunction with the historical failure case library, which stores past samples such as energy storage depletion, critical circuit voltage drop, and load rebound caused by manual switching to fully conservative mode. The risk contribution rate output by the module can be displayed synchronously on the scheduling interface as a risk reference for both control personnel and the system's automatic decision-making.
[0137] The adaptive decision-making module is used to select different control paths based on the risk contribution rate. When the risk contribution rate is low, the module calls the multi-agent deep reinforcement learning model and outputs the first resource allocation strategy that is biased towards resource optimization in the low-risk range. When the risk contribution rate is high, the module calls the nonlinear model predictive control model and outputs the second resource allocation strategy that is biased towards redundancy and stability in the high-risk range.
[0138] To avoid switching oscillations, the module can be configured with threshold buffering, confirmation over consecutive cycles, and manual lock priority rules. For example, if the equipment is in a manually forced power supply assurance state, the current conservative control should be maintained even if the model suggests resuming optimization.
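Threshold buffering and manual lock priority can be sketched as a single selection function. The function name `select_path`, the path labels, and the buffer width of 0.05 are hypothetical; the sketch also assumes that a manual lock always keeps the conservative path, which is one reading of the example above.

```python
def select_path(
    risk: float,
    current_path: str,
    danger_threshold: float,
    buffer: float = 0.05,        # hypothetical buffer width
    manual_lock: bool = False,
) -> str:
    """Choose the control path for the next cycle with hysteresis."""
    if manual_lock:
        return "predictive_control"          # manual lock has priority
    if risk >= danger_threshold:
        return "predictive_control"
    # Already conservative: leave only once risk clears the buffer zone.
    if current_path == "predictive_control" and risk > danger_threshold - buffer:
        return "predictive_control"
    return "reinforcement_learning"
```

The buffer applies only in the direction of leaving conservative control, so entering it still happens as soon as the risk contribution rate reaches the danger threshold.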
[0139] The feedback execution module is responsible for control implementation and model update. On the one hand, it sends the target control command to the equipment control interface; on the other hand, it continuously collects node feedback data after execution, such as total load changes, critical loop stability, energy storage recovery, clean area temperature, and whether new manual takeovers have occurred.
[0140] Based on this feedback, the module can generate reward values and write them back to the multi-agent deep reinforcement learning model to help the model gradually learn action combinations that reduce resource consumption while avoiding risks and manual takeover. If, during execution, it detects that the control is ineffective, the device refuses to execute, communication fails, or the feedback is abnormal, the feedback execution module can trigger a rollback mechanism to maintain the previous stable control quantity and report the abnormal status.
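The rollback behaviour can be illustrated with a wrapper around whatever function actually delivers a command. Everything named here is an assumption for the sake of the example: `execute_with_rollback`, the `send` callable and its boolean return value, and the use of `ConnectionError` and `TimeoutError` to represent communication failure.

```python
from typing import Callable, Dict, Optional, Tuple

Command = Dict[str, float]  # hypothetical: device id -> setpoint


def execute_with_rollback(
    command: Command,
    last_stable: Command,
    send: Callable[[Command], bool],
) -> Tuple[Command, Optional[str]]:
    """Send a command; on failure keep the previous stable control quantity.

    Returns the command now in effect and an anomaly report (None if OK).
    """
    try:
        accepted = send(command)
    except (ConnectionError, TimeoutError) as exc:
        return last_stable, f"communication failure: {exc}"
    if not accepted:
        return last_stable, "device refused command"
    return command, None


def failing_link(_: Command) -> bool:
    raise ConnectionError("gateway offline")


stable = {"chiller_1": 0.8}
in_effect, report = execute_with_rollback({"chiller_1": 0.6}, stable, failing_link)
```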
[0141] A simplified example can be used to illustrate the collaborative relationship between modules. Suppose that four types of node data are collected in a certain period: cooling plant load, energy storage status, key process load, and operation and maintenance takeover logs. After the data acquisition module cleans the data, it sends it to the feature mining module to extract the corresponding resource features and intervention indicators. The risk quantification module outputs the risk contribution rate accordingly. If the risk contribution rate is lower than the danger threshold, the adaptive decision-making module calls the reinforcement learning path to output the first action.
[0142] If the risk contribution rate is higher than or equal to the danger threshold, the second action is output; after the feedback execution module executes the first or second action, it collects the corresponding feedback data and generates reward results to update the multi-agent deep reinforcement learning model; this data link forms a closed loop, and each module can independently record logs, which is convenient for auditing and tracing.
[0143] Furthermore, if the data acquisition module detects that a large area of key nodes are offline, the system can automatically enter the degradation mode, retaining only the power supply control of key loads and suspending complex optimizations; if the feature mining module finds that the deviation between the current data distribution and historical samples exceeds the preset distribution deviation threshold, it can be marked as a low-confidence period, and the risk quantification module will increase the conservative weight in this period.
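The low-confidence marking can be sketched as follows. The deviation measure used here (distance of the window mean from the historical mean, in historical standard deviations), the threshold of 3.0, and the way the conservative weight scales the risk contribution rate are all assumptions chosen for illustration; the original text names neither the metric nor the weighting rule.

```python
from statistics import mean, pstdev
from typing import Sequence


def is_low_confidence(
    window: Sequence[float],
    history: Sequence[float],
    deviation_threshold: float = 3.0,   # hypothetical preset threshold
) -> bool:
    """Flag a period whose data drifts far from historical samples."""
    sigma = pstdev(history)
    if sigma == 0:
        return mean(window) != mean(history)
    return abs(mean(window) - mean(history)) / sigma > deviation_threshold


def adjusted_risk(risk: float, low_confidence: bool, weight: float = 0.2) -> float:
    """Raise the risk contribution rate in low-confidence periods (capped at 1)."""
    return min(1.0, risk * (1.0 + weight)) if low_confidence else risk


history = [100.0, 102.0, 98.0, 100.0, 101.0, 99.0]
flag = is_low_confidence([110.0, 112.0], history)
```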
[0144] If the adaptive decision-making module receives an external emergency command, such as an emergency load limit notification from the power grid, the external constraint takes precedence over the goal of minimizing overall energy consumption. If the feedback execution module confirms that the control command was successfully issued but the device does not respond, the system can transfer the device to the list of uncontrollable states and will no longer consider it as an adjustable resource in the strategy calculation of the next cycle.
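Both rules in this paragraph reduce to simple bookkeeping, sketched below. The helper names and the use of a total-power cap to represent the external emergency command are hypothetical simplifications, not details from the original text.

```python
from typing import List, Optional, Sequence, Set


def update_uncontrollable(
    issued: Set[str], responded: Set[str], uncontrollable: Set[str]
) -> Set[str]:
    """Add devices that received a command but did not respond."""
    return uncontrollable | (issued - responded)


def adjustable_resources(devices: Sequence[str], uncontrollable: Set[str]) -> List[str]:
    """Devices still eligible for the next cycle's strategy calculation."""
    return [d for d in devices if d not in uncontrollable]


def apply_external_cap(planned_kw: float, grid_cap_kw: Optional[float]) -> float:
    """An external load limit, if present, overrides the optimized plan."""
    return planned_kw if grid_cap_kw is None else min(planned_kw, grid_cap_kw)


blocked = update_uncontrollable({"pump_2", "fan_1"}, {"fan_1"}, set())
remaining = adjustable_resources(["pump_2", "fan_1", "chiller_3"], blocked)
```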
[0145] During the entire day of operation of the aforementioned park, which experienced drastic fluctuations in electricity prices and staggered power rationing, the data acquisition module continuously received electricity consumption data from the cooling station, energy storage, air compressor station, clean air conditioning and process lines, as well as logs from the operation and maintenance console regarding manual disconnection of automatic mode. The feature mining module identified signs of high-frequency load rotation and an increased tendency for manual intervention during certain periods, and the risk quantification module determined that the park was approaching the resource depletion boundary in the afternoon.
[0146] The adaptive decision-making module first calls predictive control to maintain redundancy and stability. After the risk subsides in the evening, it smoothly switches back to reinforcement learning optimization based on the historical state trajectory. The feedback execution module continuously writes back the equipment response, changes in manual takeover, and energy consumption results throughout the process for subsequent model updates and scheduling strategy optimization.
[0147] The purpose of this step is to materialize the methodology into a deployable, collaborative, and traceable industrial system architecture, thereby transforming enterprise energy consumption optimization management from single-process algorithm processing to a long-term online operation platform.
[0148] Here, we further unify the module naming relationships: In this embodiment, the module names disclosed to the outside of the system and corresponding one-to-one with the embodiments are only the data acquisition module, feature mining module, risk quantification module, adaptive decision-making module, and feedback execution module; the descriptions mentioned in the text, such as time-series window processing unit, energy consumption statistics extraction unit, deviation sample construction unit, and takeover tendency estimation unit, are all names of functional sub-units within the corresponding modules, used to describe the internal implementation of the modules, and do not replace or change the module names in the embodiments; accordingly, the reinforcement learning path and predictive control path mentioned in the text only represent the processing paths of calling different models within the adaptive decision-making module, rather than adding independent modules.
[0149] It should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and are not intended to limit it. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions can be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.
Claims
1. An AI enterprise energy consumption optimization management method based on data mining, characterized in that it comprises:
acquiring, according to a preset collection cycle, heterogeneous node status time-series data and node control takeover event records;
using a preset data mining model to extract energy consumption statistics from the heterogeneous node status time-series data to generate physical resource consumption characteristics, and generating an intervention probability index by combining the node control takeover event records;
inputting the intervention probability index and the physical resource consumption characteristics into a resource depletion failure boundary model pre-trained on historical failure samples to predict a resource depletion probability, and using the resource depletion probability as the risk contribution rate;
in response to the risk contribution rate being lower than a preset danger threshold, calculating a first resource allocation strategy using a multi-agent deep reinforcement learning model, based on the physical resource consumption characteristics and with minimization of the global comprehensive energy consumption index as the optimization objective;
in response to the risk contribution rate being higher than or equal to the preset danger threshold, calculating a second resource allocation strategy using a nonlinear model predictive control model, based on the physical resource consumption characteristics;
outputting the first resource allocation strategy or the second resource allocation strategy as the target control command, and sending the target control command to the corresponding heterogeneous nodes to perform start-up, shutdown and load scheduling;
obtaining, as node feedback data, the node status data reported in real time by the heterogeneous nodes after the target control command is sent;
calculating a reward value based on the node feedback data, and updating the parameters of the multi-agent deep reinforcement learning model.
2. The AI enterprise energy consumption optimization management method based on data mining of claim 1, wherein using a preset data mining model to extract energy consumption statistics from the heterogeneous node status time-series data to generate physical resource consumption characteristics, and generating an intervention probability index by combining the node control takeover event records, comprises:
extracting, as a deviation sample, the heterogeneous node status time-series data within a preset time window before the node control takeover event occurs;
calculating an anomaly score of the deviation sample using a preset isolation forest algorithm to generate a strategy deviation score;
inputting the strategy deviation score into a logistic regression model pre-trained on labeled historical takeover event data, and outputting the intervention probability index with a value range of 0 to 1.
3. The AI enterprise energy consumption optimization management method based on data mining of claim 1, wherein calculating the first resource allocation strategy using a multi-agent deep reinforcement learning model, based on the physical resource consumption characteristics and with minimization of the global comprehensive energy consumption index as the optimization objective, comprises:
mapping the physical resource consumption characteristics to the environmental state space of the multi-agent deep reinforcement learning model;
obtaining a preset energy conversion coefficient representing the difference in external constraints on energy use in different time periods, calculating the global comprehensive energy consumption index as a weighted summation of the physical resource consumption characteristics with the preset energy conversion coefficient, constructing a reward function with minimization of the global comprehensive energy consumption index as the optimization objective, and using the reward function to perform an action space search over the environmental state space to generate a candidate action sequence;
outputting the action with the highest reward value in the candidate action sequence as the first resource allocation strategy.
4. The AI enterprise energy consumption optimization management method based on data mining of claim 1, wherein calculating the second resource allocation strategy using a nonlinear model predictive control model, based on the physical resource consumption characteristics, comprises:
inputting the physical resource consumption characteristics as the initial state into the nonlinear model predictive control model to construct a nonlinear state evolution equation whose state variable is the resource consumption represented by the physical resource consumption characteristics;
obtaining a preset resource redundancy constraint;
solving the nonlinear state evolution equation by rolling optimization under the resource redundancy constraint to generate a control sequence;
outputting the current control quantity of the control sequence as the second resource allocation strategy.
5. The AI-based enterprise energy consumption optimization management method based on data mining according to claim 1, characterized in that, after outputting the second resource allocation strategy as the target control command, the method further comprises:
obtaining the heterogeneous node status time-series data for the next collection cycle, and recalculating the risk contribution rate;
in response to the recalculated risk contribution rate falling below the preset danger threshold, obtaining the historical state trajectory data output by the nonlinear model predictive control model;
using the historical state trajectory data as the initial state observation value of the multi-agent deep reinforcement learning model, and switching to outputting the first resource allocation strategy as the target control command;
in response to the recalculated risk contribution rate being higher than or equal to the preset danger threshold, continuing to output the second resource allocation strategy as the target control command.
6. The AI-based enterprise energy consumption optimization management method based on data mining according to claim 1, characterized in that:
the heterogeneous node status time-series data is the power load data reported by the IoT devices on the enterprise's production line;
the node control takeover event record is the instruction log of operation and maintenance personnel manually cutting off the automatic control mode;
the first resource allocation strategy is the equipment start-up and shutdown scheduling command with the lowest overall energy consumption index;
the second resource allocation strategy is a redundant power supply scheduling command to maintain power consumption stability.
7. An AI-based enterprise energy consumption optimization management system based on data mining, characterized in that it comprises:
a data acquisition module, used to acquire heterogeneous node status time-series data and node control takeover event records;
a feature mining module, used to extract energy consumption statistics from the heterogeneous node status time-series data using a preset data mining model to generate physical resource consumption characteristics, and to generate an intervention probability index by combining the node control takeover event records;
a risk quantification module, used to input the intervention probability index and the physical resource consumption characteristics into a resource depletion failure boundary model pre-trained on historical failure samples, predict the resource depletion probability, and use the resource depletion probability as the risk contribution rate;
an adaptive decision-making module, used to calculate, in response to the risk contribution rate being lower than the preset danger threshold, a first resource allocation strategy using a multi-agent deep reinforcement learning model, based on the physical resource consumption characteristics and with minimization of the global comprehensive energy consumption index as the optimization objective, and to calculate, in response to the risk contribution rate being higher than or equal to the preset danger threshold, a second resource allocation strategy using a nonlinear model predictive control model, based on the physical resource consumption characteristics;
a feedback execution module, used to output the first resource allocation strategy or the second resource allocation strategy as the target control command, obtain, as node feedback data, the node status data reported in real time by the heterogeneous nodes after the target control command is issued, calculate the reward value based on the node feedback data, and update the parameters of the multi-agent deep reinforcement learning model.