A deep learning-based cogeneration collaborative optimization system and method
The deep learning-based cogeneration collaborative optimization system uses deep neural networks to process high-dimensional state information and autonomously learns the optimal control strategy, addressing the high model dependence, poor robustness, and fragmented optimization of existing technologies. This enables real-time adaptive optimization of the cogeneration system, improving its economy and flexibility.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Application (China)
- Current Assignee / Owner: HUANENG SONGYUAN THERMAL POWER CO LTD
- Filing Date: 2026-02-12
- Publication Date: 2026-06-26
Figure CN122287984A_ABST
Abstract
Description
Technical Field
[0001] This disclosure pertains to the field of energy storage technology, and specifically relates to a deep learning-based cogeneration collaborative optimization system and method.
Background Technology
[0002] Combined heat and power (CHP) is an important technology for realizing cascaded energy utilization and improving energy efficiency. In centralized heating systems, CHP units, thermal storage devices, heating networks, and end users together constitute a complex energy transfer system. Among these, user-side buildings possess a certain thermal storage capacity due to their thermal inertia and can be regarded as natural "virtual energy storage" units. Currently, the optimized operation of this system faces the following challenges: first, the "heat-driven power generation" operation mode of CHP limits its flexibility in responding to grid peak shaving; second, the volatility of renewable energy sources (such as wind and solar power) places higher demands on the coordinated operation of heat and power; third, the coupling of heat network transmission delays, the dynamic characteristics of thermal storage devices, and user thermal inertia makes the system's dynamic response complex, so that traditional optimization methods based on static or deterministic models struggle to reach a global optimum. Existing implementation schemes mostly employ model predictive control (MPC) or rule-based heuristic optimization. For example, in the paper "Research on Combined Heat and Power Optimization Scheduling Considering the Dynamic Characteristics of the Heat Network," the authors established a dynamic hydraulic-thermal model of the heat network and used MPC for rolling optimization. While this method considers dynamic characteristics, its optimization effect depends heavily on the accuracy of the model. Actual systems, however, contain uncertainties, such as heating network parameters and user behavior, which lead to model mismatch and reduced optimization effectiveness.
Another common approach is to optimize each component separately: cogeneration units are scheduled based on electricity and heat load forecasts, and thermal storage devices perform simple peak shaving and valley filling, while the heating network and user side are treated as rigid loads, ignoring their enormous regulation potential and failing to achieve coordinated optimization across time and space scales.
[0003] In summary, the current technical solutions have the following drawbacks:
1. High model dependence and poor robustness: methods based on MPC rely heavily on accurate physical models and adapt poorly to changes in system parameters and uncertain disturbances (such as sudden weather changes and changes in user heating behavior), which can easily cause the optimization results to deviate from the actual optimum.
2. Fragmented optimization and insufficient synergy: existing solutions often treat CHP, thermal storage, the pipeline network, and users as independent or weakly related subsystems for optimization, failing to fully utilize the dynamic characteristics of each link (especially user thermal inertia) for deep synergy, which limits the overall economy and flexibility of the system.
3. Difficulty in handling high-dimensional continuous state and action spaces: the entire system has many state variables (such as temperature, pressure, state of thermal storage devices, and indoor temperature of buildings) and complex control variables (such as CHP heat production power, water pump speed, and valve opening). Traditional optimization methods face the "curse of dimensionality" when solving such high-dimensional, nonlinear, and strongly coupled problems, resulting in low computational efficiency.
Summary of the Invention
[0004] To address the aforementioned issues, this disclosure provides a deep learning-based cogeneration collaborative optimization system, which maximizes the economic benefits and flexibility of system operation. The system comprises a physical system layer, a data acquisition and perception layer, and a deep reinforcement learning agent; among these... the deep reinforcement learning agent is configured to: collect operational data from the physical system layer through the data acquisition and perception layer; calculate the action vector of the physical system layer based on the operational data; send the action vector to the action vector execution mechanism in the physical system layer, and collect the action feedback data of the physical system layer through the data acquisition and perception layer; calculate the reward value corresponding to the action feedback data based on a preset reward function; update its own parameters based on the reward value; repeat the above steps to obtain the target deep reinforcement learning agent; and perform cogeneration collaborative optimization based on the target deep reinforcement learning agent.
[0005] Furthermore, the physical system layer includes: combined heat and power units, thermal storage devices, a centralized heating network, and a user building complex.
[0006] Furthermore, the data acquisition and perception layer includes: a power grid information module, a meteorological information module, a heating network measuring point sensor, and a user indoor temperature and humidity sensor.
[0007] Furthermore, the deep reinforcement learning agent includes a state space construction module, a policy network, and a value network. The state space construction module is configured to preprocess the operational data and construct a state vector representing the current physical system layer state. The policy network is configured to calculate the action vector based on the state vector. The value network is configured to calculate the reward value corresponding to the action feedback data based on a preset reward function, wherein the reward function is constructed based on the economic quantification index, safety quantification index, and comfort quantification index of the physical system layer.
[0008] Furthermore, the deep reinforcement learning agent also includes an experience replay buffer configured to: store experience samples, each consisting of the state vector of the current physical system layer state, the action vector corresponding to that state vector, the reward value corresponding to that state vector, and the state vector of the next physical system layer state; and periodically and randomly extract experience samples to update the parameters of the policy network and/or the value network.
[0009] Furthermore, the deep reinforcement learning agent is configured to update its parameters according to the reward value based on the Actor-Critic algorithm.
[0010] This disclosure also proposes a deep learning-based cogeneration ... Using the deep reinforcement learning agent, the following steps are performed: operational data is collected from the physical system layer through the data acquisition and perception layer; the action vector of the physical system layer is calculated based on the operational data; the action vector is sent to the action vector execution mechanism in the physical system layer, and the action feedback data of the physical system layer is collected through the data acquisition and perception layer; the reward value corresponding to the action feedback data is calculated based on a preset reward function; the agent updates its own parameters based on the reward value; the above steps are repeated to obtain the target deep reinforcement learning agent; and cogeneration collaborative optimization is performed based on the target deep reinforcement learning agent.
[0011] This disclosure also proposes an electronic device, including a memory and a processor, wherein the memory stores a computer program or instructions which, when executed by the processor, are used to implement at least the above-described deep learning-based cogeneration collaborative optimization method.
[0012] This disclosure also proposes a computer-readable storage medium storing a computer program or instructions which, when executed by a processor, are used to implement at least the above-described deep learning-based cogeneration collaborative optimization method.
[0013] This disclosure also proposes a computer program product stored in a computer-readable storage medium which, when executed by a processor, is used to implement at least the above-described deep learning-based cogeneration collaborative optimization method.
[0014] Compared with the prior art, this disclosure has the following advantages: through the continuous interaction between the deep reinforcement learning agent and the environment, it autonomously learns the optimal control strategy, reduces the dependence on the precise physical model, and enhances the robustness of the system in uncertain environments; by using deep neural networks to process high-dimensional state information, it outputs continuous optimal control actions, realizes real-time and adaptive intelligent optimization control, and maximizes the economic benefits and flexibility of system operation while ensuring heating quality.
[0015] Other features and advantages of this disclosure will be set forth in the description which follows, and will be apparent in part from the description, or may be learned by practicing the disclosure. The objects and other advantages of this disclosure may be realized and obtained by means of the structures pointed out in the description, claims and drawings.
Attached Figure Description
[0016] To more clearly illustrate the technical solutions in the embodiments of this disclosure or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this disclosure. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0017] Figure 1 shows a deep learning-based cogeneration collaborative optimization system according to an embodiment of this disclosure; Figure 2 shows an electronic device according to an embodiment of this disclosure; Figure 3 shows a computer-readable storage medium according to an embodiment of this disclosure.
Detailed Implementation
[0018] To make the objectives, technical solutions, and advantages of the embodiments of this disclosure clearer, the technical solutions of the embodiments of this disclosure will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this disclosure, and not all embodiments. Based on the embodiments of this disclosure, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this disclosure.
[0019] This disclosure proposes a deep learning-based cogeneration collaborative optimization system. The system comprises a physical system layer, a data acquisition and perception layer, and a deep reinforcement learning agent; among these... the deep reinforcement learning agent is configured to: collect operational data from the physical system layer through the data acquisition and perception layer; calculate the action vector of the physical system layer based on the operational data; send the action vector to the action vector execution mechanism in the physical system layer, and collect the action feedback data of the physical system layer through the data acquisition and perception layer; calculate the reward value corresponding to the action feedback data based on a preset reward function; update its own parameters based on the reward value; repeat the above steps to obtain the target deep reinforcement learning agent; and perform cogeneration collaborative optimization based on the target deep reinforcement learning agent.
[0020] According to some embodiments of this disclosure, the cogeneration collaborative optimization system based on deep learning proposed in this disclosure includes a physical system layer, a data acquisition and perception layer, and a deep reinforcement learning agent. The deep reinforcement learning agent is trained in a simulation environment composed of the operating data of the physical system layer of cogeneration. Based on the operating data of the physical system layer, it determines the action vector of the physical system layer, i.e. the optimization command, adjusts the operation of the physical system layer, and updates its own parameters according to the feedback data of the physical system layer. Finally, the agent learns to automatically generate collaborative optimization commands according to the real-time system status and directs the coordinated operation of each device.
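The interaction cycle described above (observe, act, collect feedback, compute the reward, update) can be sketched as a generic loop. This is a minimal illustration under stated assumptions, not the disclosed implementation: `ToyPlant`, `RandomAgent`, `comfort_reward`, and `run_episode` are hypothetical names, and the one-variable plant dynamics and the reward are placeholders chosen only so that the loop executes.

```python
import random
from typing import Callable, List


class ToyPlant:
    """Hypothetical stand-in for the physical system layer (one indoor temperature)."""

    def __init__(self, indoor_temp: float = 18.0, outdoor_temp: float = -5.0):
        self.indoor_temp = indoor_temp
        self.outdoor_temp = outdoor_temp

    def observe(self) -> List[float]:
        return [self.indoor_temp, self.outdoor_temp]

    def apply(self, heat_power: float) -> List[float]:
        # Placeholder first-order dynamics: heat input raises the temperature,
        # losses pull it toward the outdoor temperature.
        loss = 0.05 * (self.indoor_temp - self.outdoor_temp)
        self.indoor_temp += 0.5 * heat_power - loss
        return self.observe()


class RandomAgent:
    """Hypothetical agent; a real one would hold policy and value networks."""

    def __init__(self, seed: int = 0):
        self.rng = random.Random(seed)
        self.updates = 0

    def act(self, state: List[float]) -> float:
        return self.rng.uniform(0.0, 3.0)  # heat power setpoint, arbitrary units

    def update(self, state, action, reward, next_state) -> None:
        self.updates += 1  # the parameter update itself is omitted in this sketch


def comfort_reward(next_state: List[float]) -> float:
    # Placeholder reward: penalize deviation from an assumed 20 degrees C target.
    return -abs(next_state[0] - 20.0)


def run_episode(plant: ToyPlant, agent: RandomAgent,
                reward_fn: Callable[[List[float]], float], steps: int) -> float:
    total = 0.0
    state = plant.observe()                   # collect operational data
    for _ in range(steps):
        action = agent.act(state)             # calculate the action
        next_state = plant.apply(action)      # send to actuators, collect feedback
        reward = reward_fn(next_state)        # evaluate the preset reward function
        agent.update(state, action, reward, next_state)  # update parameters
        total += reward
        state = next_state
    return total
```

Calling `run_episode(ToyPlant(), RandomAgent(seed=1), comfort_reward, steps=24)` runs one simulated day at hourly resolution; swapping `RandomAgent` for a learning agent leaves the loop unchanged.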
[0021] Furthermore, the physical system layer includes: combined heat and power units, thermal storage devices, a centralized heating network, and a user building complex.
[0022] Furthermore, the data acquisition and perception layer includes: a power grid information module, a meteorological information module, a heating network measuring point sensor, and a user indoor temperature and humidity sensor.
[0023] According to some embodiments of this disclosure, as shown in Figure 1, the deep learning-based cogeneration collaborative optimization system proposed in this disclosure mainly includes: a physical system layer (1), a data acquisition and perception layer (2), and a deep reinforcement learning agent (3). The physical system layer (1) consists of cogeneration units (4), thermal storage devices (5), a centralized heating network (6), a user building complex (7), and other devices; the data acquisition and perception layer (2) consists of a power grid information module (8), a meteorological information module (9), a heating network measuring point sensor (10), a user indoor temperature and humidity sensor (11), and other devices; the deep reinforcement learning agent (3) consists of a state space construction module (12), a policy network (13), a value network (14), an experience replay buffer (15), and other modules.
[0024] Furthermore, the deep reinforcement learning agent includes a state space construction module, a policy network, and a value network. The state space construction module is configured to preprocess the operational data and construct a state vector representing the current physical system layer state. The policy network is configured to calculate the action vector based on the state vector. The value network is configured to calculate the reward value corresponding to the action feedback data based on a preset reward function, wherein the reward function is constructed based on the economic quantification index, safety quantification index, and comfort quantification index of the physical system layer.
[0025] Furthermore, the deep reinforcement learning agent also includes an experience replay buffer configured to: store experience samples, each consisting of the state vector of the current physical system layer state, the action vector corresponding to that state vector, the reward value corresponding to that state vector, and the state vector of the next physical system layer state; and periodically and randomly extract experience samples to update the parameters of the policy network and/or the value network.
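One possible shape for such a buffer is sketched below. The names `Experience` and `ReplayBuffer` are hypothetical, and the fixed capacity with oldest-first eviction is an assumption; the disclosure only specifies what a sample contains and that samples are drawn randomly.

```python
import random
from collections import deque
from typing import Deque, List, NamedTuple, Sequence


class Experience(NamedTuple):
    """One experience sample: (state, action, reward, next state)."""
    state: Sequence[float]
    action: Sequence[float]
    reward: float
    next_state: Sequence[float]


class ReplayBuffer:
    def __init__(self, capacity: int, seed: int = 0):
        # Assumption: once capacity is reached, the oldest sample is dropped.
        self._items: Deque[Experience] = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def store(self, experience: Experience) -> None:
        self._items.append(experience)

    def sample(self, batch_size: int) -> List[Experience]:
        # Uniform sampling without replacement; raises ValueError if the
        # buffer holds fewer than batch_size samples.
        return self._rng.sample(list(self._items), batch_size)

    def __len__(self) -> int:
        return len(self._items)
```

Sampling uniformly from stored history, rather than training on consecutive steps, is the usual reason for a replay buffer: it reduces the correlation between the samples used in one update.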
[0026] Furthermore, the deep reinforcement learning agent is configured to update its parameters according to the reward value based on the Actor-Critic algorithm.
[0027] According to some embodiments of this disclosure, the deep learning-based cogeneration collaborative optimization system proposed in this disclosure (shown in Figure 1) works as follows. The data acquisition and perception layer (2) collects various data from the physical system layer (1) in real time: the power grid information module (8) acquires time-of-use or real-time electricity price signals; the meteorological information module (9) acquires the outdoor temperature, wind speed, and solar radiation forecasts for the near future; the heating network measuring point sensor (10) collects the supply and return water temperature, pressure, and flow data; and the user indoor temperature and humidity sensor (11) collects the indoor temperature of representative users. All of this data is transmitted to the state space construction module (12) of the deep reinforcement learning agent (3).
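As a rough illustration of how the measurements listed above might be turned into a state vector by the state space construction module (12), the sketch below flattens a few of them into a list scaled to [0, 1]. The function names, the selection of fields, and every numeric range in `RANGES` are assumptions made for the example; the disclosure does not specify them.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical normalization ranges (min, max); real ranges would come from plant data.
RANGES: Dict[str, Tuple[float, float]] = {
    "price": (0.0, 1.5),            # electricity price, currency units per kWh
    "outdoor_temp": (-30.0, 20.0),  # degrees C
    "storage_level": (0.0, 1.0),    # fraction of storage capacity
    "supply_temp": (50.0, 130.0),   # degrees C
    "indoor_temp": (10.0, 30.0),    # degrees C
}


def normalize(value: float, low: float, high: float) -> float:
    """Min-max scale to [0, 1], clipping out-of-range readings."""
    scaled = (value - low) / (high - low)
    return min(1.0, max(0.0, scaled))


def build_state(hour: int,
                price_forecast: Sequence[float],
                outdoor_forecast: Sequence[float],
                storage_level: float,
                supply_temp: float,
                indoor_temps: Sequence[float]) -> List[float]:
    """Concatenate normalized measurements into one flat state vector."""
    state = [hour / 24.0]
    state += [normalize(p, *RANGES["price"]) for p in price_forecast]
    state += [normalize(t, *RANGES["outdoor_temp"]) for t in outdoor_forecast]
    state.append(normalize(storage_level, *RANGES["storage_level"]))
    state.append(normalize(supply_temp, *RANGES["supply_temp"]))
    # Summarize the indoor temperature distribution by its mean and minimum.
    mean_indoor = sum(indoor_temps) / len(indoor_temps)
    state.append(normalize(mean_indoor, *RANGES["indoor_temp"]))
    state.append(normalize(min(indoor_temps), *RANGES["indoor_temp"]))
    return state
```

Scaling every entry to a common range keeps quantities with large magnitudes (such as supply temperature) from dominating quantities with small ones (such as price) at the network input.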
[0028] The state space construction module (12) preprocesses and normalizes the input data to construct a state vector St representing the current system state. This state vector St includes, but is not limited to: the current time, the electricity price over multiple future periods, the outdoor temperature forecast over multiple future periods, the operating status of the CHP unit, the current heat storage level of the thermal storage device, the temperature at key nodes of the pipeline network, and the average and distribution of user indoor temperatures. The policy network (13) receives the state vector St as input, processes it through its internal multi-layer neural network, and outputs an action vector At. This action vector At is a set of continuous control commands, such as the thermal power setpoint of the CHP unit, the charging/discharging power of the thermal storage device, and the speed or differential pressure setpoint of the pipeline network circulating water pump. The action At is sent to the execution mechanism of the physical system layer (1), and the system enters the next state St+1. At the same time, the system calculates the immediate reward Rt according to the preset reward function. The design of the reward function is crucial, as it jointly accounts for economy (such as energy consumption cost and electricity sales revenue), safety (such as equipment operation constraints and pipeline network hydraulic conditions), and comfort (the degree to which the user's indoor temperature deviates from the comfort range). For example, when the agent stores heat during off-peak electricity prices and releases heat during peak electricity prices to reduce CHP operation, and the user's room temperature always meets the standard, it receives a high reward; conversely, if the user's room temperature leaves the permitted range or the equipment operates beyond its limits, it is penalized.
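A reward that combines the three criteria named above could take the form of a weighted sum, as in the sketch below. The disclosure does not give a formula, so the function names, the weights, the pressure limits, and the comfort band are all assumptions used only to make the trade-off concrete.

```python
from typing import Sequence, Tuple


def band_violation(value: float, low: float, high: float) -> float:
    """Distance by which value lies outside [low, high]; 0.0 inside the band."""
    return max(0.0, low - value) + max(0.0, value - high)


def step_reward(energy_cost: float,
                electricity_revenue: float,
                network_pressure: float,
                indoor_temps: Sequence[float],
                pressure_limits: Tuple[float, float] = (0.2, 1.6),  # assumed, MPa
                comfort_band: Tuple[float, float] = (18.0, 24.0),   # assumed, deg C
                w_economy: float = 1.0,                              # assumed weights
                w_safety: float = 10.0,
                w_comfort: float = 5.0) -> float:
    # Economy: net income for the step.
    economy = electricity_revenue - energy_cost
    # Safety: penalize operation outside the permitted pressure range.
    safety = band_violation(network_pressure, *pressure_limits)
    # Comfort: mean deviation of indoor temperatures from the comfort band.
    comfort = sum(band_violation(t, *comfort_band) for t in indoor_temps) / len(indoor_temps)
    return w_economy * economy - w_safety * safety - w_comfort * comfort
```

With these placeholder weights, a step that earns 50 units net and keeps every room inside the band scores 50.0, while the same step with one of two rooms at 16 °C scores 45.0, so the agent is steered away from saving money at the expense of comfort.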
[0029] According to some embodiments of this disclosure, each interaction (St, At, Rt, St+1) described above is stored as an experience sample in the experience replay buffer (15). The agent periodically samples a batch of historical experiences from the buffer (15) to iteratively update the policy network (13) and the value network (14) (e.g., using an Actor-Critic algorithm such as DDPG or TD3). The value network (14) evaluates the value of the state and assists in updating the policy network (13). Over a large number of training iterations, the policy network (13) eventually learns the policy π(At|St) that maps the system state to the optimal control action. This policy maximizes the long-term cumulative reward while satisfying all constraints, that is, it achieves collaborative optimization of the system throughout its entire life cycle.
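For readers unfamiliar with DDPG-style updates, the sketch below shows two standard ingredients: the temporal-difference target used to train the value network, and the soft (Polyak) update of target-network parameters. Target networks, the discount factor `gamma`, and the rate `tau` belong to the standard DDPG recipe and are assumptions here; the disclosure names DDPG and TD3 only as examples and does not describe these details. Networks are represented as plain callables and parameters as flat lists so that the example stays self-contained.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Actor = Callable[[Vector], Vector]           # state -> action
Critic = Callable[[Vector, Vector], float]   # (state, action) -> value estimate
Sample = Tuple[Vector, Vector, float, Vector]  # (St, At, Rt, St+1)


def td_target(reward: float, next_state: Vector,
              target_actor: Actor, target_critic: Critic,
              gamma: float = 0.99) -> float:
    """y = Rt + gamma * Q'(St+1, pi'(St+1)), computed with the target networks."""
    next_action = target_actor(next_state)
    return reward + gamma * target_critic(next_state, next_action)


def critic_loss(batch: Sequence[Sample], critic: Critic,
                target_actor: Actor, target_critic: Critic,
                gamma: float = 0.99) -> float:
    """Mean squared error between Q(St, At) and the TD target over a batch."""
    errors = [
        (critic(s, a) - td_target(r, s2, target_actor, target_critic, gamma)) ** 2
        for (s, a, r, s2) in batch
    ]
    return sum(errors) / len(errors)


def soft_update(target_params: Sequence[float], online_params: Sequence[float],
                tau: float = 0.005) -> List[float]:
    """Move each target parameter a fraction tau toward its online counterpart."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target_params, online_params)]
```

In a full implementation the critic loss would be minimized by gradient descent, and the policy network would then be adjusted in the direction that increases the critic's estimate for its own actions; both steps need an automatic differentiation library and are left out of this sketch.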
[0030] Based on the same technical concept, this disclosure also proposes a deep learning-based cogeneration ... Using the deep reinforcement learning agent, the following steps are performed: operational data is collected from the physical system layer through the data acquisition and perception layer; the action vector of the physical system layer is calculated based on the operational data; the action vector is sent to the action vector execution mechanism in the physical system layer, and the action feedback data of the physical system layer is collected through the data acquisition and perception layer; the reward value corresponding to the action feedback data is calculated based on a preset reward function; the agent updates its own parameters based on the reward value; the above steps are repeated to obtain the target deep reinforcement learning agent; and cogeneration collaborative optimization is performed based on the target deep reinforcement learning agent.
[0031] As shown in Figure 2, this disclosure also proposes an electronic device, including a memory and a processor, wherein the memory stores a computer program or instructions which, when executed by the processor, are used to implement at least the above-described deep learning-based cogeneration collaborative optimization method.
[0032] As shown in Figure 3, this disclosure also proposes a computer-readable storage medium storing a computer program or instructions which, when executed by a processor, are used to implement at least the above-described deep learning-based cogeneration collaborative optimization method.
[0033] This disclosure also proposes a computer program product stored in a computer-readable storage medium which, when executed by a processor, is used to implement at least the above-described deep learning-based cogeneration collaborative optimization method.
[0034] Although the present disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present disclosure.
Claims
1. A deep learning-based cogeneration collaborative optimization system, characterized in that the system comprises a physical system layer, a data acquisition and perception layer, and a deep reinforcement learning agent; among these... the deep reinforcement learning agent is configured to: collect operational data from the physical system layer through the data acquisition and perception layer; calculate the action vector of the physical system layer based on the operational data; send the action vector to the action vector execution mechanism in the physical system layer, and collect the action feedback data of the physical system layer through the data acquisition and perception layer; calculate the reward value corresponding to the action feedback data based on a preset reward function; update its own parameters based on the reward value; repeat the above steps to obtain the target deep reinforcement learning agent; and perform cogeneration collaborative optimization based on the target deep reinforcement learning agent.
2. The deep learning-based cogeneration collaborative optimization system as described in claim 1, characterized in that the physical system layer includes: combined heat and power units, thermal storage devices, a centralized heating network, and a user building complex.
3. The deep learning-based cogeneration collaborative optimization system as described in claim 2, characterized in that the data acquisition and perception layer includes: a power grid information module, a meteorological information module, a heating network measuring point sensor, and a user indoor temperature and humidity sensor.
4. The deep learning-based cogeneration collaborative optimization system as described in claim 3, characterized in that the deep reinforcement learning agent includes a state space construction module, a policy network, and a value network; the state space construction module is configured to preprocess the operational data and construct a state vector representing the current physical system layer state; the policy network is configured to calculate the action vector based on the state vector; the value network is configured to calculate the reward value corresponding to the action feedback data based on a preset reward function; wherein the reward function is constructed based on the economic quantification index, safety quantification index, and comfort quantification index of the physical system layer.
5. The deep learning-based cogeneration collaborative optimization system as described in claim 4, characterized in that the deep reinforcement learning agent also includes an experience replay buffer configured to: store experience samples, each consisting of the state vector of the current physical system layer state, the action vector corresponding to that state vector, the reward value corresponding to that state vector, and the state vector of the next physical system layer state; and periodically and randomly extract experience samples to update the parameters of the policy network and/or the value network.
6. The deep learning-based cogeneration collaborative optimization system as described in any one of claims 4-5, characterized in that the deep reinforcement learning agent is configured to update its parameters according to the reward value based on the Actor-Critic algorithm.
7. A deep learning-based cogeneration ... include: Using the deep reinforcement learning agent, the following steps are performed: operational data is collected from the physical system layer through the data acquisition and perception layer; the action vector of the physical system layer is calculated based on the operational data; the action vector is sent to the action vector execution mechanism in the physical system layer, and the action feedback data of the physical system layer is collected through the data acquisition and perception layer; the reward value corresponding to the action feedback data is calculated based on a preset reward function; the agent updates its own parameters based on the reward value; the above steps are repeated to obtain the target deep reinforcement learning agent; and cogeneration collaborative optimization is performed based on the target deep reinforcement learning agent.
8. An electronic device comprising a memory and a processor, characterized in that the memory stores a computer program or instructions which, when executed by the processor, are used to implement at least the deep learning-based cogeneration collaborative optimization method as described in any one of claims 1-7.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program or instructions which, when executed by a processor, are used to implement at least the deep learning-based cogeneration collaborative optimization method as described in any one of claims 1-7.
10. A computer program product stored in a computer-readable storage medium, characterized in that, when the computer program product is executed by a processor, it is used to implement at least the deep learning-based cogeneration collaborative optimization method as described in any one of claims 1-7.