A data center air-liquid collaborative cooling terminal dynamic control method under load fluctuation

CN121920243B · Active · Publication Date: 2026-06-23 · INSPUR TIANYUAN COMM INFORMATION SYST CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
INSPUR TIANYUAN COMM INFORMATION SYST CO LTD
Filing Date
2026-03-25
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

Existing data center cooling control methods suffer from adjustment lag, low energy efficiency, and insufficient safety under dynamic loads. In particular, traditional PID control struggles with multivariable problems, MPC relies on load predictions that are inaccurate in real-world environments, deep reinforcement learning lacks training data, and air-liquid co-cooling terminals lack integrated control strategies.

Method used

We construct computational fluid dynamics simulation models at the data center and server levels, generate time-series datasets, train time-series proxy models, design a deep reinforcement learning framework whose reward function combines energy consumption and thermal safety, train a dynamic control agent for air-liquid coordinated cooling terminals, and achieve real-time regulation through online iterative optimization.

Benefits of technology

It enables real-time intelligent control of the air-liquid cooling ratio, significantly improving data center energy efficiency, ensuring the thermal safety of servers under dynamic loads, reducing the overall energy consumption of air conditioning fans and pumping systems, and providing thermal safety assurance.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN121920243B_ABST

Abstract

The application discloses a dynamic control method for a data center air-liquid collaborative cooling terminal under fluctuating load, and relates to the technical field of data center cooling control. To address the slow adjustment of current data center cooling terminals under dynamic load, the scheme comprises the following steps: constructing computer room-level and server-level computational fluid dynamics simulation models, and generating a time-series dataset for training proxy models through dynamic simulation; training computer room-level and server-level time-series proxy models based on the time-series dataset; coupling the computer room-level and server-level time-series proxy models with a data center energy consumption model to construct an integrated simulation environment; designing a deep reinforcement learning framework; training a dynamic control agent for the air-liquid collaborative cooling terminal based on the integrated simulation environment and the deep reinforcement learning framework; and deploying the trained agent with safety safeguards to realize dynamic control of the cooling terminal, while performing online iterative optimization of the time-series proxy models and the agent according to actually collected data.

Description

Technical Field

[0001] This invention relates to the field of data center cooling control technology, specifically to a dynamic control method for the air-liquid coordinated cooling terminal of a data center under load fluctuations.

Background Technology

[0002] With the continuous development of technologies such as artificial intelligence and big data, the demand for data processing capability keeps increasing, driving significant gains in server computing performance. However, this also brings increasingly severe challenges in energy consumption and thermal management. In data centers, cooling accounts for approximately 40% of total energy consumption. Therefore, developing efficient cooling solutions is crucial for improving energy efficiency and ensuring thermal safety. With the large-scale deployment of high-density servers, data center cooling is gradually shifting from primarily air cooling to primarily liquid cooling, and combined air-liquid cooling technology has become key to the high-quality development of data centers.

[0003] However, existing cooling control methods still have limitations. Traditional PID (Proportional-Integral-Derivative) control struggles to handle multivariable control problems effectively and cannot directly incorporate an optimization objective. MPC (Model Predictive Control) can solve optimal control problems under multivariable constraints, but its application usually assumes that workload curves are known. In real data center environments, workloads fluctuate significantly and are uncertain, which makes accurate prediction difficult and limits the practical effectiveness of MPC. Deep reinforcement learning can optimize cooling control, but trial-and-error learning in real-world environments may lead to uncontrollable thermal risks. Furthermore, newly built or renovated data centers often lack sufficient historical data to support training. In addition, existing control methods are mostly designed for single cooling terminals (such as air-cooled or liquid-cooled terminals) and lack integrated control strategies suitable for combined air-liquid cooling terminals.

Summary of the Invention

[0004] This invention addresses the adjustment lag, low energy efficiency, and insufficient safety of current data center cooling terminals under dynamic loads. It provides a dynamic control method for data center air-liquid co-cooling terminals under load fluctuations, which enables real-time intelligent adjustment of the air-liquid cooling ratio, significantly improves data center energy efficiency, and ensures the thermal safety of servers under dynamic loads.

[0005] The present invention provides a dynamic control method for the air-liquid co-cooling terminal of a data center under load fluctuations, and the technical solution adopted to solve the above-mentioned technical problems is as follows:

[0006] A dynamic control method for air-liquid co-cooling terminals in a data center under load fluctuations includes the following steps:

[0007] S1. Construct computer room-level and server-level computational fluid dynamics simulation models, and generate time-series datasets for training proxy models through dynamic simulation;

[0008] S2. Train a data center-level time-series proxy model and a server-level time-series proxy model based on the time-series dataset, respectively;

[0009] S3. Couple the data center-level and server-level time-series proxy models with a data center energy consumption model to construct an integrated simulation environment; design a deep reinforcement learning framework comprising a state space, an action space, and a reward function with energy consumption and thermal safety as optimization objectives;

[0010] S4. Based on the integrated simulation environment and deep reinforcement learning framework, train the dynamic control agent of the air-liquid co-cooling terminal.

[0011] S5. Safely deploy the trained agent to achieve dynamic control of the cooling terminal, and perform online iterative optimization of the time-series proxy models and the agent based on actually collected data.

[0012] Optionally, step S1 specifically includes the following operations:

[0013] 1A. Construct a computer room-level computational fluid dynamics simulation model and generate a time-series dataset through dynamic simulation:

[0014] Based on the actual geometric layout and equipment configuration of the target data center, a data center-level computational fluid dynamics simulation model is constructed. In this model, each server is represented as a porous medium, and the air-cooled terminals are modeled according to their actual types, covering three forms: room-level, row-level, and backplane air conditioners.

[0015] Design dynamic simulation conditions, including: dynamic scenarios of frequency changes of air-cooled terminal air conditioner fans and power fluctuations of different servers;

[0016] The computer room-level computational fluid dynamics simulation model is used to perform dynamic simulations under the different dynamic simulation conditions, and time-series data of the specified parameters are collected: the air conditioning fan frequency, the power of each server, and the inlet air temperature and velocity of each server from time t-n to time t, and the inlet air temperature and velocity of each server at time t+1.
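The windowing in [0016] pairs a history of n+1 steps with the value one step later. The sketch below shows that pairing under stated assumptions: the helper name `make_windows` and the list-of-vectors layout are hypothetical, and feature ordering is left to the caller.

```python
from typing import List, Sequence, Tuple

Window = List[List[float]]


def make_windows(
    features: Sequence[Sequence[float]],
    targets: Sequence[Sequence[float]],
    n: int,
) -> Tuple[List[Window], List[List[float]]]:
    """Pair each history window features[t-n..t] with targets[t+1].

    features[t]: inputs logged at step t (hypothetical layout, e.g. fan
                 frequency, server powers, inlet air temperature/velocity).
    targets[t]:  quantities the proxy should predict (e.g. inlet air
                 temperature/velocity of each server).
    """
    if len(features) != len(targets):
        raise ValueError("features and targets must have equal length")
    if n < 0:
        raise ValueError("n must be non-negative")
    inputs: List[Window] = []
    outputs: List[List[float]] = []
    # t stops at len - 2 so that a t+1 target always exists
    for t in range(n, len(features) - 1):
        inputs.append([list(f) for f in features[t - n : t + 1]])  # n+1 steps
        outputs.append(list(targets[t + 1]))
    return inputs, outputs
```

With six logged steps and n = 2, this yields three samples, and the first maps steps 0 to 2 onto the target at step 3. The same helper would serve the server-level data in 1B with different feature and target columns.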

[0017] 1B. Construct a server-level computational fluid dynamics simulation model and generate a time-series dataset through dynamic simulation:

[0018] Construct a server-level computational fluid dynamics simulation model for specified heat-generating and heat-dissipating components;

[0019] The design incorporates dynamic simulation conditions, including various dynamic scenarios such as different server power fluctuations, server refrigerant inlet temperature and flow rate changes, and changes in server inlet air temperature and velocity.

[0020] The server-level computational fluid dynamics simulation model is used to perform dynamic simulations under the different dynamic simulation conditions, and time-series data of the specified parameters are collected: the power of the specified components, the refrigerant temperature and flow rate at the server inlet, and the air temperature and velocity at the server inlet from time t-n to time t, and the CPU / GPU temperature and server outlet air temperature at time t+1.

[0021] Optionally, step S2 specifically includes the following operations:

[0022] 2A. Training the data center-level time-series proxy model: based on the time-series data generated by the data center-level computational fluid dynamics simulation model, train the data center-level time-series proxy model. Its input is the time-series data of the air conditioning fan frequency, the power of each server, and the inlet air temperature and velocity of each server from time t-n to time t; its output is the predicted inlet air temperature and velocity of each server at time t+1.

[0023] 2B. Training the server-level time-series proxy model: based on the time-series data generated by the server-level computational fluid dynamics simulation model, train the server-level time-series proxy model. Its input is the time-series data of the specified component power, the server inlet refrigerant temperature and flow rate, and the server inlet air temperature and velocity from time t-n to time t; its output is the server CPU / GPU temperature and server outlet air temperature at time t+1.

[0024] Preferably, the data center-level time-series proxy model and the server-level time-series proxy model each adopt a temporal convolutional network.

[0025] Further optionally, in step S3 the data center-level and server-level time-series proxy models are coupled and combined with a physics-based system energy consumption calculation module to jointly build an integrated simulation environment;

[0026] In this environment, the dynamic control agent issues control actions, including adjusting the frequency of the air conditioner fan and the valve opening.

[0027] The data center-level time-series proxy model predicts the inlet air temperature and velocity of each server at the next moment based on the current control actions and historical state information.

[0028] The server-level time-series proxy model predicts the server CPU / GPU temperature and the server outlet air temperature at the next moment based on the current control action and historical state information.
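The coupling in [0025] to [0028] can be pictured as one environment step in which the room-level prediction of inlet air conditions feeds the server-level model. The sketch below shows one hypothetical arrangement and does not reproduce the patent's implementation. `IntegratedEnv`, the callable signatures, and the flat history snapshot are all assumptions. In practice, each callable would wrap a trained TCN proxy.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Vector = List[float]
History = List[Vector]


@dataclass
class IntegratedEnv:
    """Hypothetical coupling of the two proxy models and an energy model.

    room_proxy(history, action) -> next inlet air state of each server
    server_proxy(history, action, inlet_air) -> next chip / outlet temperatures
    energy_model(action) -> fan + pump power for this step
    """

    room_proxy: Callable[[History, Vector], Vector]
    server_proxy: Callable[[History, Vector, Vector], Vector]
    energy_model: Callable[[Vector], float]
    history_len: int = 8
    history: History = field(default_factory=list)

    def step(self, action: Vector) -> Tuple[Vector, Vector, float]:
        # 1) room-level proxy: action + history -> inlet air at t+1
        inlet_air = self.room_proxy(self.history, action)
        # 2) server-level proxy consumes the room-level prediction
        temps = self.server_proxy(self.history, action, inlet_air)
        # 3) physics-based energy module
        power = self.energy_model(action)
        # keep a rolling window of the most recent snapshots as history
        self.history.append(list(action) + inlet_air + temps)
        self.history = self.history[-self.history_len:]
        return inlet_air, temps, power
```

Passing the room-level output straight into the server-level call lets a change in fan frequency propagate to chip temperatures within one step, without running CFD.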

[0029] Optionally, the system energy consumption calculation module includes refrigerant pump power calculation and fan power calculation, with the following calculation formulas:

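[0030] $$P_{\mathrm{fan}} = P_{\mathrm{fan,rated}}\left(\frac{f}{f_{\mathrm{rated}}}\right)^{3}$$

[0031] In the formula, $P_{\mathrm{fan}}$ is the fan power; $P_{\mathrm{fan,rated}}$ is the rated power of the fan; $f$ is the actual operating frequency of the fan; $f_{\mathrm{rated}}$ is the rated frequency of the fan.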

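[0032] $$P_{\mathrm{pump}} = P_{\mathrm{pump,rated}}\left(\frac{n}{n_{\mathrm{rated}}}\right)^{3}$$

[0033] In the formula, $P_{\mathrm{pump}}$ is the power of the refrigerant pump; $P_{\mathrm{pump,rated}}$ is its rated power; $n$ is the actual operating speed of the pump; $n_{\mathrm{rated}}$ is the pump's rated speed.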

[0034] Further, optionally, step S3 is performed to design a deep reinforcement learning framework, the framework including a state space, an action space, and a reward function optimized for energy consumption and thermal safety, wherein:

[0035] The state space contains dynamic and static variables in the form of historical time series, specifically composed of the following parameters: temperature of each CPU / GPU, air inlet temperature of each server, refrigerant supply temperature, CDU valve opening, air conditioner fan frequency, and power of each component of CPU / GPU, motherboard, and memory.

[0036] The action space is defined as the continuous control space, including the air conditioner fan frequency and valve opening degree;

[0037] The expression for the reward function is as follows:

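[0038] $$E_t = \left(P_{\mathrm{fan},t} + P_{\mathrm{pump},t}\right)\Delta t$$
$$r_t = -\omega_1 E_t - \omega_2\sum_{i=1}^{N}\max\left(0,\,T_{\mathrm{chip},i,t} - T_{\mathrm{chip}}^{\mathrm{th}}\right) - \omega_3\sum_{i=1}^{N}\max\left(0,\,T_{\mathrm{out},i,t} - T_{\mathrm{out}}^{\mathrm{th}}\right)$$

[0039] In the formula, $\Delta t$ is the control time step; $E_t$ is the fan and pump energy consumed during step t; $T_{\mathrm{chip},i,t}$ is the maximum CPU / GPU chip temperature of the i-th server; $T_{\mathrm{out},i,t}$ is the exhaust (outlet) temperature of the i-th server; $N$ is the number of servers; $T_{\mathrm{chip}}^{\mathrm{th}}$ and $T_{\mathrm{out}}^{\mathrm{th}}$ are the threshold values for the server chip temperature and the outlet temperature, respectively. The function max(0, value) ensures that a penalty term is generated only when the server chip temperature or the outlet temperature exceeds its safe threshold, thereby guiding the agent to keep device temperatures below the safe threshold. $\omega_1$, $\omega_2$, and $\omega_3$ are the weighting coefficients.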
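The action space in [0036] is continuous. Policies of the SAC type commonly emit tanh-squashed outputs in [-1, 1], so these outputs must be mapped to physical commands before dispatch. This is a minimal sketch, assuming hypothetical ranges of 20 to 50 Hz for the fan frequency and 0 to 100 % for the valve opening. The patent gives neither range, and the function name is illustrative.

```python
from typing import List, Sequence


def scale_action(
    squashed: Sequence[float], low: Sequence[float], high: Sequence[float]
) -> List[float]:
    """Map policy outputs in [-1, 1] linearly onto physical command ranges.

    Hypothetical example ranges: fan frequency [20, 50] Hz,
    valve opening [0, 100] %.
    """
    if not (len(squashed) == len(low) == len(high)):
        raise ValueError("dimension mismatch")
    out: List[float] = []
    for a, lo, hi in zip(squashed, low, high):
        a = max(-1.0, min(1.0, a))  # clip numerical overshoot
        out.append(lo + (a + 1.0) * 0.5 * (hi - lo))
    return out
```

The mapping is linear and clipped, so downstream components such as the energy model and any safety layer always receive commands within the configured equipment range.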

[0040] Optionally, in step S4 a deep reinforcement learning algorithm trains the dynamic control agent of the air-liquid co-cooling terminal, based on the integrated simulation environment and the deep reinforcement learning framework. During training, convergence is judged from the trend of the average episode reward curve over time. The agent is determined to have converged when the curve remains stable, its fluctuation amplitude over a preset continuous period does not exceed a preset threshold, and it shows no continuous upward trend.
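The convergence rule in [0040] leaves the exact statistics open. One possible reading is sketched below under stated assumptions. Fluctuation is measured as max minus min over the last `window` average-episode rewards, and the absence of a "continuous upward trend" is tested with a least-squares slope. The function name and all thresholds are hypothetical.

```python
from statistics import mean
from typing import Sequence


def has_converged(
    avg_rewards: Sequence[float],
    window: int,
    max_fluctuation: float,
    max_slope: float,
) -> bool:
    """Hypothetical convergence test on the average-episode-reward curve."""
    if window < 2:
        raise ValueError("window must be at least 2")
    if len(avg_rewards) < window:
        return False
    recent = list(avg_rewards[-window:])
    # 1) fluctuation amplitude within the window
    fluctuation = max(recent) - min(recent)
    # 2) least-squares slope as a stand-in for "continuous upward trend"
    x_bar = (window - 1) / 2.0
    y_bar = mean(recent)
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(recent))
    den = sum((x - x_bar) ** 2 for x in range(window))
    slope = num / den
    return fluctuation <= max_fluctuation and slope <= max_slope
```

The slope check catches a slow, steady rise whose small amplitude would pass the fluctuation test on its own.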

[0041] Optionally, in step S5 the trained agent is deployed safely. During deployment, a rule-based safety layer is added in advance between the agent and the physical actuators. This safety layer contains an absolute safety rule: if the temperature of any CPU / GPU or the server outlet temperature exceeds the preset safety threshold, the agent's instructions are overridden. The system is then set to the preset supply air temperature and supply airflow to ensure the server's thermal safety.
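A rule-based layer like the one in [0041] can be written as a pure function that sits between the policy output and the actuators. In this minimal sketch, the command dictionaries, their keys, and the limit values are hypothetical. Only the override logic follows the text.

```python
from typing import Mapping, Sequence, Tuple

Command = Mapping[str, float]


def safety_filter(
    agent_cmd: Command,
    chip_temps_c: Sequence[float],
    outlet_temps_c: Sequence[float],
    chip_limit_c: float,
    outlet_limit_c: float,
    fallback_cmd: Command,
) -> Tuple[Command, bool]:
    """Return (command to dispatch, overridden?) per an absolute override rule."""
    over_temp = any(t > chip_limit_c for t in chip_temps_c) or any(
        t > outlet_limit_c for t in outlet_temps_c
    )
    if over_temp:
        # ignore the agent and fall back to preset supply setpoints
        return fallback_cmd, True
    return agent_cmd, False
```

Because the rule lives outside the learned policy, the override holds whatever the agent outputs.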

[0042] Optionally, in step S5 the time-series proxy models and the agent are iteratively optimized online based on actually collected data. During this process, real-time operating data output by the sensors and control system is continuously collected to build a historical time-series database.

[0043] The data in the historical time-series database is used for iterative optimization in two respects. When the changes in the physical layout of servers and in the rack utilization rate of the data center exceed the preset threshold, the data center-level and server-level time-series proxy models are iteratively optimized. When the operating parameters of the data center remain within the preset threshold range for a preset continuous duration, the control agent conducts online learning directly.
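The two update paths in [0043] can be expressed as a small dispatcher. This sketch rests on several illustrative assumptions that the patent does not state: how layout change and utilization change are quantified, the use of a separate threshold for each, and giving proxy retraining priority over online learning.

```python
from enum import Enum


class UpdateMode(Enum):
    RETRAIN_PROXIES = "retrain_proxies"
    ONLINE_AGENT_LEARNING = "online_agent_learning"
    NONE = "none"


def choose_update(
    layout_change: float,
    utilization_change: float,
    layout_threshold: float,
    utilization_threshold: float,
    stable_duration_s: float,
    required_stable_s: float,
) -> UpdateMode:
    """Pick which component to update from the historical database (hypothetical)."""
    # structural change: the proxies no longer match the room, so retrain them first
    if layout_change > layout_threshold or utilization_change > utilization_threshold:
        return UpdateMode.RETRAIN_PROXIES
    # stable operation: let the deployed agent learn from live transitions
    if stable_duration_s >= required_stable_s:
        return UpdateMode.ONLINE_AGENT_LEARNING
    return UpdateMode.NONE
```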

[0044] The present invention provides a dynamic control method for air-liquid coordinated cooling terminals in data centers under load fluctuations, which has the following advantages compared with the prior art:

[0045] 1. This invention solves the problems of lag in adjustment, low energy efficiency and insufficient safety of current data center cooling terminals when facing dynamic loads. It realizes real-time intelligent control of the air-liquid cooling ratio, significantly improves the energy efficiency of data centers, and ensures the thermal safety of servers under dynamic loads.

[0046] 2. This invention trains a dynamic control agent that can regulate air-cooled and liquid-cooled terminals collaboratively based on real-time load fluctuations. While strictly ensuring that server chip and outlet air temperatures do not exceed safety thresholds, it significantly reduces the overall energy consumption of air conditioning fans and pumping systems. Data center-level and server-level computational fluid dynamics simulation models are constructed to generate time-series data, which is used to train efficient time-series proxy models that replace computationally expensive real-time CFD simulations. This multi-level proxy model coupling preserves prediction accuracy and provides an efficient and reliable training environment for the reinforcement learning agent. Furthermore, by continuously collecting actual operational data, the time-series proxy models and the agent policy undergo dual iterative optimization. For major changes, such as changes in server layout or rack utilization, the time-series proxy models can be updated and the agent retrained. During stable operation, the agent can learn directly online. This enables the control strategy to keep adapting to changes in the data center and to maintain optimal long-term performance. In actual deployment, a rule-based safety layer monitors key temperature parameters in real time. Once an over-temperature risk is detected, it immediately overrides the agent's instructions and executes preset safety operations. This provides absolute thermal safety assurance for core equipment and greatly reduces the risk of applying agents to critical infrastructure.

Attached Figure Description

[0047] Figure 1 is a flowchart of the method of the present invention.

Detailed Implementation

[0048] To make the technical solution, the technical problem solved, and the technical effect of the present invention clearer, the technical solution of the present invention will be clearly and completely described below in conjunction with specific embodiments.

[0049] Example 1: Referring to Figure 1, this embodiment proposes a dynamic control method for the air-liquid coordinated cooling terminal of a data center under load fluctuations, which includes the following steps:

[0050] S1. Construct data center-level and server-level computational fluid dynamics (CFD) simulation models, and generate time-series datasets for training proxy models through dynamic simulation.

[0051] In this embodiment, the construction of a data center-level computational fluid dynamics (CFD) simulation model and the generation of a time-series dataset through dynamic simulation specifically includes the following operations:

[0052] Based on the actual geometric layout and equipment configuration of the target data center, a data center-level computational fluid dynamics simulation model is constructed. In this model, each server is represented as a porous medium, and the air-cooled terminals are modeled according to their actual types, covering three forms: room-level, row-level, and backplane air conditioners.

[0053] The design incorporates dynamic simulation conditions, including various dynamic scenarios such as frequency changes of air-cooled terminal air conditioner fans and power fluctuations of different servers.

[0054] The computer room-level computational fluid dynamics simulation model is used to perform dynamic simulations under the different dynamic simulation conditions, and time-series data of key parameters are collected: the air conditioning fan frequency, the power of each server, and the inlet air temperature and velocity of each server from time t-n to time t, and the inlet air temperature and velocity of each server at time t+1.

[0055] In this embodiment, constructing a server-level computational fluid dynamics simulation model and generating a time-series dataset through dynamic simulation specifically includes the following operations:

[0056] Construct a server-level computational fluid dynamics simulation model that includes key heat-generating and heat-dissipating components such as CPU / GPU, two-phase cold plate, motherboard, and memory;

[0057] The design incorporates dynamic simulation conditions, including various dynamic scenarios such as power fluctuations of different servers, changes in server refrigerant inlet temperature and flow rate, and changes in server inlet air temperature and velocity.

[0058] The server-level computational fluid dynamics simulation model is used to perform dynamic simulations under the different dynamic simulation conditions, and time-series data of key parameters are collected. These comprise, from time t-n to time t, the power of key components such as the CPU / GPU, motherboard, and memory, the server inlet refrigerant temperature and flow rate, and the server inlet air temperature and velocity, together with the server CPU / GPU temperature and server outlet air temperature at time t+1.

[0059] S2. Train a data center-level time-series proxy model and a server-level time-series proxy model based on the time-series dataset.

[0060] In this embodiment, training the data center-level time-series proxy model specifically includes the following operations. The data center-level time-series proxy model is trained on the time-series data generated by the data center-level computational fluid dynamics simulation model. Its input is the time-series data of the air conditioning fan frequency, the power of each server, and the inlet air temperature and velocity of each server from time t-n to time t, and its output is the predicted inlet air temperature and velocity of each server at time t+1.

[0061] In this embodiment, training the server-level time-series proxy model specifically includes the following operations. The server-level time-series proxy model is trained on the time-series data generated by the server-level computational fluid dynamics simulation model. Its input is the time-series data from time t-n to time t, including the power of key components such as the CPU / GPU, motherboard, and memory, the server inlet refrigerant temperature and flow rate, and the server inlet air temperature and velocity. Its output is the server CPU / GPU temperature and the server outlet air temperature at time t+1.

[0062] Specifically, the data center-level temporal proxy model and the server-level temporal proxy model both employ Temporal Convolutional Networks (TCNs). TCNs capture long-term dependencies in sequences through dilated causal convolutions and mitigate the vanishing gradient problem during deep network training by leveraging residual connections. Their receptive field grows exponentially with network depth, making them suitable for modeling dependencies in long-term sequences.
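The exponential growth of the receptive field can be made concrete. Assume, hypothetically, a stack of L dilated causal convolution layers with one layer per level, each with kernel size k and dilation $d_l = 2^l$ for $l = 0, \dots, L-1$. Each layer extends the look-back by $(k-1)d_l$ steps, so the receptive field is

$$R = 1 + (k-1)\sum_{l=0}^{L-1} 2^{l} = 1 + (k-1)\left(2^{L}-1\right)$$

For example, with k = 3 and L = 8, R = 1 + 2 × 255 = 511 time steps. In the common TCN layout with two convolutions per residual block, the (k-1) term is doubled.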

[0063] The core operation of TCN is dilated causal convolution:

[0064] $$y_t = \sum_{i=0}^{k-1} w_i\, x_{t-d\cdot i}$$

[0065] In the formula, $y_t$ is the output vector at time step $t$; $x_{t-d\cdot i}$ is the input vector at time step $t-d\cdot i$; $w_i$ is the weight at the i-th position of the convolution kernel; $k$ is the kernel size; and $d$ is the dilation factor.
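The dilated causal convolution in [0063] can be checked on a single channel. The sketch below assumes left zero padding for time steps before the start of the sequence, a common way to keep the output length equal to the input length. The function name is hypothetical, and a real TCN layer would operate on multi-channel tensors with learned weights.

```python
from typing import List, Sequence


def dilated_causal_conv1d(x: Sequence[float], w: Sequence[float], d: int) -> List[float]:
    """Single-channel y_t = sum_{i=0}^{k-1} w_i * x_{t - d*i}.

    Samples before the start of the sequence are treated as zero
    (left zero padding), so y_t never depends on x_{t'} with t' > t.
    """
    if d < 1:
        raise ValueError("dilation must be >= 1")
    k = len(w)
    y: List[float] = []
    for t in range(len(x)):
        acc = 0.0
        for i in range(k):
            j = t - d * i  # look back by multiples of the dilation
            if j >= 0:
                acc += w[i] * x[j]
        y.append(acc)
    return y
```

For x = [1, 2, 3, 4, 5], weights [1, 1], and d = 2, the output is [1, 2, 4, 6, 8]. Each y_t adds x_t and x_{t-2}, and no value after t is ever read.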

[0066] S3. Couple the data center-level and server-level time-series proxy models with the data center energy consumption model to construct an integrated simulation environment, and design a deep reinforcement learning framework comprising a state space, an action space, and a reward function with energy consumption and thermal safety as optimization objectives.

[0067] To perform this step, first couple the data center-level and server-level time-series proxy models, and combine them with the system energy consumption calculation module based on physical formulas to jointly build an integrated simulation environment;

[0068] Subsequently, in this environment, the dynamic control agent issues control actions, including adjusting the frequency of the air conditioner fan and the valve opening.

[0069] Finally, the data center-level time-series proxy model predicts the inlet air temperature and velocity of each server at the next moment (time t+1) based on the current control actions and historical state information. The specific inputs are the time-series data, from time t-n to time t, of three quantities. The first is the air conditioning fan frequency. The second is the power of each server, which excludes CPU / GPU power and so covers only air-cooled components such as the motherboard and memory. The third is the inlet air temperature of each server.

[0070] The server-level time-series proxy model predicts the server CPU / GPU temperature and server outlet air temperature at the next moment (time t+1) based on the current control action and historical state information. The specific inputs are the time-series data, from time t-n to time t, of the power of key components such as the CPU / GPU, motherboard, and memory, the server inlet refrigerant temperature and flow rate, and the server inlet air temperature and velocity.

[0071] The system energy consumption calculation module includes refrigerant pump power calculation and fan power calculation, and the calculation formulas are as follows:

[0072] $$P_{\mathrm{fan}} = P_{\mathrm{fan,rated}}\left(\frac{f}{f_{\mathrm{rated}}}\right)^{3}$$

[0073] In the formula, $P_{\mathrm{fan}}$ is the fan power, in kW; $P_{\mathrm{fan,rated}}$ is the rated power of the fan, in kW; $f$ is the actual operating frequency of the fan, in Hz; $f_{\mathrm{rated}}$ is the rated frequency of the fan, in Hz.

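[0074] $$P_{\mathrm{pump}} = P_{\mathrm{pump,rated}}\left(\frac{n}{n_{\mathrm{rated}}}\right)^{3}$$

[0075] In the formula, $P_{\mathrm{pump}}$ is the power of the refrigerant pump, in kW; $P_{\mathrm{pump,rated}}$ is its rated power, in kW; $n$ is the actual operating speed of the pump, in RPM; $n_{\mathrm{rated}}$ is the pump's rated speed, in RPM.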
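The power relations in [0071] to [0075] are sketched below, assuming the standard cube-law affinity form in which power scales with the cube of fan frequency or pump speed. The function names and the sample ratings in the usage note are hypothetical.

```python
def fan_power_kw(p_rated_kw: float, f_hz: float, f_rated_hz: float) -> float:
    """Fan power, assuming the cube-law relation P = P_rated * (f / f_rated)^3."""
    if f_rated_hz <= 0:
        raise ValueError("rated frequency must be positive")
    return p_rated_kw * (f_hz / f_rated_hz) ** 3


def pump_power_kw(p_rated_kw: float, n_rpm: float, n_rated_rpm: float) -> float:
    """Refrigerant pump power, assuming P = P_rated * (n / n_rated)^3."""
    if n_rated_rpm <= 0:
        raise ValueError("rated speed must be positive")
    return p_rated_kw * (n_rpm / n_rated_rpm) ** 3
```

Consider a hypothetical fan rated at 5.5 kW and 50 Hz. Under this assumption, running it at 40 Hz draws about 5.5 × 0.8³ ≈ 2.82 kW, roughly half its rated power for a 20 % cut in frequency.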

[0076] When performing this step, the deep reinforcement learning framework is designed, and the framework includes a state space, an action space, and a reward function optimized for energy consumption and thermal safety, wherein:

[0077] The state space contains dynamic and static variables in the form of historical time series, specifically composed of the following parameters: temperature of each CPU / GPU, air inlet temperature of each server, refrigerant supply temperature, CDU valve opening, air conditioner fan frequency, and power of key components such as CPU / GPU, motherboard, and memory.

[0078] The action space is defined as the continuous control space, including the air conditioner fan frequency and valve opening degree;

[0079] The expression for the reward function is as follows:

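[0080] $$E_t = \left(P_{\mathrm{fan},t} + P_{\mathrm{pump},t}\right)\Delta t$$
$$r_t = -\omega_1 E_t - \omega_2\sum_{i=1}^{N}\max\left(0,\,T_{\mathrm{chip},i,t} - T_{\mathrm{chip}}^{\mathrm{th}}\right) - \omega_3\sum_{i=1}^{N}\max\left(0,\,T_{\mathrm{out},i,t} - T_{\mathrm{out}}^{\mathrm{th}}\right)$$

[0081] In the formula, $\Delta t$ is the control time step, in s; $E_t$ is the fan and pump energy consumed during step t; $T_{\mathrm{chip},i,t}$ is the maximum CPU / GPU chip temperature of the i-th server, in °C; $T_{\mathrm{out},i,t}$ is the exhaust (outlet) temperature of the i-th server, in °C; $N$ is the number of servers; $T_{\mathrm{chip}}^{\mathrm{th}}$ and $T_{\mathrm{out}}^{\mathrm{th}}$ are the threshold values for the server chip temperature and the outlet temperature, respectively, in °C. The function max(0, value) ensures that a penalty term is generated only when the server chip temperature or the outlet temperature exceeds its safe threshold, thereby guiding the agent to keep device temperatures below the safe threshold. $\omega_1$, $\omega_2$, and $\omega_3$ are the weighting coefficients.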
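The reward can be sketched as follows. The sketch assumes, following the definitions in [0081], that the reward penalizes fan and pump energy over one control step plus hinge-style exceedances of the chip and outlet thresholds. The parameter names and sample values are hypothetical. The units of the energy term (kW × s = kJ) only rescale its weight.

```python
from typing import Sequence


def reward(
    p_fan_kw: float,
    p_pump_kw: float,
    dt_s: float,
    chip_max_temps_c: Sequence[float],
    outlet_temps_c: Sequence[float],
    chip_threshold_c: float,
    outlet_threshold_c: float,
    w_energy: float,
    w_chip: float,
    w_outlet: float,
) -> float:
    """Assumed form: -(w1 * energy + w2 * chip exceedance + w3 * outlet exceedance)."""
    # energy used by fans and pumps over one control step (kW * s = kJ)
    energy_kj = (p_fan_kw + p_pump_kw) * dt_s
    # hinge penalties: non-zero only above the safety thresholds
    chip_penalty = sum(max(0.0, t - chip_threshold_c) for t in chip_max_temps_c)
    outlet_penalty = sum(max(0.0, t - outlet_threshold_c) for t in outlet_temps_c)
    return -(w_energy * energy_kj + w_chip * chip_penalty + w_outlet * outlet_penalty)
```

Take 4 kW of combined fan and pump power over a 60 s step, one chip 5 °C above its threshold, and weights (0.01, 1, 1). The reward is then -(2.4 + 5) = -7.4.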

[0082] S4. Based on the integrated simulation environment and deep reinforcement learning framework, the dynamic control agent of the air-liquid co-cooling terminal is trained.

[0083] This step trains the dynamic control agent for the air-liquid co-cooling terminal with a deep reinforcement learning algorithm, based on the integrated simulation environment and the deep reinforcement learning framework. During training, convergence is judged from the trend of the average episode reward curve over time. The agent is determined to have converged when the curve remains stable, its fluctuation amplitude within a preset time period does not exceed a preset threshold, and it shows no continuous upward trend.

[0084] It should be added that the deep reinforcement learning algorithm is the Soft Actor-Critic (SAC) algorithm. SAC is an off-policy reinforcement learning algorithm based on the maximum entropy principle. Its balance between exploration and exploitation helps the agent find an optimal energy-saving strategy more efficiently while ensuring thermal safety.

[0085] The objective function of the SAC algorithm is shown in the equation:

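[0086] $$J(\pi) = \sum_{t}\mathbb{E}_{(s_t,a_t)\sim\rho_\pi}\left[r(s_t,a_t) + \alpha\,\mathcal{H}\left(\pi(\cdot\mid s_t)\right)\right]$$

[0087] In this formula, $J(\pi)$ is the expected cumulative return of policy $\pi$; $r(s_t,a_t)$ is the immediate reward obtained after taking action $a_t$ in state $s_t$; $\rho_\pi$ is the state-action distribution induced by $\pi$; $\alpha$ is the temperature coefficient, which controls the importance of the entropy regularization term relative to the reward; $\mathcal{H}(\pi(\cdot\mid s_t))$ is the entropy of policy $\pi$ in state $s_t$, which measures the randomness of the policy in that state.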

[0088] The SAC algorithm architecture mainly consists of a Q-network, a policy network, and an experience replay buffer. The experience replay buffer stores historical experience, enabling the algorithm to learn repeatedly from past experiences and thus improve sample efficiency. The Q-network is mainly used to estimate the expected soft Q-value achievable by following the current policy under a given state-action pair. The soft Q-value function and the soft state value function are defined as follows:

[0089] $$Q(s_t,a_t) = r(s_t,a_t) + \gamma\,\mathbb{E}_{s_{t+1}}\left[V(s_{t+1})\right]$$

[0090] $$V(s_t) = \mathbb{E}_{a_t\sim\pi}\left[Q(s_t,a_t) - \alpha\log\pi(a_t\mid s_t)\right]$$

[0091] In this formula, $Q(s_t,a_t)$ is the soft Q value, i.e., the expected cumulative return obtained by taking action $a_t$ in state $s_t$ and then following the current policy $\pi$; $s_t$ is the environment state at time t; $a_t$ is the action decision at time t; $\gamma$ is the discount factor, which weighs the present value of future rewards; $V(s_t)$ is the soft state value, i.e., the expected cumulative return obtained by following policy $\pi$ from state $s_t$; $\alpha$ is the temperature coefficient, which balances reward optimization and exploration in the policy.

[0092] To mitigate the overestimation problem common in value-function estimation, the SAC algorithm constructs two independent soft Q-networks with parameters $\theta_1$ and $\theta_2$ and uses the smaller of the two values when calculating the target value. It also maintains a target network for each Q-network, with parameters $\bar{\theta}_1$ and $\bar{\theta}_2$ respectively. This reduces bias in temporal-difference calculations and improves training stability. The parameters of the soft Q-function are trained by minimizing the soft Bellman residual, with the loss function:

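[0093] $$J_Q(\theta_j) = \mathbb{E}_{(s_t,a_t,r_t,s_{t+1})\sim\mathcal{D}}\left[\frac{1}{2}\left(Q_{\theta_j}(s_t,a_t) - y_t\right)^2\right],\quad j = 1, 2$$

[0094] $$y_t = r_t + \gamma\left(\min_{k=1,2} Q_{\bar{\theta}_k}(s_{t+1},a_{t+1}) - \alpha\log\pi_\phi(a_{t+1}\mid s_{t+1})\right),\quad a_{t+1}\sim\pi_\phi(\cdot\mid s_{t+1})$$

[0095] In this formula, $J_Q(\theta_j)$ is the loss function of the Q-network with parameters $\theta_j$; $y_t$ is the target Q value; $Q_{\bar{\theta}_k}$ is the target Q-network; $\mathcal{D}$ is the experience replay buffer; $\pi_\phi(a_{t+1}\mid s_{t+1})$ is the probability that the policy network (with parameters $\phi$) selects the next action $a_{t+1}$ in the next state $s_{t+1}$.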
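The target value and loss in [0093] to [0095] can be computed per transition as shown below. This is a scalar sketch. It takes the two target-network Q values and the next-action log-probability as inputs instead of evaluating networks. The optional `done` mask is a common addition that the patent does not mention.

```python
from typing import Sequence


def soft_q_target(
    reward: float,
    gamma: float,
    q1_next: float,
    q2_next: float,
    log_prob_next: float,
    alpha: float,
    done: bool = False,
) -> float:
    """y = r + gamma * (min(Q1', Q2') - alpha * log pi(a'|s'))."""
    # clipped double-Q: take the smaller target estimate, then add the entropy bonus
    soft_next_value = min(q1_next, q2_next) - alpha * log_prob_next
    # optional terminal mask (an assumption; a continuously running plant may never terminate)
    return reward + gamma * (0.0 if done else 1.0) * soft_next_value


def soft_q_loss(q_pred: Sequence[float], targets: Sequence[float]) -> float:
    """Mean of 0.5 * (Q(s, a) - y)^2 over a sampled batch."""
    if len(q_pred) != len(targets) or not q_pred:
        raise ValueError("need equal-length, non-empty batches")
    return sum(0.5 * (q - y) ** 2 for q, y in zip(q_pred, targets)) / len(q_pred)
```

With r = 1, γ = 0.9, target Q values 10 and 8, log-probability -1, and α = 0.2, the target is 1 + 0.9 × (8 + 0.2) = 8.38.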

[0096] The parameters of the target Q-network are updated slowly using an exponential moving average strategy:

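[0097] $$\bar{\theta}_j \leftarrow \tau\,\theta_j + (1-\tau)\,\bar{\theta}_j,\quad j = 1, 2$$

[0098] In this formula, $\bar{\theta}_j$ are the parameters of the target Q-network; $\theta_j$ are the parameters of the online Q-network; $\tau$ is the soft-update coefficient ($0 < \tau \ll 1$).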
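The exponential-moving-average update in [0096] amounts to one line over flattened parameters. This is a minimal sketch. The flat-list representation is chosen for readability, whereas real code would iterate over network tensors.

```python
from typing import List, Sequence


def soft_update(target: Sequence[float], online: Sequence[float], tau: float) -> List[float]:
    """Polyak averaging: theta_bar <- tau * theta + (1 - tau) * theta_bar."""
    if not 0.0 < tau <= 1.0:
        raise ValueError("tau must be in (0, 1]")
    if len(target) != len(online):
        raise ValueError("parameter vectors must match")
    return [tau * o + (1.0 - tau) * t for t, o in zip(target, online)]
```

A value of τ = 0.005 is often used with SAC. At that setting, the target parameters close about 0.5 % of the gap to the online parameters per update, which keeps the target values slowly varying.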

[0099] The parameters ϕ of the policy network can be learned by minimizing the following objective function:

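[0100] $$J_\pi(\phi) = \mathbb{E}_{s_t\sim\mathcal{D},\,a_t\sim\pi_\phi}\left[\alpha\log\pi_\phi(a_t\mid s_t) - \min_{j=1,2}Q_{\theta_j}(s_t,a_t)\right]$$

[0101] In the formula, $J_\pi(\phi)$ is the loss function of the policy network with parameters $\phi$.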

[0102] Furthermore, SAC dynamically controls the degree of policy exploration by automatically adjusting the temperature coefficient α, which it does by minimizing the following loss function:

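[0103] $$J(\alpha) = \mathbb{E}_{a_t\sim\pi_\phi}\left[-\alpha\log\pi_\phi(a_t\mid s_t) - \alpha\bar{\mathcal{H}}\right]$$

[0104] In this formula, $\bar{\mathcal{H}}$ is the preset target entropy, usually taken as the negative of the action-space dimension, i.e., $-\dim(\mathcal{A})$.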
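The temperature update in [0102] to [0104] can be sketched as a single gradient step. The sketch parameterizes α through log α to keep it positive. This is a common implementation choice that the patent does not specify, and the function name and learning rate are hypothetical.

```python
import math
from typing import Sequence, Tuple


def update_log_alpha(
    log_alpha: float,
    log_probs: Sequence[float],
    target_entropy: float,
    lr: float,
) -> Tuple[float, float]:
    """One gradient-descent step on J(alpha), parameterized by log(alpha).

    J = mean(-alpha * (log pi(a|s) + H_target)), alpha = exp(log_alpha).
    Since d(alpha)/d(log_alpha) = alpha, dJ/d(log_alpha) equals J itself.
    """
    if not log_probs:
        raise ValueError("log_probs must be non-empty")
    alpha = math.exp(log_alpha)
    mean_log_prob = sum(log_probs) / len(log_probs)
    loss = -alpha * (mean_log_prob + target_entropy)
    grad = loss  # derivative of J with respect to log_alpha
    return log_alpha - lr * grad, loss
```

The action space here has two dimensions (fan frequency and valve opening), so the target entropy would be -2. Suppose the policy entropy, estimated as -E[log π], drops below that target. The step then raises α and strengthens exploration. When the entropy exceeds the target, α falls.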

[0105] S5. Safely deploy the trained agent to achieve dynamic control of the cooling terminal, and perform online iterative optimization of the time-series proxy models and the agent based on actually collected data.

[0106] When the agent is deployed, a rule-based safety layer is inserted in advance between the agent and the physical actuators. This layer contains hard safety rules: if the temperature of any CPU/GPU or any server outlet temperature exceeds the preset safety threshold, the agent's command is overridden and replaced with the preset air supply temperature and air supply flow rate to ensure the thermal safety of the servers.
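The override rule of the safety layer can be sketched as a pure function sitting between the agent and the actuators. The threshold values, fallback setpoints, and command keys (`fan_freq_hz`, `valve_opening`, `supply_temp`, `supply_flow`) below are hypothetical illustrations, not values from this document.

```python
from dataclasses import dataclass


@dataclass
class SafetyConfig:
    chip_temp_limit: float       # preset CPU/GPU safety threshold (deg C)
    outlet_temp_limit: float     # preset server outlet safety threshold (deg C)
    fallback_supply_temp: float  # preset air supply temperature (deg C)
    fallback_supply_flow: float  # preset air supply flow rate (site-specific unit)


def safety_layer(agent_cmd, chip_temps, outlet_temps, cfg):
    """Pass the agent's command through unless a hard temperature rule is violated.

    Returns (command_to_actuators, overridden_flag).
    """
    violated = (max(chip_temps) > cfg.chip_temp_limit
                or max(outlet_temps) > cfg.outlet_temp_limit)
    if violated:
        # Hard rule: ignore the agent and apply the preset safe setpoints.
        return {"supply_temp": cfg.fallback_supply_temp,
                "supply_flow": cfg.fallback_supply_flow}, True
    return dict(agent_cmd), False


cfg = SafetyConfig(chip_temp_limit=85.0, outlet_temp_limit=45.0,
                   fallback_supply_temp=18.0, fallback_supply_flow=1.0)
cmd, overridden = safety_layer({"fan_freq_hz": 35.0, "valve_opening": 0.6},
                               chip_temps=[72.0, 88.5], outlet_temps=[41.0, 43.0],
                               cfg=cfg)
```

Keeping the rule outside the learned policy means it still holds if the agent produces an unexpected command, and the returned flag can be logged to audit how often overrides occur.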

[0107] The time-series proxy models and the agent are iteratively optimized online based on actually collected data. This requires continuously collecting the real-time operating data output by sensors and the control system to build a historical time-series database, whose data is used for iterative optimization in the following two respects:

[0108] When changes in the physical layout of servers and rack utilization rate within the data center exceed preset thresholds, iterative optimization is performed on the data center-level and server-level time-series proxy models.

[0109] When the data center's operating parameters remain within a preset threshold range for a preset continuous duration, the control agent performs direct online learning.
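The two triggers above can be sketched as a small decision function. The change metrics, thresholds, and step counts are hypothetical; the sketch also assumes that either a layout change or a utilization change beyond its own threshold is enough to refit the proxy models, which is one reading of the text.

```python
def select_updates(layout_change, utilization_change, layout_threshold,
                   utilization_threshold, in_range_history, required_steps):
    """Decide which optimization path the historical time-series data should drive.

    layout_change, utilization_change: hypothetical scalar change metrics since the
        proxy models were last trained (e.g., fraction of servers relocated).
    in_range_history: one bool per sampling step, True when all monitored operating
        parameters were inside their preset ranges at that step.
    required_steps: the preset continuous duration in sampling steps (must be >= 1).
    """
    paths = []
    # Path 1: layout or utilization drift makes the proxy models stale.
    if layout_change > layout_threshold or utilization_change > utilization_threshold:
        paths.append("refit_proxy_models")
    # Path 2: stable operation for the full preset duration permits online learning.
    recent = in_range_history[-required_steps:]
    if len(recent) == required_steps and all(recent):
        paths.append("online_agent_learning")
    return paths
```

Both paths can fire in the same check; a deployment would typically refit the proxy models first and then resume online learning once operation is stable again.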

[0110] Specifically, for the iterative optimization of the data center-level and server-level time-series proxy models, the models are periodically fine-tuned and validated with newly collected operating data to reduce prediction error. The agent can then be retrained or fine-tuned in the updated, higher-fidelity simulation environment to obtain optimized control policies that cope with more complex operating conditions.

[0111] For online learning by the agent, newly collected operating data (state, action, reward, and next state) is stored in the experience replay pool in real time and can be used directly to fine-tune the policy of the deployed DRL agent online.
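A minimal experience replay pool for storing live transitions can be sketched as follows. The fixed capacity with first-in-first-out eviction and uniform sampling are assumptions; this document does not specify the pool's size or sampling scheme, and the class and field names are hypothetical.

```python
import random
from collections import deque, namedtuple

Transition = namedtuple("Transition", ["state", "action", "reward", "next_state"])


class ReplayPool:
    """Fixed-capacity experience replay pool; the oldest transitions are evicted first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state):
        self.buffer.append(Transition(state, action, reward, next_state))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement for one fine-tuning mini-batch.
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


pool = ReplayPool(capacity=3)
for step in range(5):
    pool.add(state=[step], action=[0.5], reward=float(step), next_state=[step + 1])
batch = pool.sample(2, rng=random.Random(0))
```

A bounded pool keeps fine-tuning focused on recent operating conditions, which matters when the load profile drifts over time.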

[0112] In summary, the dynamic control method for air-liquid co-cooling terminals in data centers under load fluctuations, as proposed in this invention, solves the problems of lag in adjustment, low energy efficiency, and insufficient safety of current data center cooling terminals when facing dynamic loads. It achieves real-time intelligent control of the air-liquid cooling ratio, significantly improves the energy efficiency of data centers, and ensures the thermal safety of servers under dynamic loads.

[0113] The above specific examples illustrate the principles and implementation methods of the present invention in detail. These embodiments are merely for the purpose of helping to understand the core technical content of the present invention. Based on the above specific embodiments of the present invention, any improvements and modifications made to the present invention by those skilled in the art without departing from the principles of the present invention should fall within the patent protection scope of the present invention.

Claims

1. A dynamic control method for air-liquid co-cooling terminals in a data center under load fluctuations, characterized in that it comprises the following steps: S1. Construct computer room-level and server-level computational fluid dynamics simulation models, and generate time-series datasets for training proxy models through dynamic simulation; S2. Train a data center-level time-series proxy model and a server-level time-series proxy model, respectively, based on the time-series datasets; S3. Couple the data center-level and server-level time-series proxy models with a data center energy consumption model to construct an integrated simulation environment, and design a deep reinforcement learning framework comprising a state space, an action space, and a reward function optimized for energy consumption and thermal safety; specifically: A combined simulation environment is constructed by coupling the data center-level and server-level time-series proxy models and integrating them with a system energy consumption calculation module based on physical formulas. In this environment, a dynamic control agent issues control actions, including adjustments of the air conditioner fan frequency and the valve opening. The data center-level time-series proxy model predicts the inlet air temperature and velocity of each server at the next moment from the current control actions and historical state information; the server-level time-series proxy model then predicts the server CPU/GPU temperature and server outlet air temperature at the next moment from the current control actions and historical state information. The system energy consumption calculation module includes a refrigerant pump power calculation and a fan power calculation, given by:

$$P_{\mathrm{fan}} = P_{\mathrm{fan,rated}} \left(\frac{f}{f_{\mathrm{rated}}}\right)^{3},$$

where $P_{\mathrm{fan}}$ is the fan power, $P_{\mathrm{fan,rated}}$ is the rated power of the fan, $f$ is the actual operating frequency of the fan, and $f_{\mathrm{rated}}$ is the rated frequency of the fan;

$$P_{\mathrm{pump}} = P_{\mathrm{pump,rated}} \left(\frac{n}{n_{\mathrm{rated}}}\right)^{3},$$

where $P_{\mathrm{pump}}$ is the power of the refrigerant pump, $P_{\mathrm{pump,rated}}$ is its rated power, $n$ is the actual operating speed of the pump, and $n_{\mathrm{rated}}$ is the rated speed of the pump.

The deep reinforcement learning framework comprises a state space, an action space, and a reward function with energy consumption and thermal safety as optimization objectives. The state space contains dynamic and static variables in the form of historical time series, specifically: the temperature of each CPU/GPU, the inlet air temperature of each server, the refrigerant supply temperature, the CDU valve opening, the air conditioning fan frequency, and the power consumption of each component (CPU/GPU, motherboard, memory). The action space is defined as a continuous control space comprising the air conditioning fan frequency and the valve opening. The reward function is:

$$r_t = -\left[\omega_1 \left(P_{\mathrm{fan}} + P_{\mathrm{pump}}\right)\Delta t + \omega_2 \sum_{i=1}^{N} \max\left(0,\, T_{\mathrm{chip},i} - T_{\mathrm{chip}}^{\mathrm{th}}\right) + \omega_3 \sum_{i=1}^{N} \max\left(0,\, T_{\mathrm{out},i} - T_{\mathrm{out}}^{\mathrm{th}}\right)\right],$$

where $\Delta t$ is the control time step; $T_{\mathrm{chip},i}$ is the maximum CPU/GPU chip temperature of the $i$-th server; $T_{\mathrm{out},i}$ is the exhaust (outlet) air temperature of the $i$-th server; $N$ is the number of servers; $T_{\mathrm{chip}}^{\mathrm{th}}$ and $T_{\mathrm{out}}^{\mathrm{th}}$ are the safety thresholds for the server chip temperature and the outlet temperature, respectively; the function $\max(0, \cdot)$ ensures that a penalty term arises only when the chip or outlet temperature exceeds its safety threshold, thereby guiding the agent to keep device temperatures below the thresholds; and $\omega_1$, $\omega_2$, and $\omega_3$ are weighting coefficients; S4. Train the dynamic control agent of the air-liquid co-cooling terminal based on the integrated simulation environment and the deep reinforcement learning framework; S5. Safely deploy the trained agent to achieve dynamic control of the cooling terminal, and perform online iterative optimization of the time-series proxy models and the agent based on actually collected data.
During deployment of the agent, a rule-based safety layer is added in advance between the agent and the physical actuators. This safety layer contains hard safety rules: if the temperature of any CPU/GPU or any server outlet temperature exceeds the preset safety threshold, the agent's command is overridden and replaced with the preset air supply temperature and air supply flow rate to ensure the thermal safety of the servers.

2. The method for dynamic control of data center air-liquid co-cooling terminals under load fluctuations according to claim 1, characterized in that step S1 specifically includes the following operations: 1A. Construct a computer room-level computational fluid dynamics simulation model and generate a time-series dataset through dynamic simulation: based on the actual geometric layout and equipment configuration of the target data center, a data center-level computational fluid dynamics simulation model is constructed, in which each server is represented by an equivalent porous-medium model and the air-cooled terminals are modeled according to their actual types, covering three forms: room-level, row-level, and backplane air conditioners. Dynamic simulation conditions are designed, including dynamic scenarios of fan frequency changes of the air-cooled terminal air conditioners and power fluctuations of different servers. The computer room-level computational fluid dynamics simulation model is used to perform dynamic simulations under these conditions, and time-series data of the specified parameters are collected: the air conditioning fan frequency, the power of each server, and the inlet air temperature and velocity of each server from time t−n to time t, together with the inlet air temperature and velocity of each server at time t+1. 1B. Construct a server-level computational fluid dynamics simulation model and generate a time-series dataset through dynamic simulation: a server-level computational fluid dynamics simulation model is constructed for the specified heat-generating and heat-dissipating components, and dynamic simulation conditions are designed, covering scenarios such as power fluctuations of different servers, changes in the server refrigerant inlet temperature and flow rate, and changes in the server inlet air temperature and velocity.
The server-level computational fluid dynamics simulation model is used to perform dynamic simulations under these conditions, and time-series data of the specified parameters are collected: the power of the specified components, the refrigerant temperature and flow rate at the server inlet, and the air temperature and velocity at the server inlet from time t−n to time t, together with the CPU/GPU temperature and server outlet air temperature at time t+1.

3. The dynamic control method for data center air-liquid coordinated cooling terminal under load fluctuations according to claim 2, characterized in that step S2 specifically includes the following operations: 2A. Training the data center-level time-series proxy model: the data center-level time-series proxy model is trained on the time-series data generated by the data center-level computational fluid dynamics simulation model; its input is the time-series data of the air conditioning fan frequency, the power of each server, and the inlet air temperature and velocity of each server from time t−n to time t, and its output is the predicted inlet air temperature and velocity of each server at time t+1. 2B. Training the server-level time-series proxy model: the server-level time-series proxy model is trained on the time-series data generated by the server-level computational fluid dynamics simulation model; its input is the time-series data of the power of the specified components, the refrigerant temperature and flow rate at the server inlet, and the air temperature and velocity at the server inlet from time t−n to time t, and its output is the server CPU/GPU temperature and the server outlet air temperature at time t+1.

4. The dynamic control method for a data center air-liquid co-cooling terminal under load fluctuations according to claim 3, characterized in that, The data center-level time-series proxy model and the server-level time-series proxy model both employ temporal convolutional networks.

5. The dynamic control method for data center air-liquid coordinated cooling terminal under load fluctuations according to claim 1, characterized in that, in step S4, a deep reinforcement learning algorithm is used to train the dynamic control agent of the air-liquid co-cooling terminal based on the integrated simulation environment and the deep reinforcement learning framework. During training, convergence is judged from the trend of the average episode reward curve over time: when the curve has stabilized, i.e., its fluctuation amplitude over a continuous preset period does not exceed a preset threshold and it shows no sustained upward trend, the agent is judged to have converged.

6. The method for dynamic control of a data center air-liquid co-cooling terminal under load fluctuations according to claim 1, characterized in that, in step S5, the time-series proxy models and the agent are iteratively optimized online based on actually collected data. During this process, real-time operating data output by sensors and the control system is continuously collected to build a historical time-series database, whose data is used for iterative optimization in two respects: when the changes in the physical layout of servers and in rack utilization within the data center exceed preset thresholds, the data center-level and server-level time-series proxy models are iteratively optimized; and when the operating parameters of the data center remain within a preset threshold range for a continuous preset duration, the control agent performs direct online learning.
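Claims 1 and 5 describe the energy calculation, the reward, and the convergence check in words. The sketch below illustrates them under stated assumptions: cubic fan and pump affinity laws; a reward of the form $r_t = -[\omega_1 (\sum P_{\mathrm{fan}} + \sum P_{\mathrm{pump}})\Delta t + \omega_2 \sum_i \max(0, T_{\mathrm{chip},i} - T_{\mathrm{chip}}^{\mathrm{th}}) + \omega_3 \sum_i \max(0, T_{\mathrm{out},i} - T_{\mathrm{out}}^{\mathrm{th}})]$, built from the variables the claim lists; and, for claim 5, "no sustained upward trend" interpreted as a least-squares slope below a hypothetical tolerance `trend_tol`. The input to `has_converged` is assumed to be the already-averaged episode reward curve. All function names and numbers are illustrative.

```python
import numpy as np


def fan_power(p_rated, f, f_rated):
    """Assumed cubic fan affinity law: P_fan = P_rated * (f / f_rated) ** 3."""
    return p_rated * (f / f_rated) ** 3


def pump_power(p_rated, n, n_rated):
    """Assumed cubic pump affinity law: P_pump = P_rated * (n / n_rated) ** 3."""
    return p_rated * (n / n_rated) ** 3


def reward(fan_powers, pump_powers, dt, chip_temps, outlet_temps,
           chip_limit, outlet_limit, weights=(1.0, 1.0, 1.0)):
    """Assumed reward: negative weighted sum of energy and threshold exceedances."""
    w1, w2, w3 = weights
    energy = (np.sum(fan_powers) + np.sum(pump_powers)) * dt
    # max(0, .) makes a term nonzero only when a temperature exceeds its limit.
    chip_penalty = np.sum(np.maximum(0.0, np.asarray(chip_temps) - chip_limit))
    outlet_penalty = np.sum(np.maximum(0.0, np.asarray(outlet_temps) - outlet_limit))
    return -(w1 * energy + w2 * chip_penalty + w3 * outlet_penalty)


def has_converged(episode_rewards, window, max_fluctuation, trend_tol):
    """Claim 5 criterion as a sketch: the recent reward curve is flat and not rising."""
    if len(episode_rewards) < window:
        return False
    recent = np.asarray(episode_rewards[-window:], dtype=float)
    flat = recent.max() - recent.min() <= max_fluctuation
    slope = np.polyfit(np.arange(window), recent, 1)[0]  # least-squares trend per episode
    return bool(flat and slope <= trend_tol)


p_fan = fan_power(p_rated=10.0, f=25.0, f_rated=50.0)      # 10 * 0.5**3 = 1.25
p_pump = pump_power(p_rated=2.0, n=1450.0, n_rated=1450.0)  # at rated speed: 2.0
r = reward([p_fan], [p_pump], dt=1.0, chip_temps=[80.0, 90.0],
           outlet_temps=[40.0, 50.0], chip_limit=85.0, outlet_limit=45.0)
```

The cubic dependence on frequency and speed is why lowering fan frequency even slightly can save substantial power, which is the trade-off the weighted reward asks the agent to balance against the temperature penalties.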