A method and system for temperature control of a liquid-cooled server

By acquiring multi-source time-series data from liquid-cooled servers, comparing the mean load of the earlier and later segments of a time window, linearly extrapolating the load, and assessing heat dissipation requirements in conjunction with the current temperature state, the method dynamically adjusts the pump and fan speeds. This addresses the purely reactive behavior of control based on traditional particle swarm optimization algorithms and achieves a balance between safety and energy efficiency.

CN122086217B · Active · Publication Date: 2026-06-30 · DONGGUAN LIMINDA ELECTRONIC TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: DONGGUAN LIMINDA ELECTRONIC TECH CO LTD
Filing Date: 2026-04-22
Publication Date: 2026-06-30

AI Technical Summary

Technical Problem

Traditional particle swarm optimization algorithms lack the ability to perceive and predict dynamic trends in liquid-cooled server temperature control, resulting in passive and reactive control strategies that affect control stability and energy efficiency.

Method used

Multi-source time-series data is acquired within a set time window, the load sequence is divided into a preceding and a following segment whose mean values are compared, and the load change is linearly extrapolated. Combined with the current temperature state, this yields both a forward-looking and a responsive assessment of heat dissipation needs, and the pump and fan speeds are dynamically adjusted to optimize energy consumption.

Benefits of technology

It enables a timely response to risky operating conditions such as a sudden load increase or a temperature approaching the safety threshold, balancing safety and energy efficiency and reducing the energy consumption of the liquid cooling system.

Abstract

This invention relates to the field of temperature control for liquid-cooled servers, specifically to a method and system for temperature control of liquid-cooled servers. The method collects multi-source time-series data, including the temperature and load of core server components, in real time. It compares the mean load of the earlier and later segments of a time window to identify the load change trend, and then linearly extrapolates the expected heat generation power for the next control cycle, forming a forward-looking heat dissipation demand assessment. In parallel, the system independently generates a responsive heat dissipation demand assessment based on the ratio of the current highest chip temperature to a safety threshold. The larger of the two assessments is taken as the lower limit of heat dissipation power, which is then inverted to determine the minimum speeds of the water pump and fan, thereby dynamically shrinking the search space of the optimization algorithm. Finally, within this dynamically adjusted safety boundary, a particle swarm optimization algorithm finds the combination of operating parameters with the lowest power consumption, achieving precise control that balances heat dissipation safety and energy saving.

Description

Technical Field

[0001] This invention relates to the field of temperature control for liquid-cooled servers, and more specifically to a temperature control method and system for liquid-cooled servers.

Background Technology

[0002] Servers, especially data center servers used for high-performance computing and artificial intelligence training, generate enormous amounts of heat during operation, particularly in their core processors (CPUs) and graphics processing units (GPUs). Traditional air cooling methods are increasingly inadequate for high-power-density scenarios. Liquid cooling technology, due to its superior heat transfer efficiency and heat dissipation capabilities, has become a key technology for cooling modern high-performance servers. A liquid cooling system typically includes cold plates, coolant, circulating pumps, piping, and heat exchange units. By controlling the speed of the circulating pump to regulate the coolant flow rate, or by controlling the speed of the fans in the heat exchange units to adjust the final heat dissipation efficiency, precise control of the server's internal temperature can be achieved.

[0003] To minimize the energy consumption of a liquid cooling system while meeting heat dissipation requirements, existing technologies often employ intelligent optimization algorithms to find the optimal combination of control parameters. Among these, particle swarm optimization (PSO) is frequently used due to its simplicity and strong global search capability. Its basic idea is to treat different combinations of control parameters as "particles" in the solution space, and through iterative search, find the optimal "particle" as the control command that minimizes the total system energy consumption and ensures that the temperature of all chips does not exceed a safe threshold.

[0004] However, traditional particle swarm optimization algorithms have significant drawbacks when applied to temperature control of liquid-cooled servers. Specifically, in each iteration, the algorithm evaluates the merits of each particle based solely on the current system state, such as the current chip temperature and server power consumption. This evaluation method ignores the historical evolution of system state variables and potential future trends. For example, the algorithm cannot distinguish between two distinct system states: "temperature stable at 75°C" and "temperature rapidly rising from 65°C to 75°C in the past few seconds," because it can only read the instantaneous 75°C reading. This mechanism results in control decisions that are essentially purely reactive, only adjusting passively after the temperature has risen and a problem has occurred. It lacks the ability to perceive and predict dynamic trends in the system, thus failing to provide proactive intervention and impacting control stability and energy efficiency.

Summary of the Invention

[0005] To address the lack of perception and prediction capabilities for dynamic trends in traditional particle swarm optimization algorithms, this invention proposes a temperature control method for a liquid-cooled server. The method includes: acquiring multi-source time-series data within a set time window, including the temperature of key heat sources and the server load; dividing the load sequence within the time window into a preceding segment and a following segment along the time axis, calculating their respective mean values, and using the difference between the following-segment mean and the preceding-segment mean as the load change; based on the current load value and the load change, linearly extrapolating the load at the end of the next control cycle to obtain the expected load value, and converting the expected load value into the expected heat generation power according to a pre-calibrated correspondence between load and heat generation power; determining the lower limit of heat dissipation power through two parallel paths, where the first path uses the expected heat generation power as the forward-looking heat dissipation requirement, the second path uses the ratio of the current highest temperature of the key heat sources to a preset safety threshold to perform linear interpolation between the preset minimum and maximum heat dissipation power and obtain the responsive heat dissipation requirement, and the larger of the two path results is taken as the lower limit of heat dissipation power; based on the lower limit of heat dissipation power and the preset system performance model, determining the lower limit of the dynamic operating speed of the water pump and fan in the current control cycle; and, within the search space formed by the lower limit of the dynamic operating speed and the set upper limit of the speed, executing the particle swarm optimization algorithm to determine the control speed of the water pump and fan with the goal of minimizing the total power consumption of the liquid cooling system.

[0006] Compared to existing passive control strategies that primarily rely on current temperature for feedback adjustment, this invention identifies load change trends by comparing the load sequence before and after periods within a time window. Based on this, it linearly extrapolates future heat generation power, forming a forward-looking judgment on upcoming heat load changes. Simultaneously, the system independently assesses responsive heat dissipation requirements from the current temperature state. Two evaluation paths run in parallel, and the larger value is taken to ensure that the system can make appropriate decisions regarding the lower limit of heat dissipation power, whether the temperature is still low but the load is rapidly increasing or the load is stable but the temperature is approaching the safety boundary. Based on this, a particle swarm optimization algorithm is used to find the operating point with the lowest power consumption above the safety boundary, ultimately achieving the goal of minimizing the energy consumption of the liquid cooling system while ensuring safety.

[0007] Dividing the load sequence within the time window into a preceding segment and a following segment along the time axis further includes: taking the midpoint of the time window as the boundary, calculating the arithmetic mean of the smoothed load values of the sampling points in the first half to obtain the mean of the preceding segment, and calculating the arithmetic mean of the smoothed load values of the sampling points in the second half to obtain the mean of the following segment.

[0008] Comparing the means of the first and second halves of the window eliminates the need for additional empirical parameters such as attenuation coefficients. The calculation process is simple and the physical meaning is intuitive: a higher mean in the second half indicates that the load has been increasing recently, and a larger difference indicates a faster increase. This approach is a standard first-order change detection method in time series analysis, reliably capturing the short-term direction and magnitude of load changes.

[0009] Furthermore, it also includes truncating the expected load value so that it is not lower than the server idle power consumption value and not higher than the server's rated full-load power consumption.

[0010] The nonlinear adjustment mechanism of this invention enables the risk index to more accurately reflect the potential overheating risk level of the system. When the load increases sharply, the risk index will increase rapidly, thereby triggering a more timely heat dissipation response and improving the system's robustness in dealing with sudden high load conditions.

[0011] Furthermore, in the second path, when the ratio of the highest temperature to the safety threshold is lower than a preset lower limit ratio, the responsive heat dissipation requirement is the minimum heat dissipation power; when the ratio is higher than a preset upper limit ratio, the responsive heat dissipation requirement is the maximum heat dissipation power.

[0012] By setting a cutoff interval with upper and lower limits, the system does not impose unnecessary heat dissipation constraints when the temperature is far below the safety threshold, and forces the system to enter the maximum heat dissipation state when the temperature is extremely close to the safety threshold. The cutoff behavior at both ends ensures that the responsive heat dissipation demand will not create a blind zone with zero constraints, nor will it exceed the physical heat dissipation capacity of the system due to interpolation calculations.

[0013] Furthermore, determining the lower limit of the dynamic operating speed of the water pump and fan within the current control cycle also includes: based on the system performance model, establishing a coordinated operation constraint that maintains a fixed ratio between the normalized speeds of the water pump and fan, solving for the unique speed combination that satisfies the total heat dissipation power equal to the lower limit of the heat dissipation power, and using this combination as the lower limit of the dynamic operating speed of the water pump and fan.

[0014] Furthermore, after obtaining the temperature of the key heat source and the load of the server, the process also includes performing a moving average filter on the temperature and load to obtain smoothed temperature and load data.

[0015] Compared to using raw sensor data directly, moving average filtering effectively removes high-frequency noise and transient spikes from temperature and load data. This provides a smoother and more stable data foundation for subsequent trend assessment and heat dissipation requirement calculation, preventing unnecessary misjudgments and frequent adjustments to the control system due to data noise.

[0016] In a second aspect, the present invention provides a temperature control system for a liquid-cooled server, comprising: a data acquisition unit for acquiring multi-source time-series data within a set time window, including the temperature of key heat sources and the server load; a load trend assessment unit connected to the data acquisition unit for dividing the load sequence within the time window into a preceding segment and a following segment and calculating the average value of each segment, using the difference between the average value of the following segment and the average value of the preceding segment as the load change, linearly extrapolating the load for the next control cycle based on the current load and the load change, and converting the expected load value into the expected heat generation power according to a pre-calibrated correspondence between load and heat generation power; a heat dissipation demand assessment unit connected to the data acquisition unit and the load trend assessment unit, comprising a forward-looking assessment path and a responsive assessment path, wherein the forward-looking assessment path uses the expected heat generation power as the forward-looking heat dissipation requirement, the responsive assessment path obtains the responsive heat dissipation requirement by linear interpolation based on the ratio of the current highest temperature of the key heat sources to the safety threshold, and the larger of the two path results is taken as the lower limit of heat dissipation power; a safety boundary decision unit connected to the heat dissipation demand assessment unit, used to determine the lower limit of the dynamic operating speed of the water pump and fan in the current control cycle based on the lower limit of heat dissipation power and the preset system performance model; an energy consumption optimization unit connected to the safety boundary decision unit, used to execute the particle swarm optimization algorithm in the search space formed by the lower limit of the dynamic operating speed and the set upper limit of the speed, with the goal of minimizing the total power consumption of the liquid cooling system, to determine the control speed of the water pump and fan; and a control execution unit connected to the energy consumption optimization unit, used to receive and execute the control speed of the water pump and fan.

[0017] Furthermore, the system also includes a data preprocessing unit located between the data acquisition unit and the load trend assessment unit, used to perform moving average filtering on the temperature of the key heat sources and the load of the server to obtain smoothed temperature data and load data.

[0018] Furthermore, the load trend assessment unit is configured to divide the time window into a preceding segment and a following segment, with the midpoint of the time window as the dividing line, and to truncate the expected load value so that it is not lower than the server idle power consumption value and not higher than the server rated full-load power consumption value.

[0019] Furthermore, the responsiveness evaluation path of the heat dissipation demand evaluation unit is configured as follows:

[0020] When the ratio of the maximum temperature to the safety threshold is lower than the preset lower limit ratio, the minimum heat dissipation power is used; when it is higher than the preset upper limit ratio, the maximum heat dissipation power is used; when it is in between, the responsive heat dissipation requirement is determined by linear interpolation.

[0021] The technical effects of this invention are as follows:

[0022] This invention abandons the traditional passive control mode that relies solely on the current temperature. It directly predicts future heat generation power through comparison of the first and second half-windows and linear extrapolation, and independently assesses the heat dissipation margin requirement from the current temperature state. The mechanism of taking the larger value from two paths ensures that the system can respond promptly to two typical risk conditions: a load surge and a temperature approaching the safety threshold. The determined lower limit of heat dissipation power dynamically shrinks the search space of the particle swarm optimization algorithm, allowing the algorithm to focus on minimizing power consumption within a safe region. This method can intervene in advance before load surges and release energy-saving space in a timely manner when the load is stable or decreasing, achieving a balance between safety, responsiveness, and energy efficiency.

Attached Figure Description

[0023] Figure 1 is a schematic flowchart of a temperature control method for a liquid-cooled server according to an embodiment of the present invention;

[0024] Figure 2 is a schematic diagram showing how the effective search space of the water pump changes in real time with the dynamic lower bound in an embodiment of the present invention;

[0025] Figure 3 is a schematic diagram comparing the power consumption of the improved PSO of the present invention with that of the traditional PSO;

[0026] Figure 4 is a schematic block diagram of the structure of a temperature control system for a liquid-cooled server according to an embodiment of the present invention.

Detailed Implementation

[0027] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0028] The specific embodiments of the present invention will now be described in detail with reference to the accompanying drawings.

[0029] An embodiment of a temperature control method for a liquid-cooled server:

[0030] As shown in Figure 1, a temperature control method for a liquid-cooled server according to the present invention includes:

[0031] S1. Through the out-of-band management interface, collect multi-source time-series data such as core component temperature and server load within a preset time window, and use the moving average filtering method to eliminate noise and obtain a smoothed data sequence.

[0032] In this embodiment, the control system operates on a preset control cycle measured in seconds. At the beginning of each control cycle, the system uses the server's out-of-band management interface, such as the Intelligent Platform Management Interface (IPMI) or Redfish, to acquire and store the multi-source time-series data within a specific time window.

[0033] As a preferred embodiment, the length of the time window can be set to a value in seconds. If the window is too short, the data sequence may contain too much noise and fail to reflect the true trend; if it is too long, the system's response to state changes will be delayed. In this embodiment, a preferred window length is selected. The acquired data sequence specifically includes:

[0034] Core component temperature sequence: the real-time temperature readings of all critical heat sources within the server, such as the central processing unit (CPU) and the graphics processing unit (GPU), over the past time window. The sequence is indexed by heat source and by sampling instant: the heat-source index runs up to the total number of critical heat sources, consecutive samples are separated by the data sampling interval, and the maximum sample index equals the total number of sampling points within the time window minus one.

[0035] Server load sequence: The control system sends IPMI or Redfish query commands to the server's baseboard management controller (BMC). The BMC continuously monitors the server's power supply unit (PSU), reads the real-time power consumption of the entire server from the PSU's sensors, and returns it to the control system, which thereby obtains real-time load rate or actual power consumption data that characterizes the server's computing intensity.

[0036] Liquid cooling system operating parameter sequence: Pumps and fans typically report their current speed to their controllers via their Tachometer (TACH) signal lines. The control system sends query commands to the CDU controller or pump / fan controller via industrial bus protocols such as PMBus, I2C, Modbus, or RS485 serial port. The controller then returns the current pump speed and fan speed of the heat exchange unit in the liquid cooling system.

[0037] To eliminate potential transient noise interference in the raw data and improve the accuracy of subsequent trend analysis, preprocessing of the acquired temperature and load sequences is necessary. Specifically, a moving average filtering method can be used to smooth the data, resulting in smoothed temperature and load sequences.
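As a non-limiting illustration, the moving average filtering described above could be sketched in Python as follows. The function name `moving_average` and the filter width are hypothetical choices for this sketch and are not specified by the original text; the trailing-window form shown here is only one of several possible moving average variants.

```python
from collections import deque


def moving_average(samples, width):
    """Trailing moving average over the last `width` samples.

    Assumption for this sketch: until `width` samples have been seen,
    the output is the average of the samples available so far.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    window = deque(maxlen=width)
    running_sum = 0.0
    smoothed = []
    for value in samples:
        if len(window) == width:
            running_sum -= window[0]  # oldest sample is about to be evicted
        window.append(value)
        running_sum += value
        smoothed.append(running_sum / len(window))
    return smoothed


# A single transient spike of 90 is spread out instead of passed through.
print(moving_average([50, 50, 90, 50, 50], 5))
```

A wider filter suppresses more noise but also delays the smoothed signal, which is the same trade-off the text describes for the length of the time window.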

[0038] S2. Based on the smoothed historical load sequence, the load change trend is identified by comparing the average values of the preceding and following periods, and the expected load and expected heat generation power for the next control cycle are linearly extrapolated.

[0039] To enable the control algorithm to sense the changing trend of server heat generation, rather than simply responding to the current instantaneous load value, this embodiment evaluates the short-term change direction and magnitude of server load based on historical load sequences.

[0040] Specifically, the smoothed load sequence within the time window obtained in step S1 is divided into a first half and a second half, with the midpoint of the time window as the boundary. The arithmetic mean of all smoothed load samples in the first half and the second half is calculated respectively to obtain the mean of the first half and the mean of the second half. The load change is obtained by subtracting the mean of the first half from the mean of the second half.

[0041] A positive load change indicates that the server load is trending upwards recently, meaning that heat generation may increase further in the near future. A negative load change indicates that the load is decreasing, and heat generation is expected to decrease. When the load change is close to zero, it indicates that the load is in a stable state.
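The half-window comparison can be sketched as below. This is a minimal illustration, not the patented implementation: the function name `load_change` is hypothetical, and the handling of an odd number of samples (the extra sample joins the second half) is an assumption, since the text only states that the midpoint of the window is the boundary.

```python
def load_change(smoothed_load):
    """Mean of the second half of the window minus mean of the first half."""
    n = len(smoothed_load)
    if n < 2:
        raise ValueError("need at least two samples")
    mid = n // 2  # assumption: with an odd count, the extra sample joins the second half
    first_half = smoothed_load[:mid]
    second_half = smoothed_load[mid:]
    return sum(second_half) / len(second_half) - sum(first_half) / len(first_half)


print(load_change([100, 100, 200, 200]))   # rising load: positive change
print(load_change([300, 300, 300, 300]))   # stable load: zero change
print(load_change([400, 300, 200, 100]))   # falling load: negative change
```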

[0042] Based on this, and using the latest load value at the current moment and the aforementioned load changes, the expected load value is obtained by extending the time span of one control cycle forward along a linear trend. To prevent linear extrapolation from producing results exceeding physical limits during drastic load changes, the expected load value is truncated: it is made no lower than the server's idle power consumption and no higher than the server's rated full-load power consumption. These two truncation boundaries can be directly obtained from the server equipment specifications.
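A possible sketch of the extrapolation and truncation follows. The text does not state how the load change is converted into a rate, so this sketch assumes that the two half-window means are representative of instants half a window apart, giving a slope of load change divided by half the window length. That scaling, the function name `expected_load`, and all numeric values are assumptions for illustration only.

```python
def expected_load(current_load_w, load_change_w, window_s, cycle_s, idle_w, rated_w):
    """Linearly extrapolate the load one control cycle ahead, then clamp it.

    Assumption: the half-window means are taken to be window_s / 2 seconds
    apart, so the slope is load_change_w / (window_s / 2) in watts per second.
    """
    slope_w_per_s = load_change_w / (window_s / 2.0)
    projected = current_load_w + slope_w_per_s * cycle_s
    # Truncate to the physical range: idle power up to rated full-load power.
    return min(max(projected, idle_w), rated_w)


# Hypothetical server: 150 W idle, 800 W rated, 60 s window, 10 s control cycle.
print(expected_load(400.0, 100.0, 60.0, 10.0, 150.0, 800.0))   # moderate rise
print(expected_load(780.0, 200.0, 60.0, 10.0, 150.0, 800.0))   # clamped to rated power
print(expected_load(160.0, -200.0, 60.0, 10.0, 150.0, 800.0))  # clamped to idle power
```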

[0043] During server operation, the vast majority of the electrical energy consumed is converted into heat. Therefore, the heat generation corresponding to the expected load value can be directly obtained from a load-power consumption table. This table can be pre-calibrated during system deployment through tiered load testing: the server load is sequentially set to several levels, such as idle, 25%, 50%, 75%, and full load. After stable operation at each level, the total power consumption is recorded, forming a load-power consumption table. For intermediate load values not covered in the table, linear interpolation is used to obtain the corresponding power consumption.
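The table lookup with linear interpolation might look like the following sketch. The wattage figures in `CALIBRATION_TABLE` are invented placeholders, not measurements from the patent, and the function name `heat_power` is hypothetical. Values outside the calibrated range are held at the nearest table entry, which is an assumption of this sketch.

```python
from bisect import bisect_right

# Hypothetical calibration results: (load fraction, total power in watts).
CALIBRATION_TABLE = [(0.0, 150.0), (0.25, 300.0), (0.5, 450.0), (0.75, 650.0), (1.0, 800.0)]


def heat_power(load_fraction, table=CALIBRATION_TABLE):
    """Look up heat generation power, interpolating linearly between table rows."""
    loads = [row[0] for row in table]
    watts = [row[1] for row in table]
    if load_fraction <= loads[0]:
        return watts[0]
    if load_fraction >= loads[-1]:
        return watts[-1]
    i = bisect_right(loads, load_fraction)  # first table row above the query
    x0, x1 = loads[i - 1], loads[i]
    y0, y1 = watts[i - 1], watts[i]
    return y0 + (y1 - y0) * (load_fraction - x0) / (x1 - x0)


print(heat_power(0.625))  # halfway between the 50% and 75% rows
```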

[0044] S3. By performing parallel calculations of the forward-looking evaluation path and the responsive evaluation path, determine the lower limit of the heat dissipation power for the current control cycle.

[0045] Relying solely on projected heat generation to determine cooling requirements is insufficient in certain situations. For example, when server loads operate at moderate levels for extended periods with a stable trend, projected heat generation may not change significantly. However, if chip temperatures are gradually approaching safe thresholds due to increased coolant temperature or other environmental factors, the system still needs to improve its cooling capacity. Conversely, when the load is rapidly increasing but the current temperature is still relatively low, temperature-based assessments may not yet trigger warnings, but forward-looking heat generation projections may already signal an increase in cooling demand.

[0046] To simultaneously cover the two typical risk scenarios mentioned above, this embodiment designs two parallel heat dissipation requirement assessment paths:

[0047] The first path is the forward-looking assessment path. This path directly uses the expected heat generation power calculated in step S2 as the forward-looking heat dissipation demand value. Its engineering meaning is: the heat dissipation power of the liquid cooling system should at least match the expected upcoming heat generation power to prevent the temperature from continuing to rise due to insufficient heat dissipation capacity.

[0048] The second path is the responsive assessment path. This path assesses the heat dissipation requirement starting from the current temperature state. Specifically, it takes the highest smoothed temperature among all critical heat sources at the current moment and calculates its ratio to a preset safe temperature threshold, called the temperature proportion. Based on this temperature proportion, a linear interpolation is performed between the system's minimum and maximum heat dissipation power to obtain the responsive heat dissipation requirement value.

[0049] The specific rules for linear interpolation are as follows. When the temperature proportion is lower than the preset lower limit ratio, the current temperature is far below the safety threshold and the system has sufficient safety margin; the responsive heat dissipation requirement is then taken as the minimum heat dissipation power of the system, and no additional heat dissipation constraint is imposed. When the temperature proportion is higher than the preset upper limit ratio, the current temperature is extremely close to the safety threshold; the responsive heat dissipation requirement is then taken as the maximum heat dissipation power of the system, requiring the liquid cooling system to operate at full load. When the temperature proportion lies between the lower limit ratio and the upper limit ratio, the responsive heat dissipation requirement is obtained by linear interpolation between the minimum and maximum heat dissipation power according to the position of the temperature proportion within that interval.

[0050] In this embodiment, the lower limit ratio can be set to 0.7, and the upper limit ratio can be set to 0.95. These two ratio values can be calibrated once during deployment based on the thermal characteristics of the specific chip model and the system's thermal response time constant. The calibration method is as follows: with the liquid cooling system operating at minimum power, gradually increase the server load and observe the chip temperature from the point where its rise begins to accelerate until it reaches the safety threshold; set the temperature proportion at the onset of the accelerated rise as the lower limit ratio, and set the temperature proportion at which the temperature comes within five degrees of the safety threshold as the upper limit ratio. The minimum and maximum heat dissipation power can be determined by consulting the liquid cooling system's equipment manual or parameter nameplate.
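The responsive assessment path can be illustrated with the sketch below, using the example ratios 0.7 and 0.95 from this embodiment. The function name `responsive_demand`, the 100 °C safety threshold, and the 200 W and 1000 W power limits are hypothetical values chosen only to make the example concrete.

```python
def responsive_demand(max_temp_c, safe_temp_c, p_min_w, p_max_w,
                      lower_ratio=0.7, upper_ratio=0.95):
    """Map the temperature proportion to a heat dissipation requirement in watts."""
    ratio = max_temp_c / safe_temp_c
    if ratio <= lower_ratio:
        return p_min_w  # ample margin: no extra constraint
    if ratio >= upper_ratio:
        return p_max_w  # very close to the threshold: full cooling
    position = (ratio - lower_ratio) / (upper_ratio - lower_ratio)
    return p_min_w + position * (p_max_w - p_min_w)


# Hypothetical system: 100 °C threshold, 200 W minimum and 1000 W maximum cooling.
for temp_c in (60.0, 82.5, 97.0):
    print(temp_c, responsive_demand(temp_c, 100.0, 200.0, 1000.0))
```

Clamping at both ends keeps the result inside the physical range of the cooling system, as the truncation interval in the text requires.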

[0051] After completing the independent calculations for the two paths, the larger of the two values is taken as the lower limit of the heat dissipation power for the current control cycle. This logic of taking the larger value ensures that the system's heat dissipation decision always targets the more severe of the two types of risks: when the load rises sharply but the temperature is still low, the first path dominates the heat dissipation decision; when the load is stable but the temperature is approaching the threshold, the second path dominates the heat dissipation decision; when both risks exist simultaneously, the lower limit of heat dissipation power is naturally pushed to the higher level required by both.
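The selection logic is small enough to show in a few lines. In this sketch the function name `heat_dissipation_floor` and the returned path label are hypothetical additions for readability, and the wattages are invented.

```python
def heat_dissipation_floor(forward_w, responsive_w):
    """Return the lower limit of heat dissipation power and the path that set it."""
    if forward_w >= responsive_w:
        return forward_w, "forward-looking"
    return responsive_w, "responsive"


# Load surging while the chip is still cool: the forward-looking path dominates.
print(heat_dissipation_floor(700.0, 200.0))
# Load stable while the temperature nears the threshold: the responsive path dominates.
print(heat_dissipation_floor(450.0, 900.0))
```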

[0052] S4. Based on the pre-calibrated system performance model and the joint operation strategy, the lower limit of heat dissipation power is reverse-analyzed into a combination of the lower limits of dynamic operating speeds that the water pump and fan must meet.

[0053] In step S3, the lower limit of the heat dissipation requirement was obtained. The optimization variables of the PSO algorithm are the specific actuator parameters, namely the water pump speed and the fan speed. Therefore, this step needs to reverse-analyze the lower limit of the heat dissipation requirement into the minimum operating conditions that the water pump and fan must meet, thereby forming a dynamic lower bound of the search space.

[0054] The total heat dissipation power of a liquid cooling system is mainly determined by the coolant flow rate, which is related to the pump speed, and the ambient heat exchange efficiency, which is related to the fan speed. This relationship can be described by a pre-calibrated system performance model.

[0055] When performing system control, it is necessary to find a set of parameter combinations that satisfy the condition that the total heat dissipation power is greater than or equal to the lower limit of the heat dissipation requirement. To determine a unique lower bound, a cooperative operation strategy can be introduced in this embodiment, for example, keeping the normalized speed (current speed / maximum speed) of the water pump and fan at a fixed ratio;

[0056] Solving the system of equations simultaneously with the above-mentioned cooperative constraint equations yields a unique solution, which is the lower bound of the dynamic search space at the current moment.
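One way to carry out this inversion numerically is bisection along the fixed-ratio line, sketched below. The patent does not disclose the form of the system performance model, so `example_model` is a purely hypothetical stand-in that is assumed to increase monotonically with both speeds; `speed_floor` and its parameters are likewise illustrative names.

```python
def example_model(pump, fan, q_max_w=1000.0):
    """Hypothetical performance model: heat removed (W) for normalized speeds in [0, 1]."""
    return q_max_w * (pump ** 0.6) * (fan ** 0.4)


def speed_floor(q_floor_w, model, ratio=1.0, tol=1e-6):
    """Find the lowest normalized (pump, fan) pair with fan = ratio * pump
    whose modeled heat removal reaches q_floor_w.

    Assumes ratio > 0 and a model that is monotonically increasing in both speeds.
    """
    hi = min(1.0, 1.0 / ratio)  # largest pump speed that keeps both speeds within [0, 1]
    if model(hi, ratio * hi) < q_floor_w:
        return hi, ratio * hi  # demand exceeds modeled capacity: saturate
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if model(mid, ratio * mid) >= q_floor_w:
            hi = mid  # mid is sufficient, so the bound can move down
        else:
            lo = mid
    return hi, ratio * hi


pump_min, fan_min = speed_floor(600.0, example_model)
print(round(pump_min, 3), round(fan_min, 3))
```

With a ratio of 1 the example model reduces to a linear function of speed, so a 600 W floor maps to a normalized speed of about 0.6 for both actuators. A calibrated model would replace `example_model` in practice.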

[0057] Thus, the search space of the PSO algorithm dynamically shrinks from a fixed speed range to an effective search space constrained by a lower bound vector.

[0058] S5. Within the safe search space consisting of the dynamic lower speed limit and the fixed upper speed limit, the particle swarm optimization algorithm is executed to iteratively optimize the system by using the minimum total power consumption as the fitness function, and the optimal control command is finally output.

[0059] As shown in Figure 2, the diagram illustrates how the control algorithm dynamically shrinks the search space. The solid line represents the minimum pump speed requirement calculated in real time (derived in step S4 from the lower limit of heat dissipation power determined in step S3); the dashed line represents the upper limit of the pump's physical speed (8000 RPM); the filled area is the effective search space. For example, before 00:18, because the lower limit of the heat dissipation requirement is high, this area is significantly compressed, and the PSO algorithm in S5 is prohibited from operating in the low-speed region to ensure heat dissipation safety.

[0060] In step S4, all solutions that do not meet the safety requirements have been physically removed from the search space. Therefore, the PSO algorithm can now perform its core task: finding a unique, energy-optimal target within the now-safe region. At this point, the algorithm's fitness function is simplified, with its sole objective being to minimize the total operating power consumption of the liquid cooling system.

[0061] The execution flow of the algorithm is as follows:

[0062] First, at the beginning of each control cycle, the system executes steps S1 to S4 to calculate the lower bound of the dynamic search space at the current moment. Then, it initializes or updates the particle swarm to ensure that the positions (i.e., rotational speed parameters) of all particles are within the new effective space. For particles that are outside the space, their positions are pulled back to the new boundary. Next, the PSO algorithm begins to iterate. In each iteration, the evaluation criterion is the fitness function value of each particle, i.e., the total power consumption. The algorithm further guides the swarm towards the corner with the lowest power consumption within the safe area through standard particle velocity and position update formulas. Finally, after the iteration ends, the globally optimal particle is found, and its corresponding control parameters become the final output command for the current control cycle, which is sent to the water pump and fan controllers for execution.
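The flow in [0062] can be illustrated with a minimal bounded PSO. This is a sketch, not the patented implementation: the cube-law power model `total_power_w`, its rated wattages, the fan limit of 6000 RPM, and the PSO coefficients are all assumptions introduced here. Particles carried over from the previous cycle are clamped to the new bounds before iteration, as the paragraph describes.

```python
import random
from typing import List, Optional, Sequence, Tuple


def total_power_w(pos: Sequence[float]) -> float:
    """Hypothetical fitness: cube-law pump and fan power (illustrative ratings)."""
    pump_rpm, fan_rpm = pos
    return 120.0 * (pump_rpm / 8000.0) ** 3 + 60.0 * (fan_rpm / 6000.0) ** 3


def clamp(pos: Sequence[float], lower: Sequence[float],
          upper: Sequence[float]) -> List[float]:
    """Pull a position back onto the boundary of the effective search space."""
    return [min(max(x, lo), hi) for x, lo, hi in zip(pos, lower, upper)]


def pso_min_power(
    lower: Sequence[float],
    upper: Sequence[float],
    swarm: Optional[List[List[float]]] = None,
    n_particles: int = 30,
    n_iter: int = 60,
    seed: int = 0,
) -> Tuple[List[float], float]:
    rng = random.Random(seed)
    dim = len(lower)
    if swarm is None:  # first cycle: initialize uniformly inside the bounds
        swarm = [[rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
                 for _ in range(n_particles)]
    pos = [clamp(p, lower, upper) for p in swarm]  # later cycles: clamp
    vel = [[0.0] * dim for _ in pos]
    pbest = [p[:] for p in pos]
    pbest_val = [total_power_w(p) for p in pos]
    g = min(range(len(pos)), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    w, c1, c2 = 0.7, 1.5, 1.5  # commonly used coefficients, assumed here
    for _ in range(n_iter):
        for i in range(len(pos)):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
            moved = [x + v for x, v in zip(pos[i], vel[i])]
            pos[i] = clamp(moved, lower, upper)
            val = total_power_w(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

Because safety is enforced entirely by the bounds, the fitness function contains no temperature penalty term. With a power model that rises with both speeds, the swarm drifts toward the lower corner of the box, matching the behavior the paragraph describes.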

[0063] Through the above steps, the present invention determines the following control scheme. First, a safe operating boundary is defined proactively by sensing dynamic trends. Then, within this boundary, the powerful optimization capability of the PSO algorithm is used to find the most energy-efficient operating point.

[0064] As shown in Figure 3, the diagram compares the final control performance of the method of this invention with that of the traditional reactive method over the same simulation cycle. The ordinary PSO algorithm performs reactive control based solely on the current temperature, and its power consumption frequently reaches its maximum value during periods of drastic load and temperature fluctuations, resulting in significant energy waste. In contrast, the method of this invention proactively assesses risk and reduces power consumption at many moments when the temperature is high but the trend is negative (such as after 00:18), thereby significantly saving energy while ensuring safety. The filled area shows the energy saved by the method of this invention compared with the traditional method.

[0065] An embodiment of a temperature control system for a liquid-cooled server:

[0066] In another aspect, the present invention also provides a temperature control system for a liquid-cooled server. As shown in Figure 4, the system proactively assesses heat dissipation needs by sensing the changing trend of server load and combining it with the current temperature status. Based on this, it dynamically adjusts the safety boundary of the optimization algorithm, ultimately minimizing the energy consumption of the liquid cooling system while ensuring heat dissipation safety.

[0067] The system includes the following interconnected and collaborative units:

[0068] Data Acquisition Unit: Responsible for acquiring multi-source time-series data within a specific time window at the start of each preset control cycle through the server's out-of-band management interface. This data includes at least: Core component temperature sequence: real-time temperature readings from key heat sources such as the CPU and GPU; Server load sequence: real-time load rate or power consumption data characterizing computing intensity; Liquid cooling system operating parameter sequence: current water pump speed and fan speed.

[0069] Data preprocessing unit: Connected to the data acquisition unit, it receives the original temperature sequence and load sequence, performs moving average filtering on them, and obtains smoothed temperature and load data to eliminate instantaneous noise interference.
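As a minimal sketch of the smoothing step, a trailing moving average can be written as below. The window length and the handling of the first few samples (averaging over whatever is available) are assumptions; the text does not specify them.

```python
from collections import deque
from typing import Iterable, List


def moving_average(samples: Iterable[float], window: int = 3) -> List[float]:
    """Trailing moving average; early outputs average the samples seen so far."""
    if window < 1:
        raise ValueError("window must be at least 1")
    buf = deque(maxlen=window)  # keeps only the most recent `window` samples
    smoothed = []
    for x in samples:
        buf.append(x)
        smoothed.append(sum(buf) / len(buf))
    return smoothed
```

The same function would be applied separately to the temperature sequence and the load sequence.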

[0070] Load Trend Assessment Unit: Connected to the data preprocessing unit, this unit divides the smoothed load sequence within a time window into a first half and a second half, using the midpoint as the boundary. It calculates the mean for each half, and the difference between the second and first half's mean is used as the load change. Further, based on the current load value and the load change, this unit linearly extrapolates the load for the next control cycle to obtain the expected load value. This value is then truncated to ensure it does not exceed the server's physical operating range. Finally, based on a pre-defined load-power consumption relationship, the expected load value is converted into the expected heat generation power.
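The half-window comparison and extrapolation might look like the following sketch. It makes three assumptions: the load is expressed in watts (consistent with truncation between idle and rated full-load power consumption), the extrapolation adds the load change once to the current load, and the load-to-heat conversion is a simple proportional factor `heat_fraction`, which is a hypothetical placeholder for the pre-defined relationship.

```python
from statistics import fmean
from typing import Sequence


def load_change(loads: Sequence[float]) -> float:
    """Second-half mean minus first-half mean, split at the window midpoint."""
    if len(loads) < 2:
        raise ValueError("need at least two samples")
    mid = len(loads) // 2  # for odd lengths the middle sample joins the second half
    return fmean(loads[mid:]) - fmean(loads[:mid])


def expected_load_w(loads: Sequence[float], idle_w: float, full_w: float) -> float:
    """Linear extrapolation for the next cycle, truncated to the physical range."""
    predicted = loads[-1] + load_change(loads)
    return min(max(predicted, idle_w), full_w)


def expected_heat_w(load_w: float, heat_fraction: float = 1.0) -> float:
    """Hypothetical load-to-heat mapping; a calibrated table could replace it."""
    return heat_fraction * load_w
```

For a smoothed window of 200, 220, 260, 300 W, the half means are 210 W and 280 W, so the load change is 70 W and the untruncated prediction is 370 W. With a rated full-load value of 350 W, the prediction is truncated to 350 W.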

[0071] Heat dissipation demand assessment unit: Connected to the data preprocessing unit and the load trend assessment unit, this unit contains two parallel assessment paths. The forward-looking assessment path directly receives the expected heat generation power output from the load trend assessment unit as the forward-looking heat dissipation demand. The responsive assessment path obtains the current highest smoothed temperature from the data preprocessing unit, calculates its ratio to a safety threshold, and performs linear interpolation between the minimum and maximum heat dissipation power based on this ratio to obtain the responsive heat dissipation demand. This unit takes the larger value of the two path results as the lower limit of heat dissipation power for the current control cycle.
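The two-path assessment can be sketched as below. The ratio limits `r_low` and `r_high` are hypothetical values standing in for the preset lower and upper limits of the temperature ratio; the interpolation is clamped outside that range, following the rule stated in the claims.

```python
def responsive_demand_w(
    max_temp_c: float,
    safe_temp_c: float,
    p_min_w: float,
    p_max_w: float,
    r_low: float = 0.6,   # hypothetical preset lower limit of the ratio
    r_high: float = 0.9,  # hypothetical preset upper limit of the ratio
) -> float:
    """Responsive path: interpolate demand from the temperature ratio."""
    ratio = max_temp_c / safe_temp_c
    frac = (ratio - r_low) / (r_high - r_low)
    frac = min(max(frac, 0.0), 1.0)  # clamp below r_low and above r_high
    return p_min_w + frac * (p_max_w - p_min_w)


def heat_dissipation_lower_limit_w(forward_w: float, responsive_w: float) -> float:
    """Take the more demanding of the two parallel paths."""
    return max(forward_w, responsive_w)
```

Taking the maximum means that either a rising load or a high temperature is sufficient on its own to raise the lower limit of heat dissipation power.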

[0072] Safety boundary decision unit: Connected to the heat dissipation demand assessment unit, this unit inversely resolves the lower limits of the dynamic operating speeds of the water pump and fan from the lower limit of heat dissipation power, the pre-calibrated system performance model, and the cooperative operation constraint.

[0073] Energy Consumption Optimization Unit: Connected to the safety boundary decision unit, this unit executes the particle swarm optimization algorithm within a safe search space defined by the dynamic operating speed lower limit and the set speed upper limit. The fitness function of this unit minimizes the total operating power consumption of the liquid cooling system, and after iteration, it outputs a set of globally optimal control parameters.

[0074] Control execution unit: Connected to the energy consumption optimization unit, it receives the optimal control parameters output by the unit and sends instructions to the physical equipment for execution through the water pump controller and fan controller to complete the closed-loop control of the current control cycle.

[0075] Here is a simple workflow example:

[0076] When the temperature control system is running, the data acquisition unit first continuously acquires real-time temperature and load data of the server through the IPMI interface; the data preprocessing unit then filters the data; next, the load trend assessment unit identifies the load change trend by comparing the first and second halves of the window and linearly extrapolates the expected heat generation power. Suppose the server load has recently been rising rapidly and continuously, so that the second-half mean is significantly higher than the first-half mean and the load change is a large positive value; the linearly extrapolated expected heat generation power is then much higher than the current actual heat generation power. Meanwhile, the responsive path of the heat dissipation demand assessment unit finds that the current temperature ratio is still relatively low. After both paths complete in parallel, the larger value is taken. In this case the forward-looking path dominates and outputs a higher lower limit of heat dissipation power. The safety boundary decision unit inversely resolves this lower limit into a set of higher minimum speeds. The energy consumption optimization unit then executes the PSO algorithm within the contracted safe speed space to find the operating point with the lowest power consumption in this high-speed region. Finally, the control execution unit receives the instruction and drives the water pump and fan, thereby enhancing heat dissipation capacity before the load increase actually causes a temperature rise.

Claims

1. A temperature control method for a liquid-cooled server, characterized in that the method includes: acquiring multi-source time-series data within a set time window, including the temperature of key heat sources and the load of the server; dividing the load sequence within the time window into a first segment and a second segment along the time axis, calculating the mean value of each segment, and using the difference between the mean value of the second segment and the mean value of the first segment as the load change; based on the load value at the current moment and the load change, linearly extrapolating the load at the end of the next control cycle to obtain the expected load value; converting the expected load value into the expected heat generation power according to the pre-calibrated correspondence between load and heat generation power; determining the lower limit of heat dissipation power by two parallel paths: the first path uses the expected heat generation power as a forward-looking heat dissipation requirement, which means that the heat dissipation power of the liquid cooling system can at least match the expected heat generation power; the second path is a responsive evaluation path, which evaluates the heat dissipation requirement based on the current temperature state; the larger value of the two path results is taken as the lower limit of heat dissipation power; in the second path, the ratio of the highest temperature among all key heat sources at the current moment to the preset safe temperature threshold is calculated, which is called the temperature ratio; based on the temperature ratio, linear interpolation is performed between the preset minimum heat dissipation power and the preset maximum heat dissipation power to obtain the responsive heat dissipation demand value; the linear interpolation rule is as follows: when the temperature ratio is lower than the preset lower limit, the responsive heat dissipation demand value is the minimum heat dissipation power; when the temperature ratio is higher than the preset upper limit, the responsive heat dissipation demand value is the maximum heat dissipation power; based on the preset system performance model and the cooperative operation strategy, the lower limit of heat dissipation power is inversely resolved into a combination of the lower limits of the dynamic operating speeds of the water pump and the fan; within the search space formed by the lower limit of the dynamic operating speed and the set upper limit of the speed, a particle swarm optimization algorithm is executed to determine the control speeds of the water pump and the fan with the goal of minimizing the total power consumption of the liquid cooling system.

2. The temperature control method for a liquid-cooled server according to claim 1, characterized in that dividing the load sequence within the time window into a first segment and a second segment along the time axis specifically includes: using the midpoint of the time window as the dividing line, calculating the arithmetic mean of the smoothed load values of the sampling points in the first half to obtain the mean of the first half, and calculating the arithmetic mean of the smoothed load values of the sampling points in the second half to obtain the mean of the second half.

3. The temperature control method for a liquid-cooled server according to claim 1, characterized in that the method further includes truncating the expected load value so that it is not lower than the server idle power consumption value and not higher than the server rated full-load power consumption value.

4. The temperature control method for a liquid-cooled server according to claim 1, characterized in that inversely resolving the lower limit of heat dissipation power into a combination of the lower limits of the dynamic operating speeds of the water pump and the fan, based on the preset system performance model and the cooperative operation strategy, includes: based on the system performance model and under the cooperative operation constraint of maintaining a fixed ratio between the normalized speeds of the water pump and the fan, determining the unique combination of speeds for which the total heat dissipation power equals the lower limit of heat dissipation power, and using that combination as the lower limits of the dynamic operating speeds of the water pump and the fan.

5. The temperature control method for a liquid-cooled server according to claim 1, characterized in that, after the temperature of the key heat sources and the load of the server are obtained, the method further includes performing moving average filtering on the temperature and load to obtain smoothed temperature and load data.

6. A temperature control system for a liquid-cooled server, characterized in that the system includes: a data acquisition unit, used to acquire multi-source time-series data within a set time window, including the temperature of key heat sources and the load of the server; a load trend assessment unit, connected to the data acquisition unit, used to divide the load sequence within the time window into a first segment and a second segment and calculate the mean value of each segment, to use the difference between the mean value of the second segment and the mean value of the first segment as the load change, to linearly extrapolate the load of the next control cycle based on the current load and the load change to obtain the expected load value, and to convert the expected load value into the expected heat generation power according to the pre-calibrated correspondence between load and heat generation power; a heat dissipation demand assessment unit, connected to the data acquisition unit and the load trend assessment unit, including a forward-looking assessment path and a responsive assessment path; the forward-looking assessment path uses the expected heat generation power as the forward-looking heat dissipation demand, meaning that the heat dissipation power of the liquid cooling system can at least match the expected heat generation power; the responsive assessment path assesses the heat dissipation demand based on the current temperature state; the larger value of the two path results is taken as the lower limit of heat dissipation power; in the responsive assessment path, the ratio of the highest temperature among all key heat sources at the current moment to the preset safe temperature threshold is calculated, which is called the temperature ratio; based on the temperature ratio, linear interpolation is performed between the preset minimum heat dissipation power and the preset maximum heat dissipation power to obtain the responsive heat dissipation demand value; the linear interpolation rule is as follows: when the temperature ratio is lower than the preset lower limit, the responsive heat dissipation demand value is the minimum heat dissipation power; when the temperature ratio is higher than the preset upper limit, the responsive heat dissipation demand value is the maximum heat dissipation power; a safety boundary decision unit, connected to the heat dissipation demand assessment unit, used to inversely resolve the lower limit of heat dissipation power into a combination of the lower limits of the dynamic operating speeds of the water pump and the fan based on the preset system performance model and the cooperative operation strategy; an energy consumption optimization unit, connected to the safety boundary decision unit, used to execute the particle swarm optimization algorithm in the search space formed by the lower limit of the dynamic operating speed and the set upper limit of the speed, with the goal of minimizing the total power consumption of the liquid cooling system, to determine the control speeds of the water pump and the fan; and a control execution unit, connected to the energy consumption optimization unit, used to receive and execute the control speeds of the water pump and the fan.

7. The temperature control system for a liquid-cooled server according to claim 6, characterized in that the system further includes: a data preprocessing unit, located between the data acquisition unit and the load trend assessment unit, used to perform moving average filtering on the temperature of the key heat sources and the load of the server to obtain smoothed temperature data and load data.

8. The temperature control system for a liquid-cooled server according to claim 6, characterized in that the load trend assessment unit is configured to divide the time window into a first segment and a second segment, with the midpoint of the time window as the boundary, and to truncate the expected load value so that it is not lower than the server idle power consumption value and not higher than the server rated full-load power consumption value.