Method for applying attention mechanism and reinforcement learning to AI server heat dissipation

By combining attention mechanisms and reinforcement learning methods, an MLP attention network and a DDPG model are constructed to adaptively adjust fan speed, solving the problem that traditional server cooling methods cannot meet the high computing power requirements of AI servers, and achieving precise heat dissipation control of AI servers.

CN117234304B · Active · Publication Date: 2026-06-30 · POWERLEADER COMPUTER SYST CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents (China)
Current Assignee / Owner
POWERLEADER COMPUTER SYST CO LTD
Filing Date
2023-09-15
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Traditional server cooling methods are insufficient to meet the high computing power requirements of AI servers, resulting in slow component temperature acquisition and untimely regulation, which may lead to the risk of server overheating.

Method used

By combining attention mechanisms and reinforcement learning, a multilayer perceptron (MLP) attention network and a DDPG model are constructed to adaptively adjust fan speed to precisely control the heat dissipation of the AI server, and the BMC is used for real-time adjustment of fan speed.

Benefits of technology

It enables precise heat dissipation control of AI servers, improves the practicality and versatility of fan speed regulation, and avoids the risk of server overheating.

✦ Generated by Eureka AI based on patent content.

Figure CN117234304B_ABST

Abstract

This invention discloses a method for applying attention mechanisms and reinforcement learning to AI server heat dissipation. The method uses hardware temperatures as input data and normalizes them. An attention network based on a multilayer perceptron is trained with GPU temperature as input and outputs an attention weight vector 'a'. The weighted GPU temperature, CPU temperature, memory temperature, and hard drive temperature are used as input features for an Actor network, which outputs a predicted fan speed. The same temperatures, together with the fan speed, are used as input features for a Critic network, which outputs a state-action Q-value estimate. The DDPG model is embedded in the BMC (Baseboard Management Controller), and input data are periodically fed back to the DDPG model within the BMC. The DDPG network adaptively adjusts its strategy, and the BMC sends the required fan speed to the fan via I2C to adjust the fan speed. This method enables more precise control of server fan heat dissipation, resulting in energy savings.

Description

Technical Field

[0001] This invention relates to the field of server heat dissipation technology, and more specifically to a method that combines attention mechanisms and reinforcement learning for heat dissipation in AI servers.

Background Technology

[0002] With the development of artificial intelligence (AI), domestic servers are gradually moving towards AI servers. However, the heat dissipation of AI servers has so far been handled in the same way as that of traditional servers. The heat dissipation of traditional servers is generally controlled by the BMC (Baseboard Management Controller), which periodically scans the server's motherboard information and dynamically adjusts the fan speed accordingly.

[0003] However, AI servers differ from traditional servers. Besides key components such as CPUs, memory, hard drives, and RAID cards, AI servers contain GPUs and surrounding hardware (such as switching chips) that operate at higher frequencies and generate significantly more heat. The larger number of components also slows down software access to component data, which may lead to insufficient heat dissipation and cause the server to overheat.

[0004] Therefore, traditional server cooling methods are inadequate for high-performance AI servers with more complex hardware and functions. The more components there are, the slower temperature acquisition becomes. Complex devices (such as RAID controllers) require more programming time, and delayed control may put the heat dissipation of the whole machine at risk.

Summary of the Invention

[0005] To overcome the shortcomings of existing technologies, this invention provides a method for applying attention mechanisms and reinforcement learning to heat dissipation in AI servers.

[0006] The technical solution adopted by this invention to solve its technical problem is:

[0007] A method combining attention mechanisms and reinforcement learning for AI server heat dissipation includes the following steps:

[0008] S1, the temperature of the hardware is used as input data, the hardware includes GPU, CPU, memory and hard disk, and the input data is normalized.

[0009] S2, an attention network is constructed based on a multilayer perceptron (MLP). The GPU temperature is used as input to train the fully connected MLP attention network and output the attention weight vector a. Then, the weighted GPU temperature is calculated as: weighted GPU temperature = Xgpu * a, where Xgpu is the GPU temperature data after normalization.

[0010] S3, Construct the DDPG model. The DDPG network consists of Actors and Critics. Under the Actor-Critic framework, perform the following steps: S31, Use weighted GPU temperature, CPU temperature, memory temperature, and hard disk temperature as input features of the Actor network, and then output the predicted value of the fan speed; S32, Use weighted GPU temperature, CPU temperature, memory temperature, hard disk temperature, and fan speed as input features of the Critic network, and then output the state-action Q-value estimate.
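S31 and S32 name the inputs and outputs of the two networks but not their internal layout. As a minimal sketch, assuming one hidden ReLU layer of 16 units in each network and a sigmoid on the Actor output so that the predicted fan speed stays in the normalized range, the Actor maps the state S to a fan speed and the Critic maps [S, fan speed] to a Q-value. The default n_state=4 assumes the weighted GPU temperature is a single scalar; the class names, layer sizes, and initialization are illustrative assumptions, not taken from the patent.

```python
import math
import random

def relu(v):
    return [max(0.0, x) for x in v]

def dense(W, b, x):
    """Fully connected layer: W x + b, with W stored as a list of rows."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]

def init_layer(n_out, n_in, rng):
    # Assumed initialization: small uniform weights, zero biases
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out

class Actor:
    """pi(S): state vector -> normalized fan speed in (0, 1)."""
    def __init__(self, n_state=4, n_hidden=16, seed=0):
        rng = random.Random(seed)
        self.W1, self.b1 = init_layer(n_hidden, n_state, rng)
        self.W2, self.b2 = init_layer(1, n_hidden, rng)

    def __call__(self, s):
        h = relu(dense(self.W1, self.b1, s))
        z = dense(self.W2, self.b2, h)[0]
        return 1.0 / (1.0 + math.exp(-z))  # sigmoid squashes to the normalized speed range

class Critic:
    """Q(S, A): state vector plus fan speed -> scalar Q-value estimate."""
    def __init__(self, n_state=4, n_hidden=16, seed=1):
        rng = random.Random(seed)
        self.W1, self.b1 = init_layer(n_hidden, n_state + 1, rng)
        self.W2, self.b2 = init_layer(1, n_hidden, rng)

    def __call__(self, s, a):
        h = relu(dense(self.W1, self.b1, list(s) + [a]))  # critic input is [S, A]
        return dense(self.W2, self.b2, h)[0]

actor, critic = Actor(), Critic()
s = [0.62, 0.40, 0.35, 0.28]  # hypothetical normalized [weighted GPU, CPU, memory, disk]
a = actor(s)
q = critic(s, a)
```

Keeping the Actor output in (0, 1) matches the normalized fan-speed feature, so it can be mapped back to RPM with the inverse of the min-max scaling.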

[0011] S4, the DDPG model is embedded in the BMC, and the input data are periodically fed back to the DDPG model in the BMC. The DDPG network adaptively adjusts its strategy, and the BMC sends the fan speed to be set to the fan via I2C to adjust the fan speed.

[0012] As a further improvement to the above technical solution, the normalization process applies min-max scaling, mapping each hardware temperature and fan speed feature to the range [0, 1].

[0013] As a further improvement to the above technical solution, the specific steps of the normalization process are as follows: Collect hardware temperature data at multiple time points, forming a hardware temperature dataset X = {x_1, x_2, ..., x_n}. Find the minimum value min_val and the maximum value max_val for each data feature, and apply the following formula for normalization:

[0014] x_i_normalized=(x_i-min_val) / (max_val-min_val);

[0015] Here, x_i is a vector of length m, representing the hardware temperature at the i-th time step.

[0016] As a further improvement to the above technical solution, the fan speed is also used as input data for normalization. The specific steps are as follows: Collect fan speed data at multiple time points, and the fan speed dataset is Y={y_1,y_2,...,y_n}. Find the minimum value min_val and the maximum value max_val for each data feature, and apply the following formula for normalization: y_i_normalized=(y_i-min_val) / (max_val-min_val), where y_i is a vector of length m, representing the fan speed at the i-th time step.
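The min-max normalization of [0013] to [0016] can be sketched in plain Python. The helper names fit_min_max and min_max_normalize and the sample readings are hypothetical, and the guard for a constant feature (max_val equal to min_val) is an added assumption, since the formula would otherwise divide by zero. The fitted minimum maps exactly to 0 and the fitted maximum exactly to 1, so the output covers the closed interval [0, 1].

```python
def fit_min_max(samples):
    """Per-feature min_val and max_val over samples (a list of equal-length vectors)."""
    mins = [min(col) for col in zip(*samples)]
    maxs = [max(col) for col in zip(*samples)]
    return mins, maxs

def min_max_normalize(x, mins, maxs):
    """x_normalized = (x - min_val) / (max_val - min_val), applied per feature."""
    out = []
    for v, lo, hi in zip(x, mins, maxs):
        span = hi - lo
        # Assumed guard (not in the text): a constant feature maps to 0.0
        out.append((v - lo) / span if span > 0 else 0.0)
    return out

# Hypothetical readings in degrees C: [GPU, CPU, memory, hard disk] per time step
X = [[55.0, 48.0, 40.0, 35.0],
     [72.0, 60.0, 45.0, 38.0],
     [88.0, 66.0, 50.0, 41.0]]
mins, maxs = fit_min_max(X)
X_norm = [min_max_normalize(x, mins, maxs) for x in X]

# Fan speed is normalized the same way (hypothetical RPM values)
Y = [[3000.0], [5200.0], [7400.0]]
y_mins, y_maxs = fit_min_max(Y)
Y_norm = [min_max_normalize(y, y_mins, y_maxs) for y in Y]
```

In deployment, the min_val and max_val found during data collection would be reused for new readings; a reading outside the fitted range then falls outside [0, 1].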

[0017] As a further improvement to the above technical solution, the attention network is trained using the forward propagation algorithm.

[0018] As a further improvement to the above technical solution, the training process of the attention network is as follows:

[0019] Calculate the output h1 of the first layer: h1 = ReLU(W1 * GPU temperature + b1);

[0020] Calculate the output h2 of the second layer: h2 = ReLU(W2*h1 + b2);

[0021] Calculate the attention weight vector: Attention weight vector a = Softmax(W3*h2+b3);

[0022] Output weighted GPU temperature;

[0023] Where W1 is the weight matrix of the first layer; b1 is the bias vector of the first layer; W2 is the weight matrix of the second layer; b2 is the bias vector of the second layer; W3 is the weight matrix of the attention weight vector; b3 is the bias vector of the attention weight vector; ReLU is a non-linear activation function; and the Softmax function is used to transform the output of the attention weight vector into a probability distribution.
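The forward computation of [0019] to [0021] can be written directly from the formulas. This is a sketch of the forward pass only, not the patent's implementation: learning the parameters is out of scope, and the weights below are hypothetical hand-picked identity matrices for two GPU readings (m = n = p = q = 2), whereas in practice W1 to W3 and b1 to b3 would be learned. Subtracting the largest logit inside the softmax is a standard numerical-stability step that does not change the result.

```python
import math

def relu(v):
    return [max(0.0, x) for x in v]

def affine(W, b, x):
    """W * x + b, with the weight matrix W stored as a list of rows."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]

def softmax(z):
    m = max(z)                        # subtract the largest logit for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]     # non-negative weights that sum to 1

def attention_weights(x_gpu, W1, b1, W2, b2, W3, b3):
    h1 = relu(affine(W1, b1, x_gpu))    # h1 = ReLU(W1 * GPU temperature + b1)
    h2 = relu(affine(W2, b2, h1))       # h2 = ReLU(W2 * h1 + b2)
    return softmax(affine(W3, b3, h2))  # a = Softmax(W3 * h2 + b3)

# Hypothetical parameters for two normalized GPU temperatures
I2 = [[1.0, 0.0], [0.0, 1.0]]
zeros = [0.0, 0.0]
x_gpu = [0.2, 0.9]
a = attention_weights(x_gpu, I2, zeros, I2, zeros, I2, zeros)
```

With these identity weights the hotter GPU receives the larger attention weight, and the weights always form a probability distribution.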

[0024] As a further improvement to the above technical solution, the update process of the Actor network in the DDPG model update is as follows:

[0025] The input vector of the Actor network is constructed based on the input state: S = [weighted GPU temperature, CPU temperature, memory temperature, hard disk temperature];

[0026] The output value of the fan speed is predicted using an Actor network: A1 = π(S);

[0027] Calculate the gradient of the action with respect to the Actor parameters: ∇_θ_π A1 = ∇_θ_π π(S; θ_π);

[0028] Update the parameters of the Actor network using the gradient ascent method: θ_π = θ_π + α * ∇_θ_π A1 * ∇_A1 Q(S, A1; θ_Q), where ∇ is the gradient operator and α is the learning rate.
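The Actor update is the deterministic policy gradient step of DDPG: the Critic's gradient with respect to the action is chained with the gradient of the Actor's output with respect to its parameters. To make that chain rule concrete, the sketch below performs one gradient-ascent step with a hypothetical linear Actor, A1 = θ_π · S, whose gradient with respect to θ_π is simply S, and a hypothetical stand-in Critic, Q = -(A - 0.7)^2, that prefers a normalized fan speed of 0.7. Both are assumptions chosen so the gradients can be written by hand; with real neural networks these gradients would typically come from backpropagation.

```python
def actor(theta, s):
    """Hypothetical linear Actor: A1 = pi(S; theta) = theta . S."""
    return sum(t * x for t, x in zip(theta, s))

def grad_theta_actor(s):
    """Gradient of the linear Actor's output with respect to theta is S."""
    return list(s)

def toy_q(s, a, preferred=0.7):
    """Hypothetical stand-in Critic with its maximum at a normalized speed of 0.7."""
    return -(a - preferred) ** 2

def grad_a_toy_q(s, a, preferred=0.7):
    """dQ/dA for the stand-in Critic."""
    return -2.0 * (a - preferred)

def actor_update(theta, s, alpha):
    a1 = actor(theta, s)            # A1 = pi(S)
    dq_da = grad_a_toy_q(s, a1)     # gradient of Q with respect to the action, at A1
    dpi = grad_theta_actor(s)       # gradient of pi with respect to theta_pi
    # Gradient ascent: theta_pi <- theta_pi + alpha * dQ/dA * dpi/dtheta
    return [t + alpha * dq_da * g for t, g in zip(theta, dpi)]

s = [0.6, 0.4, 0.3, 0.2]
theta = [0.1, 0.1, 0.1, 0.1]
q_before = toy_q(s, actor(theta, s))
theta = actor_update(theta, s, alpha=0.1)
a_after = actor(theta, s)
q_after = toy_q(s, a_after)
```

With α = 0.1 the predicted action moves from 0.15 to 0.2215, closer to the stand-in Critic's preferred value, so the estimated Q-value rises.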

[0029] As a further improvement to the above technical solution, the update process of the Critic network in the DDPG model update is as follows:

[0030] The input vector of the Critic network is constructed based on the input state and action: S = [weighted GPU temperature features, CPU temperature, memory temperature, hard disk temperature], A2 = fan speed;

[0031] Estimate the state-action Q-value using a Critic network: Q(S,A2) = Q(S,A2;θ_Q);

[0032] Calculate the loss between the predicted Q value and the target Q value of the Critic network: L=0.5*(Q(S,A2)-(R+γ*Q(S',π(S';θ_π);θ_Q_target)))^2, where R is the reward signal, γ is the discount factor, and S' is the next state;

[0033] Update the parameters of the Critic network using gradient descent: θ_Q = θ_Q - β * ∇_θ_Q L, where ∇ is the gradient operator and β is the learning rate.
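The Critic update of [0030] to [0033] can be sketched the same way, assuming a hypothetical linear Critic Q(S, A; θ_Q) = θ_Q · [S, A], for which the gradient of L with respect to θ_Q is (Q(S, A2) - y) · [S, A2], with y the target value R + γ * Q(S', π(S'; θ_π); θ_Q_target). The reward R = -0.1, the discount γ = 0.9, the states, and the next-state action a_next (standing in for π(S'; θ_π)) are made-up values; the patent does not define the reward.

```python
def features(s, a):
    return list(s) + [a]  # Critic input [S, A]

def q_value(theta_q, s, a):
    """Hypothetical linear Critic: Q(S, A; theta_Q) = theta_Q . [S, A]."""
    return sum(t * f for t, f in zip(theta_q, features(s, a)))

def td_target(r, gamma, theta_q_target, s_next, a_next):
    """y = R + gamma * Q(S', pi(S'); theta_Q_target); a_next stands for pi(S'; theta_pi)."""
    return r + gamma * q_value(theta_q_target, s_next, a_next)

def critic_loss(theta_q, s, a, y):
    return 0.5 * (q_value(theta_q, s, a) - y) ** 2  # L = 0.5 * (Q - y)^2

def critic_update(theta_q, s, a, y, beta):
    err = q_value(theta_q, s, a) - y  # dL/dQ
    # dL/dtheta_Q = (Q - y) * [S, A] for a linear Critic; step against the gradient
    return [t - beta * err * f for t, f in zip(theta_q, features(s, a))]

s, a2 = [0.6, 0.4, 0.3, 0.2], 0.5
s_next, a_next = [0.55, 0.4, 0.3, 0.2], 0.6
r, gamma = -0.1, 0.9
theta_q = [0.2] * 5
theta_q_target = list(theta_q)  # target parameters held fixed during this step

y = td_target(r, gamma, theta_q_target, s_next, a_next)
loss_before = critic_loss(theta_q, s, a2, y)
theta_q = critic_update(theta_q, s, a2, y, beta=0.1)
loss_after = critic_loss(theta_q, s, a2, y)
```

Here y = -0.1 + 0.9 × 0.41 = 0.269. Because the target is computed once and held fixed, a single small step reduces the loss.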

[0034] As a further improvement to the above technical solution, sensors are used to collect the temperature of the GPU, CPU, memory, and hard drive, as well as the fan speed.

[0035] The beneficial effects of this invention are as follows. The offline DDPG model is deployed in the BMC and predicts the fan speed to be set. Because it is a reinforcement learning model, it provides self-regulation and feedback training, so it can control the fan speed more accurately. With the attention mechanism, the method is more practical and versatile for AI servers.

Attached Figure Description

[0036] The present invention will be further described below with reference to the accompanying drawings and embodiments.

[0037] Figure 1 is a network framework diagram of the method for applying attention mechanisms and reinforcement learning to heat dissipation of AI servers according to the present invention;

[0038] Figure 2 is an algorithm flowchart of the method for applying attention mechanisms and reinforcement learning to heat dissipation of AI servers according to the present invention.

Detailed Implementation

[0039] The following will clearly and completely describe the concept, specific structure, and technical effects of the present invention in conjunction with embodiments and accompanying drawings, so as to fully understand the purpose, features, and effects of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. Other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort are all within the scope of protection of the present invention. Furthermore, all connections / linkages involved in the patent do not simply refer to direct connection, but rather to the ability to form a better connection structure by adding or reducing connecting accessories according to specific implementation conditions. The various technical features in this invention can be combined interactively without contradicting each other.

[0040] Basic embodiment: referring to Figure 1 and Figure 2, the method of combining attention mechanisms and reinforcement learning for AI server heat dissipation includes the following steps:

[0041] S1, hardware temperature and fan speed are taken as input data, where the hardware includes the GPU, CPU, memory, and hard drive. The input data (GPU temperature, CPU temperature, memory temperature, hard drive temperature, and fan speed) are normalized with min-max scaling to the range [0, 1]. The specific steps are as follows:

[0042] S11. Collect hardware temperature data samples at multiple time points. The hardware temperature dataset is X={x_1,x_2,...,x_n}. Find the minimum value min_val and the maximum value max_val for each data feature, and then apply the following formula for normalization: x_i_normalized=(x_i-min_val) / (max_val-min_val), where for each sample's feature vector x_i, x_i is a vector of length m, representing the temperature of each hardware component (including GPU temperature, CPU temperature, memory temperature, and hard drive temperature) at the i-th time step.

[0043] S12. Collect fan speed data samples at multiple time points. The fan speed dataset is Y={y_1,y_2,...,y_n}. Find the minimum value min_val and the maximum value max_val for each data feature. Apply the following formula for normalization: y_i_normalized=(y_i-min_val) / (max_val-min_val), where for each sample's feature vector y_i, y_i is a vector of length m, representing the fan speed at the i-th time step.

[0044] S2, an attention network is built based on a multilayer perceptron (MLP), using GPU temperature as input, and the forward propagation algorithm is used to train the fully connected MLP attention network. The training process is as follows:

[0045] Define model parameters:

[0046] W1: The weight matrix of the first layer (size n×m, where n is the number of neurons in the first layer and m is the length of the GPU temperature vector).

[0047] b1: The bias vector of the first layer (size n);

[0048] W2: The weight matrix of the second layer (size p×n, where p is the number of neurons in the second layer).

[0049] b2: The bias vector of the second layer (size p);

[0050] W3: The weight matrix of the attention weight vector (size q×p, where q is the length of the attention weight vector);

[0051] b3: The bias vector of the attention weight vector (size q);

[0052] Using the forward propagation algorithm:

[0053] Calculate the output h1 of the first layer: h1 = ReLU(W1 * GPU temperature + b1);

[0054] Calculate the output h2 of the second layer: h2 = ReLU(W2*h1 + b2);

[0055] Calculate the attention weight vector: Attention weight vector a = Softmax(W3*h2+b3);

[0056] In the formula, the ReLU function is an activation function that can introduce non-linear features, and the Softmax function is used to transform the output of the attention weight vector into a probability distribution.

[0057] Output the attention weight vector, and then calculate the weighted GPU temperature, which is calculated as Xgpu*a, where Xgpu is the normalized GPU temperature data.
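The text writes the weighted GPU temperature as Xgpu*a without naming the operation. Reading it as an element-wise product, which is an assumption, requires the attention vector length q to equal the GPU vector length m. The sketch below also assumes that the state S of [0061] is built by concatenating the weighted GPU vector with the CPU, memory, and hard drive temperatures, so the Actor input width becomes m + 3; the function names and values are hypothetical.

```python
def weighted_gpu_temperature(x_gpu, a):
    """Xgpu * a read as an element-wise product (assumes q == m)."""
    if len(x_gpu) != len(a):
        raise ValueError("attention vector length q must equal GPU vector length m")
    return [x * w for x, w in zip(x_gpu, a)]

def build_state(weighted_gpu, cpu, memory, disk):
    """S = [weighted GPU temperature, CPU, memory, hard disk], flattened into one vector."""
    return list(weighted_gpu) + [cpu, memory, disk]

x_gpu = [0.2, 0.9]   # normalized temperatures of two GPUs (hypothetical)
a = [0.25, 0.75]     # attention weights summing to 1 (hypothetical)
w_gpu = weighted_gpu_temperature(x_gpu, a)
S = build_state(w_gpu, 0.5, 0.3, 0.1)
```

Raising an error on a length mismatch makes the q = m requirement explicit instead of letting zip silently truncate the shorter vector.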

[0058] S3, Construct the DDPG model. The DDPG network consists of Actors and Critics. Perform the following steps within the Actor-Critic framework:

[0059] S31, Actor Network Update: The weighted GPU temperature, CPU temperature, memory temperature, and hard drive temperature are used as input features of the Actor network, and then the predicted fan speed is output.

[0060] The update process of the Actor network is as follows:

[0061] The input vector of the Actor network is constructed based on the input state: S = [weighted GPU temperature, CPU temperature, memory temperature, hard disk temperature];

[0062] The output value of the fan speed is predicted using an Actor network: A1 = π(S);

[0063] Calculate the gradient of the action with respect to the Actor parameters: ∇_θ_π A1 = ∇_θ_π π(S; θ_π);

[0064] Update the parameters of the Actor network using the gradient ascent method: θ_π = θ_π + α * ∇_θ_π A1 * ∇_A1 Q(S, A1; θ_Q), where ∇ is the gradient operator and α is the learning rate.

[0065] S32, Critic Network Update: Weighted GPU temperature, CPU temperature, memory temperature, hard drive temperature and fan speed are used as input features of the Critic network, and then the output is a state-action Q-value estimate;

[0066] The update process of the Critic network is as follows:

[0067] The input vector of the Critic network is constructed based on the input state and action: S = [weighted GPU temperature, CPU temperature, memory temperature, hard disk temperature], A2 = fan speed;

[0068] Estimate the state-action Q-value using a Critic network: Q(S,A2) = Q(S,A2;θ_Q);

[0069] Calculate the loss between the predicted Q value and the target Q value of the Critic network: L=0.5*(Q(S,A2)-(R+γ*Q(S',π(S';θ_π);θ_Q_target)))^2, where R is the reward signal, γ is the discount factor, and S' is the next state;

[0070] Update the parameters of the Critic network using gradient descent: θ_Q = θ_Q - β * ∇_θ_Q L, where ∇ is the gradient operator and β is the learning rate.

[0071] Based on the aforementioned Actor network and Critic network, an offline DDPG model is obtained. This DDPG model is a reinforcement learning model that collects experience through interaction with the environment, and uses experience replay for DDPG training and optimization, thereby achieving self-regulation and feedback training functions. The DDPG model can then be used to predict the fan speed that needs to be controlled.
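[0071] mentions experience replay without giving details. A common form, assumed here, is a fixed-capacity buffer of (S, A, R, S') transitions from which mini-batches are drawn uniformly at random. The class below is a hypothetical sketch built on collections.deque, whose maxlen argument discards the oldest entries, and random.Random.sample.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (S, A, R, S') transitions; oldest entries drop out first."""
    def __init__(self, capacity, seed=None):
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        # Uniform mini-batch without replacement, which breaks temporal correlation
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

buf = ReplayBuffer(capacity=3, seed=0)
for t in range(5):
    buf.push([0.1 * t], 0.5, -0.1, [0.1 * (t + 1)])
batch = buf.sample(2)
```

After five pushes into a buffer of capacity 3, only the transitions from steps 2, 3, and 4 remain, and each sampled batch is drawn from those.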

[0072] S4, the offline DDPG model is embedded in the BMC, and the input data collected by the sensors (GPU temperature, CPU temperature, memory temperature, hard disk temperature, and fan speed) are periodically fed back to the DDPG model in the BMC. The DDPG network adaptively adjusts its strategy, and the BMC outputs the fan speed to be set to the fan through the backend API and I2C to adjust the fan speed.
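One BMC control cycle can be sketched as: read the sensors, normalize with the stored min_val and max_val, query the policy, invert the fan-speed normalization, and write the result to the fan. The sensor reader, the policy, and the I2C writer are passed in as hypothetical callables, since the BMC's backend API and I2C interface are not specified. The policy is assumed to handle the attention weighting internally, and the clamp to [0, 1] is an added safeguard. In a real BMC this step would run on a periodic timer.

```python
def denormalize(y_norm, y_min, y_max):
    """Inverse of min-max scaling: y = y_norm * (max_val - min_val) + min_val."""
    return y_norm * (y_max - y_min) + y_min

def control_step(read_sensors, policy, write_fan_rpm, t_mins, t_maxs, rpm_min, rpm_max):
    """One BMC control cycle; all three interfaces are hypothetical callables."""
    raw = read_sensors()  # e.g. [GPU, CPU, memory, disk] in degrees C
    s = [(v - lo) / (hi - lo) for v, lo, hi in zip(raw, t_mins, t_maxs)]
    a = min(max(policy(s), 0.0), 1.0)  # clamp the normalized action to [0, 1]
    rpm = round(denormalize(a, rpm_min, rpm_max))
    write_fan_rpm(rpm)  # stands in for the BMC's I2C fan write
    return rpm

# Usage with stand-ins: a fake sensor, a toy policy, and a list capturing the "I2C" writes
writes = []
rpm = control_step(
    read_sensors=lambda: [80.0, 60.0, 45.0, 38.0],
    policy=lambda s: sum(s) / len(s),
    write_fan_rpm=writes.append,
    t_mins=[40.0, 30.0, 30.0, 25.0], t_maxs=[90.0, 80.0, 60.0, 50.0],
    rpm_min=3000, rpm_max=9000,
)
```

With the stand-in readings the normalized state is [0.8, 0.6, 0.5, 0.52], the toy policy returns 0.605, and the fan is set to 0.605 × 6000 + 3000 = 6630 RPM.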

[0073] The above is a detailed description of the preferred embodiments of the present invention. However, the present invention is not limited to the embodiments described. Those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the present invention. All such equivalent modifications or substitutions are included within the scope defined by the claims of this application.

Claims

1. A method for applying attention mechanisms and reinforcement learning to heat dissipation in AI servers, characterized in that, Includes the following steps: S1, the temperature of the hardware is used as input data, the hardware includes GPU, CPU, memory and hard disk, and the input data is normalized. S2, an attention network is constructed based on a multilayer perceptron (MLP). The GPU temperature is used as input to train the fully connected MLP attention network and output the attention weight vector a. Then, the weighted GPU temperature is calculated as: weighted GPU temperature = Xgpu * a, where Xgpu is the GPU temperature data after normalization. The training process of the attention network is as follows: Calculate the output h1 of the first layer: h1 = ReLU(W1 * GPU temperature + b1); Calculate the output h2 of the second layer: h2 = ReLU(W2*h1 + b2); Calculate the attention weight vector: Attention weight vector a = Softmax(W3*h2+b3); Where W1 is the weight matrix of the first layer; b1 is the bias vector of the first layer; W2 is the weight matrix of the second layer; b2 is the bias vector of the second layer; W3 is the weight matrix of the attention weight vector; b3 is the bias vector of the attention weight vector; ReLU is a non-linear activation function; and the Softmax function is used to transform the output of the attention weight vector into a probability distribution. S3, Construct the DDPG model. The DDPG network consists of Actors and Critics. Under the Actor-Critic framework, perform the following steps: S31, Use weighted GPU temperature, CPU temperature, memory temperature, and hard disk temperature as input features of the Actor network, and then output the predicted value of the fan speed; S32, Use weighted GPU temperature, CPU temperature, memory temperature, hard disk temperature, and fan speed as input features of the Critic network, and then output the state-action Q-value estimate. 
The update process of the Actor network in the DDPG model update is as follows: The input vector of the Actor network is constructed based on the input state: S = [weighted GPU temperature, CPU temperature, memory temperature, hard disk temperature]; The output value of the fan speed is predicted using an Actor network: A1 = π(S); Calculate the gradient of the action with respect to the Actor parameters: ∇_θ_π A1 = ∇_θ_π π(S; θ_π); Update the parameters of the Actor network using the gradient ascent method: θ_π = θ_π + α * ∇_θ_π A1 * ∇_A1 Q(S, A1; θ_Q), where ∇ is the gradient operator and α is the learning rate; The update process of the Critic network in the DDPG model update is as follows: The input vector of the Critic network is constructed based on the input state and action: S = [weighted GPU temperature, CPU temperature, memory temperature, hard disk temperature], A2 = fan speed; Estimate the state-action Q-value using a Critic network: Q(S,A2)=Q(S,A2;θ_Q); Calculate the loss between the predicted Q value and the target Q value of the Critic network: L=0.5*(Q(S,A2)-(R+γ*Q(S',π(S';θ_π);θ_Q_target)))^2, where R is the reward signal, γ is the discount factor, and S' is the next state; Update the parameters of the Critic network using gradient descent: θ_Q = θ_Q - β * ∇_θ_Q L, where β is the learning rate; S4, the DDPG model is embedded in the BMC, and the input data are periodically fed back to the DDPG model in the BMC. The DDPG network adaptively adjusts its strategy, and the BMC sends the fan speed to be set to the fan via I2C to adjust the fan speed.

2. The method for applying attention mechanisms and reinforcement learning to AI server heat dissipation according to claim 1, characterized in that, The normalization process applies min-max scaling to the hardware temperature data features, mapping them to the range [0, 1].

3. The method for applying attention mechanisms and reinforcement learning to AI server heat dissipation according to claim 2, characterized in that, The specific steps of the normalization process are as follows: collect hardware temperature data at multiple time points, with the hardware temperature dataset X={x_1,x_2,...,x_n}. Find the minimum value min_val and the maximum value max_val for each data feature, and apply the following formula for normalization: x_i_normalized=(x_i-min_val) / (max_val-min_val), where x_i is a vector of length m, representing the hardware temperature at the i-th time step.

4. The method for applying attention mechanisms and reinforcement learning to AI server heat dissipation according to claim 3, characterized in that, Fan speed is also included as input data for normalization. The specific steps are as follows: Collect fan speed data at multiple time points. The fan speed dataset is Y={y_1,y_2,...,y_n}. Find the minimum value min_val and the maximum value max_val for each data feature. Apply the following formula for normalization: y_i_normalized=(y_i-min_val) / (max_val-min_val), where y_i is a vector of length m, representing the fan speed at the i-th time step.

5. The method for applying attention mechanisms and reinforcement learning to AI server heat dissipation according to claim 4, characterized in that, The attention network is trained using the forward propagation algorithm.

6. The method for applying attention mechanisms and reinforcement learning to heat dissipation of AI servers according to any one of claims 1-5, characterized in that, Sensors are used to collect the temperature of the GPU, CPU, memory, and hard drive, as well as the fan speed.