A multi-type computing power cluster power load prediction method and system for low-carbon operation

By using a prediction method customized according to computing power type, combined with convolutional neural networks and elastic gating fusion mechanism, the adaptability and robustness issues of data center computing power load prediction are solved, achieving accurate prediction of multi-type computing power load and providing reliable data support for low-carbon operation.

CN122203192A · Pending · Publication Date: 2026-06-12 · STATE GRID SHANGHAI MUNICIPAL ELECTRIC POWER CO +2

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Application (China)
Current Assignee / Owner
STATE GRID SHANGHAI MUNICIPAL ELECTRIC POWER CO
Filing Date
2026-01-28
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Existing technologies fail to accurately characterize the electricity consumption characteristics of various types of computing power in data center computing load forecasting, lack adaptability and robustness, and are difficult to meet the needs of low-carbon operation-related mechanisms.

Method used

We adopt a prediction method customized according to computing power type, extract features through convolutional neural networks, and combine elastic gating fusion mechanism to build basic, intelligent and supercomputing computing power-specific sub-models. We use easily accessible data for prediction and automatically enhance performance when high-order features are available.

Benefits of technology

It achieves accurate and robust load forecasting for different types of computing power, improves forecasting accuracy and engineering applicability, and provides highly reliable load forecasting data to support low-carbon operation strategies.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122203192A_ABST
Patent Text Reader

Abstract

The application relates to a multi-type computing power cluster power load prediction method and system for low-carbon operation, which comprises the following steps: obtaining historical load data, auxiliary data and historical job scheduling data of a target computing power cluster; performing feature extraction through a convolutional neural network to generate basic features and enhanced features; performing elastic gate fusion on the basic features and the enhanced features to obtain a final feature vector sequence; and inputting the feature vector sequence into corresponding prediction sub-models according to the computing power types of the target computing power cluster to output multi-scale load prediction results. Compared with the prior art, the application constructs special prediction sub-models for basic computing power, intelligent computing power and supercomputing power, aligns the model structure with the load characteristics of various types of computing power, significantly improves the prediction accuracy, and can adapt to different data completeness scenarios, thereby providing reliable load prediction data for low-carbon operation scenarios such as demand response and green power consumption of the computing power cluster.

Description

Technical Field

[0001] This invention belongs to the field of power system and data center collaborative optimization technology, specifically involving a method and system for predicting the power load of multi-type computing clusters for low-carbon operation.

Background Technology

[0002] With technological advancements, data centers, as the core infrastructure of the digital economy, are experiencing continuous expansion. Currently, data centers exhibit significant spatiotemporal distribution characteristics, forming a cross-regional, multi-layered computing power network. Simultaneously, the forms of computing power within data centers are becoming increasingly diverse, including basic computing power supporting general computing tasks, intelligent computing power for AI training and inference, and supercomputing power for scientific computing. Different types of computing power exhibit significant differences in load patterns, energy efficiency characteristics, and runtime sequences, leading to complex and variable electricity consumption behaviors. For example, intelligent computing clusters, affected by model training task scheduling, exhibit highly sudden and volatile load characteristics; while supercomputing centers are characterized by periodic peaks and high power density. Accurately predicting the power load of different types of computing power is crucial for improving data center energy efficiency management, optimizing power distribution system design, and providing fundamental load data for their participation in low-carbon operation mechanisms such as grid demand response and green energy consumption.

[0003] In load forecasting technology, early research often employed time series models such as ARIMA or machine learning methods like SVM and random forests, which struggled to characterize the nonlinear and non-stationary characteristics of loads. In recent years, deep learning models have been widely adopted due to their powerful feature learning capabilities. Among them, Recurrent Neural Networks (RNNs) and their variant, Long Short-Term Memory (LSTM), excel at capturing temporal dependencies, while Convolutional Neural Networks (CNNs) can effectively extract local spatiotemporal features. To combine the advantages of both, the CNN-LSTM hybrid model has become the mainstream approach, achieving good results in load forecasting for some data centers. For example, patent application CN121012011A discloses a load forecasting method and system based on data center electrical systems. This method acquires electrical parameters, environmental parameters, and historical operating data of each subsystem through a data acquisition module, and normalizes and reconstructs the acquired data to form standardized time-series data. By utilizing a combined neural network model that includes convolutional neural networks, improved long short-term memory networks, and attention mechanisms, local fluctuations, mutations, frequencies, and statistical features are extracted to capture long-term data dependencies and key time-series information, thereby outputting electrical load forecasts.

[0004] However, existing deep learning prediction methods still have the following key shortcomings: (1) Failure to consider differences in computing power types. Existing solutions generally treat data center computing power as a single load object and model it with a unified CNN-LSTM architecture. They do not systematically identify the load patterns, fluctuation characteristics, and operating mechanisms of basic computing power, intelligent computing power, and supercomputing power, so they cannot characterize the distinct power consumption behavior of each type. As a result, prediction deviations are large for sudden loads and batch processing peaks.

[0005] (2) Heavy and unreliable dependence on high-order features: Some studies have attempted to improve model accuracy by introducing fine-grained operational features such as task scheduling logs, GPU utilization, and job queue length. However, in actual operation and maintenance scenarios, such data is often difficult to obtain due to system isolation, security policy restrictions, missing interfaces, or incomplete historical storage. Once these high-order features become unavailable, model performance drops significantly or the model fails outright, severely restricting the engineering implementation of prediction methods.

[0006] (3) Existing technologies lack dedicated modeling mechanisms matched to the typical load characteristics of each computing power type, such as the suddenness of intelligent computing power and the job-driven periodic peaks of supercomputing power.

[0007] In summary, although deep learning technology has demonstrated its potential in data center computing load forecasting, a forecasting method that can accurately characterize the electricity consumption characteristics of various types of computing power while also possessing strong engineering applicability remains lacking. Especially when facing real-world scenarios with varying data completeness and complex, ever-changing load dynamics, existing solutions generally suffer from coarse modeling granularity, weak adaptability, and insufficient robustness. This makes it difficult to meet the demand for highly reliable load forecasting data when computing clusters participate in demand response, green energy consumption, and other low-carbon operation mechanisms under the new power system context.

[0008] Therefore, there is an urgent need for an adaptive prediction method that is tailored to the type of computing power, works stably on basic observable data, and automatically enhances performance when higher-order features are available.

Summary of the Invention

[0009] The purpose of this invention is to overcome the shortcomings of the prior art by providing a multi-type computing power cluster power load forecasting method and system that is customized according to computing power type, data adaptive, and highly robust for low-carbon operation. The forecasting results can be used as data input for low-carbon operation related systems, providing a load forecasting data basis for scenarios such as demand response and green electricity consumption.

[0010] The objective of this invention can be achieved through the following technical solutions: A method for predicting the power load of multi-type computing power clusters for low-carbon operation includes the following steps:
Obtaining historical load data, auxiliary data, and historical job scheduling data of the target computing power cluster;
Extracting features from the historical load data, auxiliary data, and historical job scheduling data using a convolutional neural network to generate basic features and enhanced features, wherein the basic features are constructed based on the historical load data and auxiliary data, and the enhanced features are constructed based on the historical job scheduling data;
Performing elastic gating fusion on the basic features and enhanced features to obtain the final feature vector sequence;
Inputting the feature vector sequence into the corresponding prediction sub-model according to the computing power type of the target computing power cluster and outputting multi-scale load prediction results, which are output through the data interface and used as input data for low-carbon operation strategy calculation.

[0011] Furthermore, the low-carbon operation scenarios include demand response and green electricity consumption.

[0012] Furthermore, the historical load data includes historical power load time series with a time granularity of 5 minutes or 15 minutes.

[0013] Furthermore, the auxiliary data includes at least several of the following: outdoor temperature, humidity, time-of-use electricity price, and power usage efficiency. The historical job scheduling data includes at least several of the following: job queue length and average GPU utilization.

[0014] Furthermore, the convolutional neural network includes convolutional layers, ReLU activation functions, max pooling layers, and Dropout layers.

[0015] Furthermore, the enhanced features include one or more of the following: task scheduling information, job running status, computing resource utilization, and network traffic indicators.

[0016] Furthermore, the elastic gating fusion uses a fusion formula (the formula itself is missing from this text) that combines the basic features and the enhanced features at each time step according to the gating weight to give the final feature vector sequence; the superscript in the formula indicates the time step.

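[0017] Furthermore, the gating weight is calculated as $g^{(t)} = \sigma\left(W\,[F_{\mathrm{base}}^{(t)};\,F_{\mathrm{enh}}^{(t)}] + b\right)$, where $\sigma$ is the Sigmoid function, $[\cdot\,;\cdot]$ concatenates the basic feature vector $F_{\mathrm{base}}^{(t)}$ and the enhanced feature vector $F_{\mathrm{enh}}^{(t)}$ at time step $t$, and $W$ and $b$ are the learnable weights and bias term, respectively.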

[0018] Furthermore, the prediction sub-model includes a basic computing power sub-model, an intelligent computing power sub-model, and a supercomputing power sub-model, wherein: the basic computing power sub-model adopts a 2-layer LSTM; the intelligent computing power sub-model introduces a Temporal Attention layer before the LSTM; and the supercomputing power sub-model adopts a Seq2Seq architecture, with both the encoder and decoder being LSTMs.

[0019] This invention also provides a multi-type computing cluster power load forecasting system for low-carbon operation, comprising: Data acquisition module: used to collect historical load data, auxiliary data, and historical job scheduling data of the target computing power cluster; Feature construction module: used to extract features from the historical load data, auxiliary data and historical job scheduling data through a convolutional neural network, and generate basic feature vectors and enhanced feature vectors, wherein basic features are constructed based on the historical load data and auxiliary data, and enhanced features are constructed based on the historical job scheduling data; Gated fusion module: used to perform elastic gating fusion on the basic feature vector and the enhanced feature vector to obtain the final feature vector sequence; Differentiated prediction module: used to store prediction sub-models corresponding to multiple different computing power types; Prediction output module: It is used to input the feature vector sequence into the corresponding prediction sub-model according to the computing power type of the target computing power cluster, output the multi-scale load prediction result, and output the multi-scale load prediction result to the energy management system through the data interface in a preset data format as input data for low-carbon operation strategy calculation.

[0020] Furthermore, the gated fusion module includes a fully connected layer and a Sigmoid activation function.

[0021] This invention provides a power load forecasting method and system that is customized according to computing power type, data adaptive, and highly robust. It is capable of: (i) constructing dedicated prediction sub-models that are closely matched to the load characteristics of the three types of computing power: basic, intelligent, and supercomputing; (ii) achieving reliable forecasting from readily available power and environmental data alone; (iii) automatically activating the enhanced modeling path to further improve accuracy when high-level features such as task events and scheduling logs are available; (iv) outputting classified and highly reliable computing load prediction results as basic data input for low-carbon operation-related systems, providing reliable data support for computing clusters to participate in low-carbon operation scenarios such as demand response and green electricity consumption.

[0022] Compared with the prior art, the present invention has the following beneficial effects: 1. Significantly Improved Prediction Accuracy. This invention constructs dedicated prediction sub-models for basic computing power, intelligent computing power, and supercomputing power, ensuring that the model structure is highly aligned with the load characteristics of each type of computing power. This significantly improves prediction accuracy in typical scenarios. This targeted design overcomes the problem of insufficient generalization ability of traditional unified models in different application scenarios, while also adapting to scenarios with varying data completeness, providing data centers with a more accurate and robust computing load prediction solution.

[0023] 2. High Engineering Robustness. This invention employs an adaptive architecture comprising a basic channel and optional enhancement channels. The basic channel utilizes readily available data, ensuring system stability and prediction accuracy even in the absence of advanced auxiliary information such as detailed scheduling logs or job queue lengths. When such advanced data becomes available, the system automatically integrates it into the enhancement channels, further improving prediction accuracy. This design not only enhances the model's applicability to different data environments but also strengthens its overall performance. This invention utilizes a flexible feature fusion mechanism of "basic channel + enhancement channel," achieving stable predictions based solely on power metering and environmental parameters. When higher-order features such as task events and scheduling logs become available, the system automatically activates enhancement paths to further improve efficiency, effectively avoiding performance drops caused by missing operational data.

[0024] 3. High prediction reliability. This invention starts directly from raw data and achieves an end-to-end intelligent process from data acquisition to prediction through efficient feature fusion technology and type-aware modeling strategies. This framework integrates effective processing of multi-source heterogeneous data, an adaptive feature selection mechanism, and prediction models optimized for different types of computing power, ensuring the accuracy and reliability of the prediction results. In particular, the gated fusion module proposed in this invention can dynamically adjust the input features according to the availability of data, thereby outputting the optimal prediction results under any circumstances.

[0025] 4. Flexible deployment and strong scalability. The three types of sub-models can be trained and deployed independently, support on-demand invocation, and adapt to computing clusters of different sizes and types, from small edge data centers to national-level computing power hubs, facilitating modular integration and iterative upgrades.

[0026] 5. Significant application value. The categorized computing load prediction results provided by this invention can be output through a data interface as basic data input for low-carbon operation-related systems. This provides reliable data support for computing clusters to participate in low-carbon operation mechanisms such as demand response and local consumption of green electricity, promoting the collaborative operation of "electricity-computing-carbon" and the integrated development of computing and electricity.

Attached Figure Description

[0027] Figure 1 is a flowchart illustrating the power load forecasting method in an embodiment of the present invention; Figure 2 is a schematic diagram of the power load forecasting system in an embodiment of the present invention; Figure 3 is a schematic diagram of the gated fusion module in an embodiment of the present invention.

Detailed Implementation

[0028] The present invention will now be described in detail with reference to the accompanying drawings and specific embodiments. In this application, "computing power cluster" refers to a collection of computing resources (e.g., GPUs, CPU nodes, and resource pools managed by their scheduling systems) deployed within a data center, and its power load is the object of prediction in this invention; "data center" refers to the physical facilities and supporting systems that support the computing power cluster, and related environmental and energy efficiency indicators are optional auxiliary data sources. Unless otherwise stated, "target computing power cluster" in this document refers to the object to be predicted. This embodiment is implemented based on the technical solution of the present invention, providing detailed implementation methods and specific operating procedures, but the scope of protection of the present invention is not limited to the following embodiments.

[0029] Example 1. This embodiment provides a power load forecasting method for multi-type computing power clusters oriented towards low-carbon operation. Through the technical path of "customizing sub-models according to computing power type" and "elastic gating fusion mechanism", it achieves accurate and robust prediction of the power consumption behavior of basic computing power, intelligent computing power and supercomputing power. It is a computing power load forecasting method based on differentiated neural network architecture and elastic feature fusion.

[0030] As shown in Figure 1, the method includes the following steps: S201. Obtain historical load data, auxiliary data, and historical job scheduling data of the target computing power cluster.

[0031] For a specific type of computing cluster, its historical power load time series is obtained through a power monitoring system. The time granularity is 5 minutes or 15 minutes. Easily accessible auxiliary data is collected simultaneously, including outdoor temperature, humidity, time-of-use electricity price, and power usage efficiency. If the data center has interfaces with a job scheduling system and a monitoring system, high-level operational data such as job queue length and average GPU utilization can also be obtained.

[0032] When training the prediction model, the collected data is divided into training set, validation set and test set in chronological order to ensure no time overlap.
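The chronological split described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `chronological_split` and the 70/15/15 proportions are assumptions, since the text only requires that the three sets follow time order and do not overlap.

```python
def chronological_split(series, train_frac=0.7, val_frac=0.15):
    """Split a time-ordered sequence into train/validation/test sets.

    The data is never shuffled, so every training sample precedes every
    validation sample, which in turn precedes every test sample.
    The default fractions are illustrative assumptions.
    """
    if train_frac <= 0 or val_frac <= 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must be positive and sum to less than 1")
    n = len(series)
    train_end = int(n * train_frac)
    val_end = int(n * (train_frac + val_frac))
    return series[:train_end], series[train_end:val_end], series[val_end:]


# One day of 15-minute samples (96 points), represented here by their indices.
samples = list(range(96))
train, val, test = chronological_split(samples)
```

Slicing by position keeps the sets disjoint in time; a random split would leak future load values into training.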

[0033] S202. Construct a multi-channel input matrix.

[0034] The original load sequence serves as the main input, and the auxiliary variables are concatenated with it to form a multi-channel input matrix:
(1) Basic channel: the load sequence together with the auxiliary variables.
(2) Enhanced channel: formed from the high-level operational data (such as job queue length and average GPU utilization) when that data is available. Enhanced features include, but are not limited to, one or more of the following: task scheduling information, job running status, computational resource utilization, and network traffic metrics. Task scheduling information includes task submission flags, job queue length, and job priority; computational resource utilization includes GPU utilization, CPU load, and memory usage.
Given a time window length of [missing information], the basic channel input matrix and the enhanced channel input matrix are defined over that window (the matrix expressions are missing from this text). If the enhanced channel is unavailable, the enhanced channel input matrix is filled with zeros.
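A minimal sketch of assembling the two input channels for one time window, under stated assumptions: the layout (one row per time step, one column per variable), the function name `build_channels`, and all numeric values are hypothetical. Only the zero-filling of an unavailable enhanced channel is taken from the text.

```python
def build_channels(load, aux_rows, enhanced_rows=None, n_enhanced=2):
    """Build the basic and enhanced input matrices for one time window.

    Layout assumption (not from the source): one row per time step,
    one column per variable.
    """
    steps = len(load)
    if len(aux_rows) != steps:
        raise ValueError("aux_rows must have one row per load sample")
    # Basic channel: the load value followed by the auxiliary variables.
    basic = [[load[t], *aux_rows[t]] for t in range(steps)]
    if enhanced_rows is None:
        # Scheduling data unavailable: fill the enhanced channel with zeros.
        enhanced = [[0.0] * n_enhanced for _ in range(steps)]
    else:
        if len(enhanced_rows) != steps:
            raise ValueError("enhanced_rows must have one row per load sample")
        enhanced = [list(row) for row in enhanced_rows]
    return basic, enhanced


# Hypothetical values for three time steps.
load = [310.0, 325.0, 298.0]                    # power load
aux = [[28.5, 0.62, 0.85, 1.32],                # temperature, humidity,
       [28.9, 0.60, 0.85, 1.33],                # electricity price,
       [29.4, 0.58, 1.10, 1.35]]                # power usage efficiency
queue_gpu = [[12, 0.71], [15, 0.83], [9, 0.64]]  # queue length, GPU utilization

basic, enhanced = build_channels(load, aux, queue_gpu)
basic_only, zero_enhanced = build_channels(load, aux)
```

Keeping the enhanced matrix at a fixed shape even when it is all zeros means the downstream network sees the same input dimensions in both data-completeness scenarios.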

[0035] S203. Extract features using CNN.

[0036] Feature extraction is performed on the historical load data, auxiliary data, and historical job scheduling data using a convolutional neural network to generate basic features and enhanced features. The basic features are constructed based on the historical load data and auxiliary data, while the enhanced features are constructed based on the historical job scheduling data.

[0037] In this embodiment, the multi-channel inputs are fed into a one-dimensional convolutional neural network (1D-CNN), which includes two convolutional layers (kernel sizes of 5 and 3 respectively), a ReLU activation function, a max-pooling layer, and a Dropout layer (dropout rate of 0.2), and outputs a basic feature vector and an enhanced feature vector, each of dimension 64.
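To make the layer sequence concrete, the following sketch runs a single channel through two convolutions (kernel sizes 5 and 3), ReLU, and max pooling in plain Python. It is an illustration only: "valid" convolution without padding, a pooling size of 2, a single filter per layer, and the toy kernel values are all assumptions, and Dropout is omitted because it acts as the identity at inference time. The embodiment's 64-dimensional output would come from a trained network with many filters.

```python
def conv1d_valid(x, kernel):
    """1D convolution (cross-correlation) without padding."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k))
            for i in range(len(x) - k + 1)]


def relu(x):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]


def max_pool(x, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]


# Toy input: 24 samples of a repeating pattern.
signal = [float(i % 6) for i in range(24)]
kernel_5 = [0.2] * 5           # hypothetical weights (moving average)
kernel_3 = [-1.0, 0.0, 1.0]    # hypothetical weights (local difference)

h1 = relu(conv1d_valid(signal, kernel_5))   # length 24 - 5 + 1 = 20
h2 = relu(conv1d_valid(h1, kernel_3))       # length 20 - 3 + 1 = 18
pooled = max_pool(h2, size=2)               # length 18 // 2 = 9
```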

[0038] S204. Perform elastic gating fusion of the above basic features and enhanced features.

[0039] The basic features and enhanced features are fused with elastic gating to achieve adaptive feature selection and obtain the final feature vector sequence.

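[0040] Specifically: (1) For each time step $t$, calculate the gating weight: $g^{(t)} = \sigma\left(W\,[F_{\mathrm{base}}^{(t)};\,F_{\mathrm{enh}}^{(t)}] + b\right)$, where $\sigma$ is the Sigmoid function, $F_{\mathrm{base}}^{(t)}$ and $F_{\mathrm{enh}}^{(t)}$ are the basic feature vector and enhanced feature vector at the $t$-th time step, $[\cdot\,;\cdot]$ represents a vector concatenation operation, and $W$ and $b$ are the learnable weights and biases, respectively. The above calculation is implemented using a single-layer fully connected neural network. (2) The final feature vector is obtained at each time step by fusing the basic and enhanced feature vectors according to the gating weight (the formula itself is missing from this text).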
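For orientation, one common way to write a gated fusion of this kind is shown here. The gate expression follows the description above (Sigmoid of a single fully connected layer applied to the concatenated features). The fusion line is an assumption: the formula is not legible in this text, and a convex combination is only one form consistent with the stated behavior that the basic features dominate as the gate weight approaches zero. The symbols $F_{\mathrm{base}}$, $F_{\mathrm{enh}}$, $F$, $W$, $b$ and $T$ are notation chosen for this sketch.

```latex
\begin{aligned}
g^{(t)} &= \sigma\!\left(W\,\bigl[\,F_{\mathrm{base}}^{(t)};\,F_{\mathrm{enh}}^{(t)}\,\bigr] + b\right),
  \qquad g^{(t)} \in (0,1),\\[4pt]
F^{(t)} &= \bigl(1 - g^{(t)}\bigr)\,F_{\mathrm{base}}^{(t)} + g^{(t)}\,F_{\mathrm{enh}}^{(t)},
  \qquad t = 1,\dots,T .
\end{aligned}
```

Under this assumed form, $g^{(t)} \to 0$ gives $F^{(t)} \to F_{\mathrm{base}}^{(t)}$, so the prediction path reduces to the basic channel when the enhanced data carries no information.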

[0041] S205. Select a dedicated sub-model for time series prediction based on the computing power type.

[0042] Based on the computing power type of the target computing power cluster, the feature vector sequence is input into the corresponding prediction sub-model.

[0043] In this embodiment, the prediction sub-model includes a basic computing power sub-model, an intelligent computing power sub-model, and a supercomputing power sub-model: (1) Basic computing power sub-model (104a): A 2-layer LSTM (128 hidden units) is used, with Dropout layers between layers (dropout rate 0.2). The output layer is a fully connected neural network with a weight matrix dimension of [missing information], which generates the multi-step load forecast values.

[0044] (2) Intelligent computing power sub-model (104b): A Temporal Attention layer is introduced before the LSTM to calculate the weight of each time step (the formula is missing from this text). Each time step in the input feature sequence is weighted accordingly to enhance sensitivity to burst loads.

[0045] The attention calculation uses the fused feature vector at each time step, a learnable context vector, a learnable projection matrix whose size is set by the attention dimension, and the hyperbolic tangent activation function. The resulting attention weight at each time step represents the importance of that time step to the final output. The weighted sequence is used as the input to the LSTM.
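The components listed above (projection matrix, tanh, context vector, per-step weights) match the usual additive attention pattern, so a sketch in that form is given here. This is an assumption: the patent's own formula is not reproduced in this text, and the function name `temporal_attention` and the toy values of `W` and `v` are hypothetical stand-ins for learned parameters.

```python
import math


def temporal_attention(features, W, v):
    """Additive attention over time steps (assumed form).

    features: list of T fused feature vectors
    W:        projection matrix, one row per attention dimension
    v:        context vector, one entry per attention dimension
    Returns the attention weights and the re-weighted feature sequence.
    """
    scores = []
    for f in features:
        projected = [math.tanh(sum(w * x for w, x in zip(row, f))) for row in W]
        scores.append(sum(c * p for c, p in zip(v, projected)))
    # Softmax over time steps, shifted by the maximum for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    weighted = [[a * x for x in f] for a, f in zip(alphas, features)]
    return alphas, weighted


# Four time steps of 2-dimensional features; step 2 mimics a load burst.
features = [[0.1, 0.1], [0.1, 0.2], [2.0, 1.5], [0.2, 0.1]]
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # toy projection, attention dim 3
v = [1.0, 1.0, 1.0]                        # toy context vector

alphas, weighted = temporal_attention(features, W, v)
```

With these toy parameters the burst step receives the largest weight; in a trained model the weights depend on the learned `W` and `v`.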

[0046] The intelligent computing sub-model uses a two-layer LSTM (128 hidden units) with dropout layers between layers (dropout rate of 0.3). The output layer is a fully connected neural network with a weight matrix dimension of [dimension missing], which generates the multi-step load forecast values.

[0047] (3) Supercomputing power sub-model (104c): Employs a Seq2Seq architecture, with both the encoder and decoder being 2-layer LSTMs (128 hidden units). Dropout layers (0.25 dropout rate) are used between layers. The output layer is a fully connected neural network with a weight matrix dimension of [missing information], which generates the multi-step load forecast values.
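The three sub-model configurations above can be collected in a small lookup keyed by computing power type. This is only an organizational sketch: the class `SubModelConfig`, the key strings, and the function `select_sub_model` are hypothetical names, while the layer counts, hidden units, and dropout rates are the values stated in this embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubModelConfig:
    architecture: str
    lstm_layers: int
    hidden_units: int
    dropout: float
    temporal_attention: bool


# Hyperparameters as stated for sub-models 104a, 104b and 104c.
SUB_MODELS = {
    "basic": SubModelConfig("lstm", 2, 128, 0.2, False),
    "intelligent": SubModelConfig("attention_lstm", 2, 128, 0.3, True),
    "supercomputing": SubModelConfig("seq2seq_lstm", 2, 128, 0.25, False),
}


def select_sub_model(computing_power_type: str) -> SubModelConfig:
    """Return the configuration for the given computing power type."""
    try:
        return SUB_MODELS[computing_power_type]
    except KeyError:
        raise ValueError(
            f"unknown computing power type: {computing_power_type!r}"
        ) from None


config = select_sub_model("intelligent")
```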

[0048] Each sub-model is trained independently based on its corresponding historical load data, auxiliary data, and historical job scheduling data from its computing cluster to adapt to different types of load dynamics. The Adam optimizer and MAE loss function are used during the training phase. Specifically: (1) Basic computing power sub-model (104a): The training data comes from a stable general-purpose server cluster, and the load exhibits strong periodicity and stability. A small learning rate and an early stopping mechanism are used during training to prevent overfitting; (2) Intelligent computing power sub-model (104b): The training data includes sudden load peaks caused by AI training tasks. To enhance the ability to capture sudden events, a time-sensitive regularization term is introduced into the loss function, that is, higher weights are applied to time steps with larger prediction errors; (3) Supercomputing power sub-model (104c): The training data comes from a high-performance computing cluster driven by job scheduling, which has obvious job periodicity and task dependency. A teacher forcing strategy is used in the training phase to improve the modeling accuracy.
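The loss used for the intelligent computing power sub-model is described only in words (MAE plus a term that weights time steps with larger errors more heavily), so the following is one possible reading rather than the patent's formula. The function name `error_weighted_loss`, the coefficient `lam`, and the choice of weights proportional to each step's absolute error are all assumptions.

```python
def error_weighted_loss(y_true, y_pred, lam=0.5):
    """MAE plus a penalty that emphasizes the worst-predicted time steps.

    Assumed form: each step's weight is its share of the total absolute
    error, so concentrated errors (missed bursts) cost more than the same
    total error spread evenly.
    """
    errors = [abs(a - b) for a, b in zip(y_true, y_pred)]
    mae = sum(errors) / len(errors)
    total = sum(errors)
    if total == 0.0:
        return 0.0
    weights = [e / total for e in errors]
    penalty = sum(w * e for w, e in zip(weights, errors))
    return mae + lam * penalty


actual = [1.0, 1.0, 1.0, 1.0]
spread_out = [1.5, 1.5, 1.5, 1.5]     # error of 0.5 at every step
missed_burst = [1.0, 1.0, 1.0, 3.0]   # error of 2.0 at a single step

loss_spread = error_weighted_loss(actual, spread_out)
loss_burst = error_weighted_loss(actual, missed_burst)
```

Both predictions have the same plain MAE (0.5), but the one that misses a single burst is penalized more, which is the behavior the text attributes to the regularization term.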

[0049] Each sub-model was evaluated on the validation and test sets using mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE). Furthermore, the tests were repeated under conditions of missing enhanced features to verify the resilience of the gated fusion module.
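The three evaluation metrics named above have standard definitions, sketched here in plain Python. The sample values are hypothetical, and the handling of zero actual values in MAPE (raising an error) is a choice made for this sketch.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    if any(a == 0 for a in y_true):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y_true, y_pred)) / len(y_true)


# Hypothetical load values.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 380.0]

mae_value = mae(actual, predicted)
rmse_value = rmse(actual, predicted)
mape_value = mape(actual, predicted)
```

Running the same three functions once with the enhanced channel present and once with it zero-filled gives the robustness comparison the paragraph describes.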

[0050] S206. Output multi-scale load prediction results.

[0051] The method supports multi-scale load forecasting for the next 24 hours, 12 hours, and 1 hour. In this embodiment, the multi-scale load forecasting results include the predicted load time series and corresponding time window information, and are output to the energy management system through a data interface. This provides a load forecasting data foundation for the computing power cluster to participate in low-carbon operation scenarios such as demand response and green electricity consumption, and is applied to the calculation of low-carbon operation strategies.
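The number of forecast steps for each horizon follows directly from the data granularity stated earlier (5 or 15 minutes). A short worked calculation, with `horizon_steps` as a hypothetical helper name:

```python
def horizon_steps(horizon_hours, granularity_minutes):
    """Number of forecast points needed to cover a horizon."""
    total_minutes = horizon_hours * 60
    if total_minutes % granularity_minutes != 0:
        raise ValueError("horizon is not a whole number of time steps")
    return total_minutes // granularity_minutes


# Forecast lengths for the three horizons at both granularities.
steps = {
    (hours, minutes): horizon_steps(hours, minutes)
    for hours in (24, 12, 1)
    for minutes in (5, 15)
}
# At 15 minutes: 24 h is 96 steps, 12 h is 48 steps, 1 h is 4 steps.
# At 5 minutes:  24 h is 288 steps, 12 h is 144 steps, 1 h is 12 steps.
```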

[0052] If the above methods are implemented as software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this invention, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0053] Example 2. This embodiment provides a multi-type computing cluster power load forecasting system for low-carbon operation. As shown in Figure 2, it includes:
Data acquisition module 101: used to collect historical load data, auxiliary data and historical job scheduling data of the target computing power cluster;
Feature construction module 102: used to extract features from the historical load data, auxiliary data and historical job scheduling data through a convolutional neural network, and generate basic feature vectors and enhanced feature vectors, wherein basic features are constructed based on the historical load data and auxiliary data, and enhanced features are constructed based on the historical job scheduling data;
Gated fusion module 103: used to perform elastic gating fusion on the basic feature vector and the enhanced feature vector to obtain the final feature vector sequence;
Differentiated prediction module 104: used to store prediction sub-models corresponding to multiple different computing power types;
Prediction output module 105: used to input the feature vector sequence into the corresponding prediction sub-model according to the computing power type of the target computing power cluster, output the multi-scale load prediction result, and output the prediction result to the energy management system through the data interface in a preset data format as input data for low-carbon operation strategy calculation.

[0054] In a specific implementation, the prediction sub-models stored in the differentiated prediction module 104 include a basic computing power sub-model 104a, an intelligent computing power sub-model 104b, and a supercomputing power sub-model 104c, wherein: the basic computing power sub-model uses a 2-layer LSTM; the intelligent computing power sub-model introduces a Temporal Attention layer before the LSTM; and the supercomputing power sub-model adopts a Seq2Seq architecture, with both the encoder and decoder being LSTMs.

[0055] As shown in Figure 3, the gated fusion module includes a fully connected layer and a Sigmoid activation function. The aligned basic feature vector and enhanced feature vector are concatenated to obtain a concatenated vector. The fully connected layer has an input dimension of 128 and an output dimension of 1. Its weight matrix contains 128 learnable parameters, and its bias term contains 1 learnable parameter. After Sigmoid activation, the output gate weight g∈(0,1) is used to perform weighted fusion of the basic and enhanced features. When the enhanced features are unavailable, their corresponding input channels are filled with zero values. Through end-to-end training, the gated fusion module learns to drive the gate weight toward zero in this situation, so that the fused features are dominated by the basic features, which keeps the model robust under different data completeness scenarios.
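A minimal sketch of the module described above, using the stated dimensions (two 64-dimensional vectors concatenated into a 128-dimensional input, one output, hence 128 weights and 1 bias). Several parts are assumptions and are marked as such: the class name `GatedFusion`, the random initialization, the convex-combination form of the fusion, and the use of a zero vector to stand in for an unavailable enhanced feature. The large negative bias set at the end only imitates the near-zero gate that training is described as producing; it is not a trained value.

```python
import math
import random


class GatedFusion:
    """Single fully connected layer + Sigmoid producing a scalar gate."""

    def __init__(self, dim=64, seed=0):
        rng = random.Random(seed)
        # 2 * dim weights (128 for dim=64) and one bias, as in the text.
        self.weights = [rng.uniform(-0.1, 0.1) for _ in range(2 * dim)]
        self.bias = 0.0

    def num_parameters(self):
        return len(self.weights) + 1

    def gate(self, base, enhanced):
        """Sigmoid of the linear layer applied to the concatenated vector."""
        concatenated = list(base) + list(enhanced)
        z = sum(w * x for w, x in zip(self.weights, concatenated)) + self.bias
        return 1.0 / (1.0 + math.exp(-z))

    def fuse(self, base, enhanced):
        """Assumed fusion form: (1 - g) * base + g * enhanced."""
        g = self.gate(base, enhanced)
        return [(1.0 - g) * b + g * e for b, e in zip(base, enhanced)]


fusion = GatedFusion(dim=64)
base = [0.5] * 64
missing = [0.0] * 64            # stand-in for an unavailable enhanced feature

g_untrained = fusion.gate(base, missing)
fusion.bias = -20.0             # imitates a gate that training has pushed to ~0
fused = fusion.fuse(base, missing)
```

With the gate near zero the fused vector is numerically indistinguishable from the basic feature vector, which is the fallback behavior the paragraph describes.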

[0056] The rest is the same as in Example 1.

[0057] This invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each process and/or block of the flowchart illustrations and/or block diagrams, and combinations of processes and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, produce a device for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

[0058] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

[0059] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable equipment provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

[0060] The preferred embodiments of the present invention have been described in detail above. It should be understood that those skilled in the art can make numerous modifications and variations based on the concept of the present invention without creative effort. Therefore, all technical solutions that can be obtained by those skilled in the art based on the concept of the present invention through logical analysis, reasoning, or limited experimentation on the basis of existing technology should be within the scope of protection defined by the claims.

Claims

1. A multi-type computing power cluster power load forecasting method for low-carbon operation, characterized in that it includes the following steps: obtaining historical load data, auxiliary data, and historical job scheduling data of the target computing power cluster; extracting features from the historical load data, auxiliary data, and historical job scheduling data using a convolutional neural network to generate basic features and enhanced features, wherein the basic features are constructed based on the historical load data and auxiliary data, and the enhanced features are constructed based on the historical job scheduling data; performing elastic gating fusion on the basic features and the enhanced features to obtain the final feature vector sequence; and, based on the computing power type of the target computing power cluster, inputting the feature vector sequence into the corresponding prediction sub-model and outputting multi-scale load prediction results, wherein the multi-scale load prediction results are output through a data interface and used as input data for low-carbon operation strategy calculation.

2. The multi-type computing power cluster power load forecasting method for low-carbon operation according to claim 1, characterized in that the historical load data includes historical power load time series with a time granularity of 5 minutes or 15 minutes.

3. The multi-type computing power cluster power load forecasting method for low-carbon operation according to claim 1, characterized in that the auxiliary data includes at least several of the following: outdoor temperature, humidity, time-of-use electricity price, and power usage efficiency; and the historical job scheduling data includes at least several of the following: job queue length and average GPU utilization.

4. The multi-type computing power cluster power load forecasting method for low-carbon operation according to claim 1, characterized in that the convolutional neural network includes convolutional layers, a ReLU activation function, a max pooling layer, and a Dropout layer.

5. The multi-type computing power cluster power load forecasting method for low-carbon operation according to claim 3, characterized in that the enhanced features include one or more of the following: task scheduling information, job running status, computing resource utilization, and network traffic indicators.

6. The multi-type computing power cluster power load forecasting method for low-carbon operation according to claim 1, characterized in that the fusion formula used in the elastic gating fusion process is as follows: [formula not reproduced in this text], where the symbols denote, in order, the gating weight, the basic features, the enhanced features, and the final feature vector sequence, and the superscript indicates the time step.

7. The multi-type computing power cluster power load forecasting method for low-carbon operation according to claim 6, characterized in that the formula for calculating the gating weight is: [formula not reproduced in this text], where the symbols denote, in order, the Sigmoid function and the learnable weight and bias terms.

8. The multi-type computing power cluster power load forecasting method for low-carbon operation according to claim 1, characterized in that the prediction sub-model includes a basic computing power sub-model, an intelligent computing power sub-model, and a supercomputing power sub-model, wherein: the basic computing power sub-model adopts a 2-layer LSTM; the intelligent computing power sub-model introduces a Temporal Attention layer before the LSTM; and the supercomputing power sub-model adopts a Seq2Seq architecture in which both the encoder and the decoder are LSTMs.

9. A multi-type computing power cluster power load forecasting system for low-carbon operation, characterized in that it includes: a data acquisition module (101), used to collect historical load data, auxiliary data and historical job scheduling data of the target computing power cluster; a feature construction module (102), used to extract features from the historical load data, auxiliary data and historical job scheduling data through a convolutional neural network and to generate basic feature vectors and enhanced feature vectors, wherein the basic features are constructed based on the historical load data and auxiliary data, and the enhanced features are constructed based on the historical job scheduling data; a gated fusion module (103), used to perform elastic gating fusion on the basic feature vector and the enhanced feature vector to obtain the final feature vector sequence; a differentiated prediction module (104), used to store prediction sub-models corresponding to multiple different computing power types; and a prediction output module (105), used to input the feature vector sequence into the corresponding prediction sub-model according to the computing power type of the target computing power cluster, output the multi-scale load prediction result, and deliver the multi-scale load prediction result to the energy management system through the data interface in a preset data format as input data for low-carbon operation strategy calculation.

10. The multi-type computing power cluster power load forecasting system for low-carbon operation according to claim 9, characterized in that the gated fusion module (103) includes a fully connected layer and a Sigmoid activation function.