Method and system for evaluating computing efficiency, computer device and storage medium
By calculating computing power and efficiency based on processor utilization in data centers and combining this with a computing efficiency prediction model, the problem of inaccurate computing power in existing technologies has been solved, enabling scientific evaluation and optimization of data center computing efficiency.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- CHINA MOBILE GROUP DESIGN INST
- Filing Date
- 2024-12-13
- Publication Date
- 2026-06-16
AI Technical Summary
In existing technologies, the computing power of data centers is measured mainly from the nominal value of processor chips, without considering the actual operating status of the processors. As a result, the computing power measurement is neither systematic nor complete, and the computing efficiency evaluation results struggle to reflect the true level of the data center.
By calculating the computing power based on the processor utilization of the data center computing system and combining it with the operating power, the computing efficiency is determined. The computing efficiency prediction model is used to predict future resource usage trends, generate optimization strategies, and optimize the computing efficiency of the data center.
It enables scientific evaluation of data center computing efficiency, ensures the accuracy and comprehensiveness of computing power, provides accurate computing efficiency evaluation results and optimization strategies, and improves the overall performance of data centers.
Figure CN122220187A_ABST
Abstract
Description
Technical Field
[0001] This disclosure relates to the field of computing power network technology, and in particular to a computing efficiency evaluation method, system, computer device, and storage medium.
Background Technology
[0002] Conducting a scientific and reasonable evaluation of the computing power and efficiency of data centers is a key focus for achieving "dual carbon" goals and ensuring efficient construction and operation for enterprises. Computing efficiency, the ratio of computing power to power consumption in a data center, is an indicator that considers both computing performance and energy consumption. Data center efficiency can be improved by increasing computing power or reducing energy consumption.
[0003] The computing power of a data center is determined by a variety of factors, including computing speed, cache size, read and write speed, network connection bandwidth, information transmission latency design, and the unified control capability of the system. However, currently, computing power is mainly measured based on the nominal value of processor chips (such as CPU or GPU), without taking into account the actual operating status of the processor. This results in an insufficiently systematic and comprehensive computing power measurement, and the computing efficiency evaluation results are difficult to reflect the true computing efficiency level of the data center.
Summary of the Invention
[0004] In view of the above problems, this disclosure is made to provide a computational efficiency evaluation method, system, computer device and storage medium.
[0005] According to one aspect of this disclosure, a computational efficiency evaluation method is provided, comprising:
[0006] The operating computing power of the computing system is calculated based on the processor utilization of the servers included in the computing system of the data center, where the operating computing power is the computing power output by the computing system in its running state.
[0007] Based on the operating computing power and operating power of the computing system, the operational computing efficiency of the computing system is calculated.
[0008] Based on the operational computing efficiency of the computing system, determine the computing efficiency evaluation result of the data center; or, obtain the design computing efficiency of the computing system, and based on the design computing efficiency and operational computing efficiency of the computing system, determine the computing efficiency evaluation result of the data center.
[0009] Furthermore, according to one aspect of the computing performance evaluation method of this disclosure, after determining the computing performance evaluation result of the data center, the method further includes:
[0010] The computing performance evaluation results and the business status data of the data center are input into a pre-trained computing performance prediction model to obtain the future resource usage trend of the data center output by the computing performance prediction model; wherein, the computing performance prediction model is constructed using a time series model;
[0011] Based on the aforementioned future resource usage trends, the future computing efficiency of the computing systems included in the data center is confirmed.
[0012] Based on the future computing performance and operational computing performance of the computing system, a matching optimization strategy is obtained from the strategy library;
[0013] In response to the absence of an optimization strategy in the strategy library that matches the future computing performance and operational performance of the computing system, an optimization strategy is generated using a pre-trained deep learning model, and the generated optimization strategy is used as the matching optimization strategy and stored in the strategy library.
[0014] Furthermore, according to one aspect of the computational efficiency evaluation method of this disclosure, after obtaining the matching optimization strategy, it also includes:
[0015] Control the data center to execute the matched optimization strategy, and obtain the optimized computing performance of the data center after executing the matched optimization strategy;
[0016] If the optimized computing performance does not meet the preset computing performance target, the optimized computing performance is taken as the latest computing performance, and the steps of determining the computing performance evaluation result of the data center and obtaining the matching optimization strategy are repeated until the optimized computing performance meets the preset computing performance target.
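The strategy matching and the repeat-until-target loop described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function names, the rounded lookup key, the `max_rounds` safety cap, and the reading of "meets the target" as `>=` are all assumptions, and the `predict`, `execute`, and `generate` callables stand in for the prediction model, the data center control step, and the deep learning model respectively.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical lookup key: (future efficiency, operational efficiency), rounded.
Key = Tuple[float, float]


def match_strategy(library: Dict[Key, str], future: float, current: float,
                   generate: Callable[[float, float], str]) -> str:
    """Return a matching strategy; on a miss, generate one and store it."""
    key = (round(future, 1), round(current, 1))
    if key not in library:
        # Stand-in for the pre-trained deep learning model.
        library[key] = generate(future, current)
    return library[key]


def optimize_until_target(current: float, target: float,
                          predict: Callable[[float], float],
                          execute: Callable[[str, float], float],
                          library: Dict[Key, str],
                          generate: Callable[[float, float], str],
                          max_rounds: int = 10) -> Tuple[float, List[str]]:
    """Repeat predict -> match -> execute until the target is met."""
    applied: List[str] = []
    for _ in range(max_rounds):  # safety cap, an assumption of this sketch
        future = predict(current)
        strategy = match_strategy(library, future, current, generate)
        current = execute(strategy, current)  # optimized efficiency
        applied.append(strategy)
        if current >= target:
            break
    return current, applied
```

Each round treats the optimized efficiency as the latest efficiency, so the next lookup uses the updated value.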
[0017] Furthermore, according to one aspect of the computational efficiency evaluation method of this disclosure, the computing power of a computing system is calculated based on the processor utilization of servers included in the computing system of a data center, including:
[0018] The obtained processor utilization of the server is input into the pre-trained computing power model corresponding to the computing system. Based on the computing power precision preset by the computing system, the running computing power of the server is obtained. The computing power model uses a neural network algorithm and is constructed based on the mathematical function relationship between running computing power and processor utilization.
[0019] The computing power of the computing system is obtained by summing the computing power of all the servers included in the computing system.
[0020] Furthermore, according to one aspect of the computing performance evaluation method of this disclosure, the computing performance evaluation result of the data center is determined based on the operational computing performance of the computing system, including:
[0021] When the computing system includes at least one sub-computing system, the computing efficiency of the multiple sub-computing systems is represented in the form of an array to obtain the total computing efficiency of the data center;
[0022] Based on the total computing efficiency of the data center, determine the computing efficiency evaluation result of the data center;
[0023] Based on the design and operational computational efficiency of the computing system, the computational efficiency evaluation results of the data center are determined, including:
[0024] The effective computing efficiency of the computing system is obtained based on the ratio of its operational computing efficiency to its designed computing efficiency.
[0025] Based on the effective computing efficiency of the computing system, the computing efficiency evaluation result of the data center is determined.
[0026] Furthermore, according to one aspect of the computational efficiency evaluation method of this disclosure, the design computational efficiency of the computing system is obtained, including:
[0027] Based on the nominal computing power of the server chips included in the computing system, the design computing power of the server is calculated, wherein the design computing power is the computing power predicted to be output by the server during the design phase.
[0028] The design computing power of each computing system is obtained based on the sum of the design computing power of the servers included in the computing system.
[0029] Based on the design computing power and design power of the computing system, the design computing efficiency of the computing system is calculated.
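As a rough illustration of the three steps above, the sketch below derives a design computing efficiency from nominal chip values. The `Server` type and its field names are hypothetical, and two simplifications are assumed: a server's design computing power is taken as the plain sum of its chips' nominal computing power, and the system's design power is taken as the sum of the servers' design power.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Server:
    chip_nominal_flops: List[float]  # nominal computing power per chip, FLOPS
    design_power_w: float            # design power of the server, watts


def design_computing_power(server: Server) -> float:
    """Design computing power of one server (assumed: sum of nominal values)."""
    return sum(server.chip_nominal_flops)


def design_efficiency(servers: List[Server]) -> float:
    """Design computing efficiency of a computing system, in FLOPS/W."""
    total_flops = sum(design_computing_power(s) for s in servers)
    total_power = sum(s.design_power_w for s in servers)
    if total_power <= 0:
        raise ValueError("design power must be positive")
    return total_flops / total_power
```

For example, two servers with 4e12 FLOPS each and 500 W each give 8e12 / 1000 = 8e9 FLOPS/W.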
[0030] Furthermore, according to one aspect of the computing performance evaluation method of this disclosure, the computing system includes at least one sub-computing system, which is divided according to the computing density of the server cluster in the data center; calculating the computing power of the computing system includes calculating the computing power of each of the sub-computing systems respectively.
[0031] The calculation of the computing system's operational efficiency includes calculating the operational efficiency of each of the sub-computing systems; based on the operational efficiency of the computing system, determining the computing efficiency evaluation result of the data center includes:
[0032] Based on the operational computing efficiency of each of the sub-computing systems, the computing efficiency evaluation result of the data center is determined;
[0033] Based on the design and operational computational efficiency of the computing system, the computational efficiency evaluation results of the data center are determined, including:
[0034] Based on the design and operational computational efficiency of each of the sub-computing systems, the computational efficiency evaluation results of each of the sub-computing systems are determined. Based on the computational efficiency evaluation results of each of the sub-computing systems, the computational efficiency evaluation results of the data center are determined.
[0035] According to another aspect of this disclosure, a computing performance evaluation system is provided, including a server cluster and a cloud management platform;
[0036] The cloud management platform evaluates the computing efficiency of the server cluster using the method described above.
[0037] According to another aspect of this disclosure, a computational efficiency evaluation apparatus is provided, comprising:
[0038] The first computing module is used to calculate the computing power of the computing system based on the processor utilization of the servers included in the computing system of the data center. The computing power is the computing power output by the computing system in the running state.
[0039] The second calculation module is used to calculate the computing efficiency of the computing system based on the computing power and power consumption of the computing system.
[0040] The evaluation module is used to determine the computing efficiency evaluation result of the data center based on the operational computing efficiency of the computing system; or, to obtain the design computing efficiency of the computing system and determine the computing efficiency evaluation result of the data center based on the design computing efficiency and operational computing efficiency of the computing system.
[0041] According to another aspect of this disclosure, a computer device is provided, including a memory, a processor, and a computer program stored in the memory, the processor executing the computer program to implement the method of one aspect above.
[0042] According to another aspect of this disclosure, a computer-readable storage medium is provided having a computer program stored thereon that, when executed by a processor, implements the method of one aspect above.
[0043] According to another aspect of this disclosure, a computer program product is provided, including a computer program that, when executed by a processor, implements the method of the above-described aspect.
[0044] As will be described in detail below, a computing performance evaluation method, system, computer device, and storage medium according to embodiments of the present disclosure calculate the operating computing power of each computing system by means of processor utilization, taking into account the actual operating state of each computing system, which can ensure the accuracy of the operating computing power and thus ensure the accuracy of computing performance; by comparing and evaluating the designed computing performance with the operating computing performance, the evaluation results can more accurately reflect the real computing performance level of the data center, which is conducive to achieving a scientific evaluation of computing performance.
[0045] It should be understood that both the foregoing general description and the following detailed description are exemplary and intended to provide further illustration of the claimed technology.
Description of the Drawings
[0046] The above and other objects, features, and advantages of this disclosure will become more apparent from the more detailed description of the embodiments thereof in conjunction with the accompanying drawings. The drawings are provided to further illustrate the embodiments of this disclosure and form part of the specification. They are used together with the embodiments of this disclosure to explain the disclosure and do not constitute a limitation thereof. In the drawings, the same reference numerals generally represent the same components or steps.
[0047] Figure 1 is a flowchart illustrating a computational efficiency evaluation method according to some embodiments of the present disclosure.
[0048] Figure 2 is a flowchart illustrating yet another computational efficiency evaluation method according to some embodiments of the present disclosure.
[0049] Figure 3 is an illustration of the architecture of a computational efficiency evaluation system according to some embodiments of the present disclosure.
[0050] Figure 4 is a flowchart illustrating an application of a computational efficiency evaluation system according to some embodiments of the present disclosure.
[0051] Figure 5 is a schematic diagram of the structure of a computational efficiency evaluation device according to some embodiments of the present disclosure.
[0052] Figure 6 is a schematic diagram illustrating the structure of a computer device according to some embodiments of the present disclosure.
[0053] Figure 7 is a schematic diagram illustrating a computer program product according to some embodiments of the present disclosure.
Detailed Description
[0054] To make the objectives, technical solutions, and advantages of this disclosure more apparent, exemplary embodiments according to this disclosure will now be described in detail with reference to the accompanying drawings. Obviously, the described embodiments are merely some embodiments of this disclosure, and not all embodiments of this disclosure. It should be understood that this disclosure is not limited to the exemplary embodiments described herein.
[0055] Conducting a scientific and reasonable evaluation of the computing power and efficiency of data centers is a key focus for achieving "dual carbon" goals and ensuring efficient construction and operation for enterprises. Computing efficiency, the ratio of computing power to power consumption in a data center, is an indicator that considers both computing performance and energy consumption. Data center efficiency can be improved by increasing computing power or reducing energy consumption.
[0056] The computing power of a data center is determined by a variety of factors, including computing speed, cache size, read and write speed, network connection bandwidth, information transmission latency design, and the unified control capability of the system. However, currently, computing power is mainly measured based on the nominal value of the device chip, which results in an unsystematic and incomplete measurement of computing power and also affects the comprehensiveness and accuracy of computing efficiency.
[0057] Depending on the computing power output per unit volume (i.e., computing density), data centers include various types of computer systems. These include systems that use a central processing unit (CPU) as the primary computing unit, combined with other hardware and software resources to perform general-purpose computing tasks (i.e., general computing systems); systems that use graphics processing units (GPUs) as the primary computing unit (i.e., intelligent computing systems); and business systems that integrate training and inference applications, composed of both general computing and intelligent computing systems. Because different types of computer systems have vastly different application scenarios and computing objectives, their total computing power and power consumption also vary significantly, making it difficult to uniformly evaluate the computing efficiency of different types of computer systems.
[0058] The following describes, with reference to the accompanying drawings, a computing performance evaluation method, system, computer device, and storage medium according to embodiments of the present disclosure. By dividing the server cluster in a data center into multiple computing systems based on server computing density, and performing computing power calculations on each computing system according to different application scenarios, the comprehensiveness and systematic nature of the computing power calculations are ensured. The operating computing power of each computing system is calculated using processor utilization, taking into account the actual operating state of each system, thus ensuring the accuracy of the operating computing power calculations and consequently the accuracy of the computing performance. By comparing and evaluating the designed computing performance with the operating computing performance, the obtained evaluation results can more accurately reflect the true computing performance level of the data center, facilitating a scientific evaluation of computing performance.
[0059] This disclosure proposes three different computing density evaluation dimensions for data center computing efficiency, namely, general computing system, intelligent computing system and integrated general and intelligent computing system, and two evaluation perspectives, namely, design status and operation status. It clarifies the specific computing power accuracy and defines the design computing power and operation computing power for the first time, realizing the evaluation of data center computing efficiency from a multi-dimensional perspective.
[0060] By constructing a computing power model, the functional relationship between computing power and processor utilization was confirmed, which is conducive to the effective measurement of computing power. By evaluating the results, the future computing performance and operating performance of each computing system were obtained, and corresponding optimization strategies were acquired. In this way, computing performance optimization of the data center was carried out, which is conducive to improving the overall performance of the data center.
[0061] This disclosure considers the evaluation of data center computing power and efficiency based on different computing densities in various application scenarios. It uses classification evaluation and intelligent algorithm collaboration methods, as well as cutting-edge technologies such as artificial intelligence, to construct a data center computing power and efficiency evaluation system architecture in a layered decoupling and modular manner. The system can diagnose and evaluate data centers and provide optimization strategies, thereby promoting the overall improvement of data center computing efficiency.
[0062] To facilitate understanding of this disclosure, a detailed description of a computational efficiency evaluation method disclosed in the embodiments of this disclosure will be provided first. The execution subject of the computational efficiency evaluation method provided in the embodiments of this disclosure is generally a computer device with a certain computing capability. Specifically, the computer device can be the computing power equipment of a data center, such as a server. In some possible implementations, the computational efficiency evaluation method can be implemented by a processor calling computer-readable instructions stored in memory.
[0063] As shown in Figure 1, which is a flowchart of a computational efficiency evaluation method provided in some embodiments of this disclosure, the method includes S101-S103:
[0064] S101: Based on the processor utilization of the servers in the data center computing system, the computing power of the computing system is calculated.
[0065] In some cases, a computing system comprises an IT system consisting of one or more IT devices (such as servers). The computing power of this system refers to the computing power output during operation. For general-purpose servers, processor utilization is the CPU utilization rate; for intelligent computing servers, it is the GPU utilization rate. Processor utilization measures the percentage of time a processor (CPU / GPU) is occupied executing tasks within a time slice. It reflects the processor's workload and thus indirectly reflects the computing power usage during operation. Processor utilization can be directly collected from the cloud computing resource pool network management system of the data center.
[0066] Specifically, the obtained processor utilization of the server is input into the pre-trained computing power model corresponding to the computing system, and the computing power of the server is obtained according to the preset computing power accuracy of the computing system.
[0067] The server's computing power refers to the computing power output by the server while it is running. The computing power model utilizes a neural network algorithm and is constructed based on the mathematical function relationship between computing power and processor utilization, forming a general computing power model. The specific formula is as follows:
[0068]
[0069] In this formula, $C_{r1}$ is the actual computing power output by the general computing server, i.e., the running computing power; $n$ is the number of CPUs; $C_{i,d}$ is the nominal computing power of the $i$-th CPU; $U_{i,cpu}$ is its CPU utilization; $W_i$ is the matching parameter of the CPU; and $\alpha_i$ is the bias coefficient of the CPU. $W_i$ and $\alpha_i$ can be obtained by training a neural network model.
[0070]
[0071] In this formula, $C_{r2}$ is the actual computing power output by the intelligent computing server, i.e., the running computing power; $m$ is the number of GPUs; $C_{j,d}$ is the nominal computing power of the $j$-th GPU; $U_{j,gpu}$ is its GPU utilization; $W_j$ is the matching parameter of the GPU; and $\beta_j$ is the bias coefficient of the GPU. $W_j$ and $\beta_j$ can be obtained by training a neural network model.
[0072] It should be noted that CPU utilization, GPU utilization, and computing performance are not linearly related, but in actual calculations, piecewise approximation or linear regression methods can be used for simulation to improve operability.
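The linear simplification mentioned above can be sketched in a few lines. The exact formulas are not reproduced in this text, so the form used here is an assumption built only from the listed symbols: each processor contributes its matching parameter times its nominal computing power times its utilization, plus its bias coefficient, and the contributions are summed. The function and parameter names are hypothetical, and utilization is assumed to be a fraction in [0, 1].

```python
from typing import Sequence


def running_computing_power(nominal: Sequence[float],
                            utilization: Sequence[float],
                            weights: Sequence[float],
                            biases: Sequence[float]) -> float:
    """Assumed linear form: sum_i (W_i * C_{i,d} * U_i + bias_i)."""
    if not (len(nominal) == len(utilization) == len(weights) == len(biases)):
        raise ValueError("all inputs must have one entry per processor")
    total = 0.0
    for c_d, u, w, b in zip(nominal, utilization, weights, biases):
        if not 0.0 <= u <= 1.0:
            raise ValueError("utilization must be a fraction in [0, 1]")
        total += w * c_d * u + b
    return total
```

In practice the weights and biases would come from the trained model; with unit weights and zero biases the result reduces to nominal computing power scaled by utilization.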
[0073] Computing power is influenced by various factors, including the actual operating status of the hardware, software scheduling strategies, task load, and changes in the external environment. Computing power is dynamic and is a crucial indicator of computing power in its running state. It needs to be obtained through real-time performance monitoring data of the computing system.
[0074] Optionally, the data center's computing system can be divided into multiple sub-computing systems based on the computing density of the server cluster.
[0075] Specifically, computing density refers to the computing power provided by a server per unit volume. Within the same space, for example, servers of the same volume, an intelligent computing server will certainly provide more computing power than a general-purpose computing server. For example, based on server computing density, a data center's computing system can be divided into at least one of the following three sub-computing systems:
[0076] (1) General Computing System: Composed of general computing servers, with a central processing unit (CPU) as the main computing unit, it is a computer system used to perform general computing tasks. Optionally, the computing power precision is single-precision floating-point calculation (Flops / FP32).
[0077] (2) Intelligent Computing System: Composed of intelligent computing servers and a small number of general computing servers, with graphics processing units (GPUs, TPUs, NPUs, etc.) and artificial intelligence chips as the main computing units. It is mainly used to handle large-scale parallel computing tasks, especially in the fields of artificial intelligence and deep learning. Optionally, the computing power precision is based on half-precision floating-point computing (Flops / FP16 / BF16), suitable for artificial intelligence training clusters that include cross-node parameter plane network topology.
[0078] (3) Integrated Computing System: Composed of general computing servers and intelligent computing servers, this system is a business system consisting of integrated training-inference intelligent computing clusters or inference intelligent computing clusters. The main computing units are CPUs for business processing and GPUs, TPUs, and NPUs (hereinafter collectively referred to as GPUs) for AI training and inference computation. Optionally, the computing power precision is based on mixed-precision floating-point computing (Flops / (FP16 / FP32) / INT8, etc.), suitable for business systems that do not include cross-node parameter plane network topology but include AI inference applications and business systems that include integrated training-inference applications.
[0079] The above computing subsystems are merely examples and can be divided according to the computing subsystems owned by the data center.
[0080] When a computing system includes at least one sub-computing system, the computing power of each sub-computing system is obtained by summing the computing power of the servers included in each sub-computing system. For example:
[0081] The computing power of the general computing system (FP32) = ∑ computing power of the general computing server (FP32);
[0082] The computing power of the intelligent computing system (FP16) = ∑ computing power of the intelligent computing server (FP16);
[0083] The computing power of the integrated general and intelligent computing system (FP32 / FP16) = {∑ computing power of the general computing server (FP32), ∑ computing power of the intelligent computing server (FP16)}.
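The three sums above can be illustrated as follows. This is only a sketch with hypothetical function names. It keeps the FP32 and FP16 totals of the integrated system as a pair rather than adding them, on the reading that figures measured at different precisions are not directly comparable.

```python
from typing import Sequence, Tuple


def system_computing_power(server_powers: Sequence[float]) -> float:
    """Sum the running computing power of the servers in one sub-system."""
    return float(sum(server_powers))


def integrated_computing_power(general_fp32: Sequence[float],
                               intelligent_fp16: Sequence[float]
                               ) -> Tuple[float, float]:
    """Return (FP32 total, FP16 total) without mixing the two precisions."""
    return (system_computing_power(general_fp32),
            system_computing_power(intelligent_fp16))
```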
[0084] S102: Calculate the computing efficiency of each computing system based on its computing power and power consumption.
[0085] Operational computational efficiency is a key indicator for measuring the energy efficiency of a data center during operation, directly reflecting the balance between performance and energy consumption when handling business loads. The operational computational efficiency of a computing system is the sum of the operational computational efficiencies of all included IT equipment.
[0086] The definition of computing power efficiency is: the ratio of the computing power of IT equipment (such as servers) in a data center to the power consumption of the IT equipment (unit: FLOPS / W). The calculation formula is as follows:
[0087] IT equipment operating efficiency = IT equipment operating computing power / IT equipment operating power;
[0088] The computing power of IT equipment is dynamic and needs to be modeled using processor (such as CPU) utilization to achieve real-time monitoring, measurement, and characterization of computing power. The methods for obtaining the computing power of an IT system are as follows:
[0089] CC = F(x)
[0090] Where CC is the computing capacity, x is the processor utilization, and F is the model function of computing capacity versus processor utilization, which can realize the conversion from processor utilization to computing capacity.
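One simple way to realize a function F of this kind is piecewise-linear interpolation over measured (utilization, computing power) points, in line with the piecewise approximation mentioned earlier. The sketch below is an assumption for illustration rather than the disclosed model: `make_piecewise_model` is a hypothetical helper, the calibration points are invented, and values outside the measured range are clamped to the nearest endpoint.

```python
from bisect import bisect_right
from typing import Callable, List, Tuple


def make_piecewise_model(points: List[Tuple[float, float]]
                         ) -> Callable[[float], float]:
    """Build F(x) from (utilization, computing power) calibration points."""
    pts = sorted(points)
    if len(pts) < 2:
        raise ValueError("need at least two calibration points")
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]

    def f(x: float) -> float:
        if x <= xs[0]:
            return ys[0]   # clamp below the measured range
        if x >= xs[-1]:
            return ys[-1]  # clamp above the measured range
        k = bisect_right(xs, x)  # xs[k-1] <= x < xs[k]
        x0, x1, y0, y1 = xs[k - 1], xs[k], ys[k - 1], ys[k]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

    return f
```

With points (0.0, 0.0), (0.5, 60.0), and (1.0, 100.0), a utilization of 0.25 maps to 30.0 and 0.75 maps to 80.0.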
[0091] The operating power of an IT system is the actual electrical energy consumed by all IT devices in a data center per unit of time, reflecting the true energy consumption level of the computing system when processing business loads.
[0092] Operational computing efficiency characterizes the actual computing power output per unit of energy consumed by a data center computing system during operation. Improving operational computing efficiency mainly involves adopting a series of optimization strategies to reduce the energy consumption of data center IT equipment. The main influencing indicators are: operating mode, CPU frequency, and fan speed.
[0093] Operating modes are the operational strategies adopted by IT equipment in different working states, aiming to balance performance and energy consumption.
[0094] CPU frequency is the number of clock cycles that the CPU of an IT device can execute per second, and it is directly related to the performance of the processor.
[0095] Fan speed determines the cooling efficiency of an IT device's heat dissipation system, which is used to maintain the normal operating temperature of the hardware.
[0096] When the computing system includes at least one sub-computing system, the computational efficiency of each sub-computing system is calculated separately, for example:
[0097] The computational efficiency of the general computing system, $CE_{cpu/32}$ = Total computing power of the general computing system / Total power of the general computing system;
[0098] The computational efficiency of the intelligent computing system, $CE_{gpu/16}$ = Total computing power of the intelligent computing system / Total power of the intelligent computing system;
[0099] The computational efficiency of the integrated computing system, $CE_{cpu/gpu}$ = {Total computing power of general computing system / Total power of general computing system, Total computing power of intelligent computing system / Total power of intelligent computing system}.
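The per-system ratios above amount to a division of total computing power by total power, with the integrated system reported as a pair. A minimal sketch, with hypothetical function names and each part passed as a (FLOPS, watts) tuple:

```python
from typing import Tuple


def efficiency(total_flops: float, total_watts: float) -> float:
    """Computing efficiency in FLOPS/W for one sub-computing system."""
    if total_watts <= 0:
        raise ValueError("power must be positive")
    return total_flops / total_watts


def integrated_efficiency(general: Tuple[float, float],
                          intelligent: Tuple[float, float]
                          ) -> Tuple[float, float]:
    """Pair of ratios for the general and intelligent parts."""
    return (efficiency(*general), efficiency(*intelligent))
```

For instance, 1e12 FLOPS at 500 W gives 2e9 FLOPS/W.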
[0100] S103: Determine the data center's computing performance evaluation results based on the operational computing performance of the computing system; or, obtain the design computing performance of the computing system, and determine the data center's computing performance evaluation results based on the design computing performance and operational computing performance of the computing system. Specifically, the following optional computing performance evaluation methods are provided, and users can choose according to their usage needs or application scenarios:
[0101] Method 1: Based on the computing system's operational efficiency, determine the data center's computing efficiency evaluation results, specifically as follows:
[0102] When a computing system includes at least one sub-computing system, the computing efficiency of the multiple sub-computing systems is represented as an array to obtain the total computing efficiency of the data center; for example:
[0103] Total computing efficiency (CE) of a data center = {computing efficiency of the general computing system ($CE_{cpu/32}$), computing efficiency of the intelligent computing system ($CE_{gpu/16}$), computing efficiency of the integrated computing system ($CE_{cpu/gpu}$)}.
[0104] The evaluation results of data center computing efficiency are determined based on the total computing efficiency of the data center.
[0105] Method 2: Obtain the design computational efficiency of the computing system. Based on the design and operational computational efficiency of the computing system, determine the computational efficiency evaluation results of the data center, specifically as follows:
[0106] The effective computing efficiency of a computing system is obtained by comparing its operational computing efficiency with its design computing efficiency.
[0107] Based on the effective computing efficiency of the computing system, the evaluation results of the computing efficiency of the data center are determined.
[0108] Optionally, when the computing system includes at least one sub-computing system, the design computing efficiency of each sub-computing system is obtained separately, and the effective computing efficiency of each sub-computing system is calculated based on the ratio of the operating computing efficiency to the design computing efficiency of each sub-computing system; based on the effective computing efficiency of each sub-computing system, the computing efficiency evaluation result of the data center is determined.
[0109] The closer the operational computational efficiency is to the design computational efficiency, the higher the computational efficiency utilization and the higher the effective computational efficiency. Taking a general computing system as an example:
[0110] Design computational efficiency of the general computing system = design computing power of the general computing system / design power of the general computing system;
[0111] Operational computational efficiency of the general computing system = operational computing power of the general computing system / operating power consumption of the general computing system;
[0112] Effective computational efficiency = operational computational efficiency of the general computing system / design computational efficiency of the general computing system × 100%.
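As a worked illustration of the ratio in Method 2, the following sketch computes effective computational efficiency as a percentage. The function name and the sample values are hypothetical.

```python
def effective_efficiency(operational_ce: float, design_ce: float) -> float:
    """Effective computing efficiency as a percentage of design efficiency."""
    if design_ce <= 0:
        raise ValueError("design efficiency must be positive")
    return operational_ce / design_ce * 100.0


# Hypothetical values in FLOPS/W for a general computing system.
design_ce = 1.0e10
operational_ce = 6.5e9
effective = effective_efficiency(operational_ce, design_ce)  # about 65 percent
```

A value near 100% would indicate that the system runs close to its design efficiency.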
[0113] It should be noted that Method 1 is suitable for comparing computing performance with other data centers, while Method 2 is suitable for evaluating the computing performance of individual sub-computing systems within a data center.
[0114] Among them, design computational efficiency represents the computational efficiency level of a data center in its design state. Design computational efficiency evaluates the advancement level of the system design scheme and the adoption of green and energy-saving technologies, and is applicable to evaluating the greenness of equipment, system, and supporting system designs.
[0115] Design efficiency is defined as the ratio of the design computing power of IT equipment in a computing system to the design power of all IT equipment (unit: FLOPS / W). The formula for calculating design efficiency is:
[0116] Design computational efficiency = IT equipment design computational power / IT equipment design power
[0117] Design computing efficiency characterizes the computing power output per unit of energy consumed by a data center's computing system in its design state. Improving design computing efficiency primarily involves enhancing the utilization rate of the computing system's infrastructure. The main influencing indicators are: space utilization rate, rack operating power load rate, and rack design power utilization rate.
[0118] Space utilization rate refers to the ratio of the number of rack units (U) occupied by IT equipment installed in the data center to the total number of rack units that the data center racks can provide. It is used to measure the level of rack space utilization.
[0119] Rack operating power load ratio refers to the ratio of the peak power consumption of IT equipment installed in the rack to the rack's design power consumption, and is used to measure the rack's peak operating load level.
[0120] Rack design power utilization rate refers to the ratio of the design power consumption of IT equipment installed in the rack to the rack design power consumption, which is used to measure the design power utilization level of the rack.
[0121] The design efficiency of the computing system can be obtained directly from the design efficiency database. The specific calculation process of the design efficiency includes the following steps 1-3:
[0122] Step 1: Calculate the design computing power of each server based on the nominal computing power of the chips in the computing system.
[0123] Design computing power refers to the computing power predicted for the server's output during the design phase, while nominal computing power refers to the nominal computing power of the CPU or GPU chip, which is the maximum computing power specified for the processor chip. Design computing power can be calculated by weighting the nominal computing power of the processor chip at the time of manufacture; for example, design computing power = nominal computing power * 60%.
[0124] Design computing power refers to the total computing power that the entire computing system possesses and can achieve, formed by determining the design scheme based on functional and performance requirements such as computing, storage, networking, functionality, performance, reliability, and security during the design phase of a computing system. It primarily estimates the maximum computing power that the computing system may reach in the future, based on factors such as hardware specifications, software architecture, and algorithm optimization. Design computing power is an important reference indicator at the initial stage of computing system design, and it has guiding significance for subsequent resource allocation and task scheduling.
[0125] Step 2: Obtain the design computing power of the computing system based on the sum of the design computing power of the servers included in the computing system.
[0126] When a computing system includes at least one sub-computing system, the design computing power and the operational computing power of the same computing system use the same computing power precision, specifically:
[0127] Design computing power of the general computing system (FP32) = ∑ design computing power of the general computing servers (FP32);
[0128] Design computing power of the intelligent computing system (FP16) = ∑ design computing power of the intelligent computing servers (FP16);
[0129] Design computing power of the integrated general and intelligent computing system (FP32 / FP16) = {∑ design computing power of the general computing servers (FP32), ∑ design computing power of the intelligent computing servers (FP16)}.
[0130] Step 3: Calculate the design computing efficiency of the computing system based on the design computing power and design power of the computing system.
[0131] When the computing system includes at least one sub-computing system, for example:
[0132] Design computational efficiency of the general computing system = design computing power of the general computing system / design power of the general computing system;
[0133] Design computational efficiency of the intelligent computing system = design computing power of the intelligent computing system / design power of the intelligent computing system;
[0134] Design computational efficiency of the integrated general and intelligent computing system = {design computational efficiency of the general computing system + design computational efficiency of the intelligent computing system}.
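Steps 1-3 above can be sketched as follows. The 60% weighting is the example given in the text; summing the chips of one server before weighting, the function names, and the sample figures are assumptions made for illustration.

```python
DESIGN_WEIGHT = 0.6  # example weighting from the text: design = nominal * 60%


def server_design_computing_power(nominal_chip_flops: list[float],
                                  weight: float = DESIGN_WEIGHT) -> float:
    """Step 1: design computing power of one server from its chips' nominal values."""
    return weight * sum(nominal_chip_flops)


def system_design_efficiency(servers: list[list[float]],
                             design_power_watts: float) -> float:
    """Steps 2-3: sum the servers' design computing power, divide by design power."""
    if design_power_watts <= 0:
        raise ValueError("design power must be positive")
    design_flops = sum(server_design_computing_power(chips) for chips in servers)
    return design_flops / design_power_watts


# Hypothetical system: two servers, nominal FLOPS listed per chip.
servers = [[1.0e12, 1.0e12], [2.0e12]]
design_ce = system_design_efficiency(servers, design_power_watts=1200.0)
```

The result is in FLOPS/W, matching the unit stated for design efficiency.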
[0135] Optionally, this disclosure may include the following steps after S103:
[0136] S104: Optimize computing performance in the data center. This includes the following steps 1-4:
[0137] Step 1: Input the computing performance evaluation results and the data center's business status data into the pre-trained computing performance prediction model to obtain the future resource usage trend of the data center output by the computing performance prediction model.
[0138] The computational efficiency prediction model is constructed using a time series model.
[0139] Step 2: Based on future resource usage trends, confirm the future computing efficiency of the computing systems included in the data center.
[0140] The future computing performance status refers to the computing performance over a future period of time. Specifically, a control period can be set, and the duration of the control period can be customized according to actual needs, such as 1 hour. The future computing performance status can be the computing performance status of the next control period.
[0141] Step 3: Based on the future computing performance and operational performance of the computing system, obtain matching optimization strategies from the strategy library.
[0142] The future computational efficiency and operational efficiency associated with each optimization strategy stored in the strategy library are compared with the future computational efficiency and operational efficiency obtained above for each computing system. If the same or similar data exists, the match is considered successful and the matched optimization strategy is obtained; otherwise, no matching strategy exists, and the following operations are performed:
[0143] In response to the absence of an optimization strategy in the strategy library that matches the future computing performance and operational performance of the computing system, an optimization strategy is generated using a pre-trained deep learning model. The generated optimization strategy is then used as the matching optimization strategy and stored in the strategy library.
[0144] The optimization strategies include business migration and equipment scaling, power on / off, and frequency adjustment. First, the system checks the strategy library for matching optimization strategies. If a matching strategy is found, it is directly deployed to the resource layer for execution. If not, a pre-trained deep learning model is used to generate an optimization strategy.
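The lookup-then-generate flow described above can be sketched as follows. This is only an illustration: the 5% relative tolerance used to decide that two states are "similar", the dictionary-based library, and the `generate` callback standing in for the pre-trained deep learning model are all assumptions.

```python
from typing import Callable, Optional

State = tuple[float, float]  # (future efficiency, operational efficiency), FLOPS/W


def find_strategy(library: dict[State, str], state: State,
                  rel_tol: float = 0.05) -> Optional[str]:
    """Return a stored strategy whose key is the same as or similar to `state`."""
    for (future, running), strategy in library.items():
        if (abs(future - state[0]) <= rel_tol * abs(state[0])
                and abs(running - state[1]) <= rel_tol * abs(state[1])):
            return strategy
    return None


def match_or_generate(library: dict[State, str], state: State,
                      generate: Callable[[State], str]) -> str:
    """Use the library first; otherwise generate a strategy and store it."""
    strategy = find_strategy(library, state)
    if strategy is None:
        strategy = generate(state)  # stand-in for the deep learning model
        library[state] = strategy   # stored for later reuse
    return strategy


library = {(1.0e10, 8.0e9): "frequency adjustment"}
hit = match_or_generate(library, (1.02e10, 8.1e9), lambda s: "business migration")
miss = match_or_generate(library, (5.0e9, 4.0e9), lambda s: "business migration")
```

Here `hit` reuses the stored strategy, while `miss` falls through to the generator and adds a new library entry.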
[0145] Among them, the optimized computing efficiency is the computing efficiency of the data center after the optimization strategy is implemented. For the specific acquisition process, please refer to S101-102.
[0146] Step 4: Control the data center to execute the matching optimization strategy and obtain the optimized computing performance of the data center after executing the matching optimization strategy;
[0147] If the optimized computing performance does not meet the preset computing performance target, the optimized computing performance will be used as the latest computing performance. The steps of S103 to determine the computing performance evaluation result of the data center and S104 to optimize the computing performance of the data center will be repeated until the optimized computing performance meets the preset computing performance target.
[0148] The preset computational efficiency target can be customized according to the design computational efficiency. For example, the preset computational efficiency target = design computational efficiency * 80%. The closer the running computational efficiency is to the design computational efficiency, the higher the computational efficiency utilization rate.
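The repeat-until-target behavior of Step 4 can be sketched as a simple loop. The callbacks `measure` and `apply_strategy`, and the `max_rounds` safety limit, are hypothetical additions; the text itself only specifies repeating until the target is met, with 80% of design efficiency as an example target.

```python
from typing import Callable


def optimize_until_target(measure: Callable[[], float],
                          apply_strategy: Callable[[float], None],
                          design_ce: float,
                          target_ratio: float = 0.8,
                          max_rounds: int = 10) -> float:
    """Repeat evaluate -> optimize until efficiency reaches the preset target."""
    target = design_ce * target_ratio
    current = measure()
    for _ in range(max_rounds):
        if current >= target:
            break
        apply_strategy(current)  # select and execute a matching strategy
        current = measure()      # optimized efficiency becomes the latest value
    return current


# Toy data center whose efficiency rises by 1.0 per optimization round.
state = {"ce": 5.0}
final_ce = optimize_until_target(
    measure=lambda: state["ce"],
    apply_strategy=lambda ce: state.update(ce=ce + 1.0),
    design_ce=10.0,
)
```

The round limit is there only to keep the sketch from looping forever if no strategy ever reaches the target.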
[0149] Figure 2 shows another flowchart of the computational efficiency evaluation method provided in this disclosure. The method includes S201-S206:
[0150] S201: Based on the computing density of the server cluster in the data center, the computing system of the data center is divided into at least one sub-computing system, including but not limited to general computing system, intelligent computing system and integrated general and intelligent computing system.
[0151] Specifically, computing density refers to the computing power provided by a server per unit volume. Within the same space, for example, servers of the same volume, intelligent computing servers will certainly provide more computing power than general-purpose computing servers. Based on server computing density, data center server clusters can be divided into the following computing systems:
[0152] (1) General Computing System: Composed of general computing servers, with a central processing unit (CPU) as the main computing unit, it is a computer system used to perform general computing tasks. Optionally, the computing power precision is single-precision floating-point calculation (Flops / FP32).
[0153] (2) Intelligent Computing System: Composed of intelligent computing servers and a small number of general computing servers, with graphics processing units (GPUs, TPUs, NPUs, etc.) and artificial intelligence chips as the main computing units. It is mainly used to handle large-scale parallel computing tasks, especially in the fields of artificial intelligence and deep learning. Optionally, the computing power precision is based on half-precision floating-point computing (Flops / FP16 / BF16), suitable for artificial intelligence training clusters that include cross-node parameter plane network topology.
[0154] (3) Integrated General and Intelligent Computing System: Composed of general computing servers and intelligent computing servers, this system is a business system consisting of integrated training-inference intelligent computing clusters or inference intelligent computing clusters. The main computing units are CPUs for business processing and GPUs, TPUs, and NPUs (hereinafter collectively referred to as GPUs) for AI training and inference computation. Optionally, the computing power precision is based on mixed-precision floating-point computing (Flops / (FP16 / FP32) / INT8, etc.), suitable for business systems that do not include a cross-node parameter plane network topology but include AI inference applications, and for business systems that include integrated training-inference applications.
[0155] The above sub-computing systems are merely examples and can be specifically divided according to the actual computing systems owned by the data center.
[0156] S202: Obtain the design computational efficiency of each sub-computing system from the design computational efficiency database.
[0157] The calculation process for each design efficiency in the design efficiency database includes the following steps 1-3:
[0158] Step 1: Calculate the design computing power of each server based on the nominal computing power of the chips in each sub-computing system.
[0159] Design computing power refers to the computing power predicted for the server's output during the design phase, while nominal computing power refers to the nominal computing power of the CPU or GPU chip, which is the maximum computing power specified for the processor chip. Design computing power can be calculated by weighting the nominal computing power of the processor chip at the time of manufacture; for example, design computing power = nominal computing power * 60%.
[0160] Design computing power refers to the total computing power that the entire computing system possesses and can achieve, formed by determining the design scheme based on functional and performance requirements such as computing, storage, networking, functionality, performance, reliability, and security during the design phase of a computing system. It primarily estimates the maximum computing power that the computing system may reach in the future, based on factors such as hardware specifications, software architecture, and algorithm optimization. Design computing power is an important reference indicator at the initial stage of computing system design, and it has guiding significance for subsequent resource allocation and task scheduling.
[0161] Step 2: Based on the sum of the designed computing power of the servers included in each sub-computing system, obtain the designed computing power of each sub-computing system.
[0162] For the same sub-computing system, the design computing power and the operational computing power use the same computing power precision, specifically:
[0163] Design computing power of the general computing system (FP32) = ∑ design computing power of the general computing servers (FP32);
[0164] Design computing power of the intelligent computing system (FP16) = ∑ design computing power of the intelligent computing servers (FP16);
[0165] Design computing power of the integrated general and intelligent computing system (FP32 / FP16) = {∑ design computing power of the general computing servers (FP32), ∑ design computing power of the intelligent computing servers (FP16)}.
[0166] Step 3: Based on the design computing power and design power of each sub-computing system, calculate the design computing efficiency of each sub-computing system and store the design computing efficiency in the design computing efficiency database.
[0167] For example:
[0168] Design computational efficiency of the general computing system = design computing power of the general computing system / design power of the general computing system;
[0169] Design computational efficiency of the intelligent computing system = design computing power of the intelligent computing system / design power of the intelligent computing system;
[0170] Design computational efficiency of the integrated general and intelligent computing system = {design computational efficiency of the general computing system + design computational efficiency of the intelligent computing system}.
[0171] S203: Calculate the computing power. This includes the following steps 1-3:
[0172] Step 1: Based on the historical and measurement data collected from each server, use neural network algorithms (such as MLP) to confirm the mathematical function relationship between computing power and processor utilization, and construct a general computing power model. The specific formula is as follows:
[0173]
[0174] Here, $C_{r1}$ is the actual computing power output by the general computing server, i.e., the running computing power; $n$ is the number of CPUs; $C_{i,d}$ is the nominal computing power of the $i$-th CPU; $U_{i,cpu}$ is its CPU utilization; $W_i$ is the matching parameter of the CPU; and $\alpha_i$ is the bias coefficient of the CPU. $W_i$ and $\alpha_i$ can be obtained by training a neural network model.
[0175]
[0176] Here, $C_{r2}$ is the actual computing power output by the intelligent computing server, i.e., the running computing power; $m$ is the number of GPUs; $C_{j,d}$ is the nominal computing power of the $j$-th GPU; $U_{j,gpu}$ is its GPU utilization; $W_j$ is the matching parameter of the GPU; and $\beta_j$ is the bias coefficient of the GPU. $W_j$ and $\beta_j$ can be obtained by training a neural network model.
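The formulas themselves are not reproduced in this text, so the following is only a sketch under an assumed form: each processor contributes its matching parameter × nominal computing power × utilization, plus its bias coefficient, and the contributions are summed over the server's processors. This affine form, the class and function names, and all sample values are assumptions, not the disclosed model.

```python
from dataclasses import dataclass


@dataclass
class Processor:
    nominal_flops: float  # C_{i,d} or C_{j,d}: nominal computing power
    utilization: float    # U_{i,cpu} or U_{j,gpu}, as a fraction in [0, 1]
    weight: float         # W_i or W_j: matching parameter (learned)
    bias: float           # alpha_i or beta_j: bias coefficient (learned)


def running_computing_power(processors: list[Processor]) -> float:
    """ASSUMED form: sum of weight * nominal * utilization + bias per processor."""
    return sum(p.weight * p.nominal_flops * p.utilization + p.bias
               for p in processors)


# Hypothetical two-CPU general computing server.
cpus = [
    Processor(nominal_flops=1.0e12, utilization=0.5, weight=0.9, bias=0.0),
    Processor(nominal_flops=1.0e12, utilization=0.25, weight=0.8, bias=1.0e9),
]
c_r1 = running_computing_power(cpus)
```

In practice the learned parameters would come from the trained neural network rather than being set by hand as they are here.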
[0177] For general-purpose computing servers, processor utilization is measured by CPU utilization; for intelligent computing servers, it's measured by GPU utilization. Processor utilization measures the percentage of time a processor (CPU / GPU) is occupied executing tasks within a time slice. It reflects the processor's workload and indirectly reflects the computing power usage during operation. Processor utilization can be directly collected from the cloud computing resource pool network management system in the data center. Operating computing power refers to the actual computing power output of the computing system during actual operation. It is affected by various factors, including the actual operating status of the hardware, software scheduling strategies, task load, and changes in the external environment. Operating computing power is dynamic and is an important indicator of the computing power during operation. Operating computing power needs to be obtained through real-time performance monitoring data of the computing system.
[0178] It should be noted that CPU utilization, GPU utilization, and computing performance are not linearly related, but in actual calculations, piecewise approximation or linear regression methods can be used for simulation to improve operability.
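A piecewise approximation of the non-linear relation can be sketched with plain linear interpolation between calibration points. The calibration table below (utilization mapped to a fraction of nominal computing power) is invented for illustration only.

```python
from bisect import bisect_right


def piecewise_linear(x: float, xs: list[float], ys: list[float]) -> float:
    """Linearly interpolate y at x between calibration points (xs ascending)."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    k = bisect_right(xs, x) - 1              # segment whose left end is xs[k]
    t = (x - xs[k]) / (xs[k + 1] - xs[k])    # position inside the segment
    return ys[k] + t * (ys[k + 1] - ys[k])


# Hypothetical calibration: utilization -> fraction of nominal computing power.
UTIL = [0.0, 0.5, 0.8, 1.0]
FRACTION = [0.0, 0.6, 0.85, 0.9]

fraction_at_quarter = piecewise_linear(0.25, UTIL, FRACTION)
```

Each segment is linear, so the model stays simple to evaluate while still following a curve that flattens at high utilization.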
[0179] Step 2: According to the sub-computing system to which each server belongs, input the obtained processor utilization of the server into the pre-trained computing power model corresponding to that sub-computing system, and obtain the running computing power of each server at the preset computing power precision of that sub-computing system.
[0180] Step 3: Calculate the computing power of each sub-computing system based on the computing power of the servers included in each sub-computing system.
[0181] The computing power of each sub-computing system is the sum of the computing power of all included servers. When calculating the total computing power, the differences between the general computing system and the intelligent computing system are significant due to their different hardware architectures and different precision units. Currently, general computing and intelligent computing are typically measured using single-precision floating-point computing power (FP32) and half-precision floating-point computing power (FP16), respectively. With the evolution of intelligent computing technology and the varying needs of applications from training to inference, future requirements may include 8-bit integer (INT8) or 4-bit integer (INT4) precision. Therefore, this disclosure evaluates the general computing system and the intelligent computing system separately, classifying and characterizing the general computing and intelligent computing capabilities of the data center. Further expansion is possible in the future as needed. The calculation formula is as follows:
[0182] The computing power of the general computing system (FP32) = ∑ computing power of the general computing server (FP32);
[0183] The computing power of the intelligent computing system (FP16) = ∑ computing power of the intelligent computing server (FP16);
[0184] The computing power of the integrated general and intelligent computing system (FP32 / FP16) = {∑ computing power of the general computing servers (FP32), ∑ computing power of the intelligent computing servers (FP16)}.
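The per-sub-system summation can be sketched as a grouped sum keyed by sub-system and precision. The tuple layout and the sample servers are assumptions introduced for illustration.

```python
from collections import defaultdict


def computing_power_by_subsystem(
        servers: list[tuple[str, str, float]]) -> dict[tuple[str, str], float]:
    """Sum running computing power per (sub-system, precision) pair."""
    totals: dict[tuple[str, str], float] = defaultdict(float)
    for subsystem, precision, flops in servers:
        totals[(subsystem, precision)] += flops
    return dict(totals)


# Hypothetical servers: (sub-system, precision, running computing power in FLOPS).
servers = [
    ("general", "FP32", 1.0e12),
    ("general", "FP32", 2.0e12),
    ("intelligent", "FP16", 5.0e13),
]
totals = computing_power_by_subsystem(servers)
```

Keeping the precision in the key prevents FP32 and FP16 figures from being added together, in line with the separate characterization described above.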
[0185] S204: Computational efficiency calculation.
[0186] Based on the computing power and power consumption of each sub-computing system, the computational efficiency of each sub-computing system is calculated. For example:
[0187] Computational efficiency of the general computing system: $CE_{cpu/32}$ = total computing power of the general computing system / total power of the general computing system;
[0188] Computational efficiency of the intelligent computing system: $CE_{gpu/16}$ = total computing power of the intelligent computing system / total power of the intelligent computing system;
[0189] Computational efficiency of the integrated general and intelligent computing system: $CE_{cpu/gpu}$ = {total computing power of the general computing system / total power of the general computing system, total computing power of the intelligent computing system / total power of the intelligent computing system}.
[0190] S205: Evaluation of computational efficiency.
[0191] Evaluate the computing performance of the data center. Specific evaluation methods include:
[0192] (1) Design computational efficiency evaluation.
[0193] During the computing network design phase, different resource load rates are set for different business scenarios, and the design power consumption is reasonably determined based on historical operating power consumption. This maximizes the IT system installation density within a given infrastructure resource, thereby improving the design computing power efficiency. The design-state computing efficiency evaluation and optimization indicators are as follows:
[0194] 1) Space utilization rate, the formula is: Space utilization rate = number of rack units (U) occupied by IT equipment / total number of rack units (U) in the rack.
[0195] Space utilization is related to the number of devices installed in the rack. Low space utilization indicates a mismatch between the rack's designed power consumption and the device's designed power consumption. It is necessary to consider increasing the rack's designed power consumption or decreasing the device's designed power consumption to achieve a matching design between the two.
[0196] 2) Operating load rate, the formula is: Operating load rate = peak operating power consumption of IT equipment / design power consumption of equipment.
[0197] The operating load rate is related to the business load and the design load. A low operating load rate indicates that the design load redundancy is too large.
[0198] 3) Design power utilization rate, the formula is: Design power utilization rate = design power consumption of IT equipment installed in the rack / rack design power consumption.
[0199] The power consumption utilization rate is related to the granularity of the equipment's power consumption and the installation design. A low value indicates that there is still power consumption within the rack that is not being fully utilized. The "fragmented" power consumption can be fully utilized by following the principle of increasing the power consumption of a single rack and ensuring that the power consumption of the entire array does not exceed the design power consumption.
[0200] Considering that rack space utilization, rack design power consumption utilization, and rack operating load rate can be used to measure the utilization of infrastructure by an IT system, the design level can be evaluated from three aspects: space, operating load, and design power consumption, providing directions for design optimization and improvement. Furthermore, based on these three utilization rates, a comprehensive design utilization index can be calculated to characterize the overall design efficiency level.
[0201] Furthermore, the design-state comprehensive utilization index $U_D$ is defined:
[0202] $U_D$ = W1 × rack space utilization rate + W2 × rack design power consumption utilization rate + W3 × rack operating power consumption load rate.
[0203] Among them, W1, W2, and W3 are evaluation weights, and satisfy W1 + W2 + W3 = 100%. The specific weight values can be set according to different optimization objectives. For example, considering economic costs, if space costs are lower than power costs, the weight of W1 can be reduced, while the weights of W2 and W3 can be increased proportionally to the cost. The design efficiency evaluation grading requirements and meanings are shown in Table 1.
[0204] Table 1
[0205]
[0206]
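The weighted index $U_D$ defined above can be sketched as follows. The default weights and the sample utilization rates are hypothetical; the only constraint taken from the text is that the three weights sum to 100%.

```python
def design_utilization_index(space_rate: float,
                             design_power_rate: float,
                             load_rate: float,
                             weights: tuple[float, float, float] = (0.2, 0.4, 0.4)) -> float:
    """U_D = W1*space + W2*design power utilization + W3*operating load rate."""
    w1, w2, w3 = weights
    if abs(w1 + w2 + w3 - 1.0) > 1e-9:
        raise ValueError("weights must sum to 100%")
    return w1 * space_rate + w2 * design_power_rate + w3 * load_rate


# Hypothetical rack: 70% space, 90% design power utilization, 60% load rate.
u_d = design_utilization_index(0.7, 0.9, 0.6)
```

The default weights give space less influence than power, following the text's example where space costs are lower than power costs.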
[0207] (2) Evaluation of operational efficiency.
[0208] The computing efficiency of each sub-computing system is represented in the form of an array to obtain the total computing efficiency of the data center.
[0209] The evaluation results of data center computing efficiency are determined based on the total computing efficiency of the data center.
[0210] The total computing efficiency of a data center is expressed, for example, as follows:
[0211] Total computing efficiency (CE) of the data center = {$CE_{cpu/32}$ of the general computing system, $CE_{gpu/16}$ of the intelligent computing system, $CE_{cpu/gpu}$ of the integrated general and intelligent computing system}.
[0212] During the operation of a computing network, for different business scenarios, the computing system needs to apply a series of energy efficiency optimization strategies while meeting application and business service indicators. The impact of these strategies on data center energy consumption is recorded, and changes in operational computing efficiency indicators before and after optimization are compared. The computing efficiency gain brought by each optimization strategy is analyzed; a higher gain indicates a better optimization strategy. Operational computing efficiency evaluation indicators are introduced to effectively evaluate the computing efficiency gain that different optimization strategies can bring during operation. These strategies generally include:
[0213] 1) IT equipment operating modes: Through built-in algorithms, the intelligent mode can identify task requirements and provide high-performance support when needed, while switching to energy-saving mode when the load is low, thereby maximizing energy efficiency without affecting business performance.
[0214] 2) CPU frequency: The CPU frequency is automatically adjusted according to the current workload. When the load is light, the frequency is reduced to reduce power consumption; when the load increases, the frequency is quickly increased to ensure performance and ensure that the CPU frequency is at the optimal value.
[0215] 3) Fan speed: Deploy fan control logic based on temperature sensors to gradually increase fan speed only when the hardware temperature rises, ensuring that additional power consumption is minimized while maintaining a suitable operating temperature.
[0216] Furthermore, the operational efficiency is evaluated and graded, and the evaluation and grading requirements and meanings are shown in Table 2:
[0217] Table 2
[0218]
[0219] (3) Comparative evaluation of design computational efficiency and operational computational efficiency. This includes the following steps 1-3:
[0220] Step 1: Obtain the design computational efficiency of each sub-computing system from the design computational efficiency database;
[0221] Step 2: Based on the ratio of the operational computing efficiency to the designed computing efficiency of each sub-computing system, obtain the effective computing efficiency of each sub-computing system;
[0222] Step 3: Based on the effective computing performance of each sub-computing system, determine the computing performance evaluation results of the data center.
[0223] It should be noted that Method 1 is suitable for comparing computing performance with other data centers, while Method 2 is suitable for evaluating the computing performance of individual sub-computing systems within a data center. The closer the operational computing performance is to the design computing performance, the higher the computing utilization rate and the higher the operational efficiency. Taking the general computing system as an example:
[0224] Design computational efficiency of the general computing system = design computing power of the general computing system / design power of the general computing system;
[0225] Operational computational efficiency of the general computing system = operational computing power of the general computing system / operating power consumption of the general computing system;
[0226] Effective computational efficiency = operational computational efficiency of the general computing system / design computational efficiency of the general computing system × 100%.
[0227] Furthermore, the evaluation grading requirements for comparing design efficiency and operational efficiency are provided, as shown in Table 3:
[0228] Table 3
[0229]
[0230]
[0231] S206: Optimization of computational efficiency.
[0232] Optimize computing performance in the data center. This includes the following steps 1-4:
[0233] Step 1: Input the computing performance evaluation results and the data center's business status data into the pre-trained computing performance prediction model to obtain the future resource usage trend of the data center output by the computing performance prediction model.
[0234] The computational efficiency prediction model is constructed using a time series model.
[0235] Step 2: Based on future resource usage trends, confirm the future computing efficiency of each sub-computing system included in the data center.
[0236] The future computing performance status refers to the computing performance over a future period of time. Specifically, a control period can be set, and the duration of the control period can be customized according to actual needs, such as 1 hour. The future computing performance status can be the computing performance status of the next control period.
[0237] Step 3: Based on the future computing performance and operational performance of the computing system, obtain matching optimization strategies from the strategy library.
[0238] Specifically, the future computing efficiency and operational computing efficiency associated with each optimization strategy stored in the strategy library are compared with the future computing efficiency and operational computing efficiency of each computing system obtained in this embodiment. If the same or similar data exists, the matching optimization strategy is obtained directly; otherwise, the following operation is performed:
[0239] In response to the absence of an optimization strategy in the strategy library that matches the future computing performance and operational performance of each sub-computing system, an optimization strategy is generated using a pre-trained deep learning model. The generated optimization strategy is then used as the matching optimization strategy and stored in the strategy library.
[0240] The optimization strategies include business migration and equipment scaling, power on / off, and frequency adjustment. First, the system checks the strategy library for matching optimization strategies. If a matching strategy is found, it is directly deployed to the resource layer for execution. If not, a pre-trained deep learning model is used to generate an optimization strategy.
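The lookup-then-generate behaviour described above can be sketched as follows. The class names, the `tolerance` used to decide what counts as "same or similar", and the `generate` callable standing in for the pre-trained deep learning model are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class StrategyEntry:
    future_eff: float        # future computing efficiency the strategy was stored for
    operational_eff: float   # operational computing efficiency it was stored for
    strategy: str            # e.g. "business migration", "frequency adjustment"


@dataclass
class StrategyLibrary:
    tolerance: float = 0.05  # assumed threshold for "same or similar"
    entries: List[StrategyEntry] = field(default_factory=list)

    def find(self, future_eff: float, operational_eff: float) -> Optional[str]:
        """Return the first stored strategy whose key values are close enough."""
        for entry in self.entries:
            if (abs(entry.future_eff - future_eff) <= self.tolerance
                    and abs(entry.operational_eff - operational_eff) <= self.tolerance):
                return entry.strategy
        return None

    def match_or_generate(
        self,
        future_eff: float,
        operational_eff: float,
        generate: Callable[[float, float], str],
    ) -> str:
        """Use a stored strategy if one matches; otherwise generate and store one."""
        found = self.find(future_eff, operational_eff)
        if found is not None:
            return found
        strategy = generate(future_eff, operational_eff)
        self.entries.append(StrategyEntry(future_eff, operational_eff, strategy))
        return strategy


library = StrategyLibrary()
library.entries.append(StrategyEntry(1.5, 1.2, "frequency adjustment"))
# The lambda is a placeholder for the pre-trained generation model.
chosen = library.match_or_generate(1.52, 1.21, generate=lambda f, o: "business migration")
print(chosen)
```

Storing each generated strategy means that a later request with similar efficiency values is served from the library without invoking the generator again.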
[0241] Among them, the optimized computing efficiency is the computing efficiency of the data center after the optimization strategy is implemented. For the specific acquisition process, please refer to S201-204.
[0242] Step 4: Control the resource layer of the data center to execute the optimization strategy, obtain the optimized computing performance after the data center executes the optimization strategy, and if the optimized computing performance does not meet the preset computing performance target, take the optimized computing performance as the latest computing performance and repeat steps S205-206 until the optimized computing performance meets the preset computing performance target.
[0243] The preset computational efficiency target can be customized according to the design computational efficiency. For example, the preset computational efficiency target = design computational efficiency * 80%. The closer the running computational efficiency is to the design computational efficiency, the higher the computational efficiency utilization rate.
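The repeat-until-target behaviour of Step 4 can be sketched as a simple loop. The `apply_strategy` callable is a hypothetical placeholder for executing a strategy on the resource layer and re-measuring the efficiency, and `max_rounds` is an added safety bound that the disclosure does not mention.

```python
from typing import Callable, Tuple


def optimize_until_target(
    operational_eff: float,
    design_eff: float,
    apply_strategy: Callable[[float], float],
    target_ratio: float = 0.8,
    max_rounds: int = 10,
) -> Tuple[float, int]:
    """Repeat optimization until the efficiency reaches design_eff * target_ratio.

    Returns the final efficiency and the number of optimization rounds run.
    """
    target = design_eff * target_ratio
    current = operational_eff
    rounds = 0
    while current < target and rounds < max_rounds:
        # Execute one strategy and take the re-measured value as the latest efficiency.
        current = apply_strategy(current)
        rounds += 1
    return current, rounds


# Hypothetical effect: each round raises the efficiency by a fixed 0.25.
final_eff, rounds_used = optimize_until_target(
    operational_eff=1.0,
    design_eff=2.0,
    apply_strategy=lambda eff: eff + 0.25,
)
print(final_eff, rounds_used)
```

With a design efficiency of 2.0 the target is 1.6, so the loop in this example stops once the measured value first reaches or exceeds that level.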
[0244] According to another aspect of the present disclosure, a computing performance evaluation system is provided, comprising a server cluster and a cloud management platform, wherein the cloud management platform evaluates the computing performance of the server cluster according to the method described above. Each module included in this system is described in detail below with reference to the architecture diagram shown in Figure 3 and the flowchart shown in Figure 4:
[0245] 1) Calculate the design computing efficiency based on the design computing power (including the general computing system, intelligent computing system, and integrated general and intelligent computing system) and design power during the construction period, and store all detailed data in the design computing efficiency database. The stored data must meet all requirements for subsequent computing efficiency evaluation, including but not limited to the dimensions and types of computing efficiency evaluation.
[0246] 2) The data acquisition module collects business status, CPU / GPU utilization, power consumption or power data in the data center in real time and uploads them to the management and control layer.
[0247] 3) The control layer receives data from the data acquisition module and uploads the data to the business perception module, computing power module, and power analysis module (instantaneous power or power calculated from power consumption over a time period) to calculate the total operating computing power and total operating power in the data center. The operating computing efficiency calculation module further calculates the operating computing efficiency.
[0248] 4) The computational efficiency evaluation module makes an objective evaluation of the computational efficiency of the data center based on the calculated operational computational efficiency and design computational efficiency (i.e., the data in the design computational efficiency database), which is the computational efficiency evaluation result. Specifically, it refers to the two computational efficiency evaluation methods mentioned in the embodiments of this disclosure.
[0249] 5) The computing efficiency prediction module uses time series models such as LSTM and GRU to predict the resource usage trend of the data center based on computing efficiency evaluation results and business status data (including deployment targets, etc.). It predicts and outputs the future computing efficiency status of the data center in the next control cycle in advance, and then determines whether the current system needs to be optimized based on the business deployment cycle. The control cycle can be customized according to actual needs, for example, 1 hour as a control cycle.
[0250] 6) The optimization strategy generation module, based on the received future computing performance status and operational computing performance, first checks the strategy library for a suitable optimization strategy. If a suitable strategy is found, it is issued directly to the resource layer for execution. If not, the module generates an optimization strategy using a pre-trained deep learning model, verifies its feasibility, adds the strategy to the strategy library, and then issues it to the resource layer for execution. Optimization strategies generally include business migration and equipment scaling, power on / off, and frequency adjustment. Note that this process relies on the business perception module to collect business status information, which is combined with the computing performance evaluation results to predict computing performance. The optimization strategy is then generated and executed, which completes the closed-loop optimization control of system computing performance.
[0251] 7) Re-collect computing power, power, and service data, calculate the operational efficiency after the optimization strategy is implemented, and feed the execution results back to the computing efficiency evaluation module. If the computing efficiency design target is not met, return to 2); if it is met, end the closed-loop optimization process and enter the next control cycle.
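Item 3) above allows the operating power to be either an instantaneous reading or a value derived from the energy consumed over a time period. A minimal sketch of the second option follows; the function name, the kWh and kW units, and the meter readings are assumptions made for illustration.

```python
def average_power_kw(
    energy_start_kwh: float,
    energy_end_kwh: float,
    period_hours: float,
) -> float:
    """Average power over a period = energy consumed / period length."""
    if period_hours <= 0:
        raise ValueError("period_hours must be positive")
    if energy_end_kwh < energy_start_kwh:
        raise ValueError("cumulative energy reading must not decrease")
    return (energy_end_kwh - energy_start_kwh) / period_hours


# Hypothetical cumulative meter readings taken one control period (1 hour) apart.
power_kw = average_power_kw(1000.0, 1250.0, period_hours=1.0)
print(power_kw)
```

The resulting average power can then serve as the denominator when the operational computing efficiency is calculated for that period.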
[0252] The computational efficiency evaluation system and the computational efficiency evaluation method provided in this disclosure are based on the same inventive concept and have the same beneficial effects as the methods they adopt, operate or implement.
[0253] According to another aspect of the embodiments of this disclosure, a computational efficiency evaluation device is provided. As shown in Figure 5, the device includes:
[0254] The first computing module 101 is used to calculate the computing power of the computing system based on the processor utilization of the servers included in the computing system of the data center. The computing power is the computing power output by the computing system in the running state.
[0255] The second calculation module 102 is used to calculate the operating efficiency of the computing system based on the computing power and power of the computing system.
[0256] Evaluation module 103 is used to determine the computing efficiency evaluation result of the data center based on the operational computing efficiency of the computing system; or, to obtain the design computing efficiency of the computing system and determine the computing efficiency evaluation result of the data center based on the design computing efficiency and operational computing efficiency of the computing system.
[0257] The computing performance evaluation device is further configured to perform the following after determining the computing performance evaluation result of the data center:
[0258] The computing performance evaluation results and the business status data of the data center are input into a pre-trained computing performance prediction model to obtain the future resource usage trend of the data center output by the computing performance prediction model; wherein, the computing performance prediction model is constructed using a time series model;
[0259] Based on the aforementioned future resource usage trends, the future computing efficiency of the computing systems included in the data center is confirmed.
[0260] Based on the future computing performance and operational computing performance of the computing system, a matching optimization strategy is obtained from the strategy library;
[0261] In response to the absence of an optimization strategy in the strategy library that matches the future computing performance and operational performance of the computing system, an optimization strategy is generated using a pre-trained deep learning model, and the generated optimization strategy is used as the matching optimization strategy and stored in the strategy library.
[0262] The computational efficiency evaluation device is further configured to perform the following after obtaining a matching optimization strategy:
[0263] Control the data center to execute the matched optimization strategy, and obtain the optimized computing performance of the data center after executing the matched optimization strategy;
[0264] If the optimized computing performance does not meet the preset computing performance target, the optimized computing performance is taken as the latest computing performance, and the steps of determining the computing performance evaluation result of the data center and obtaining the matching optimization strategy are repeated until the optimized computing performance meets the preset computing performance target.
[0265] In one or more embodiments, the first computing module 101 is used for:
[0266] The obtained processor utilization of the server is input into the pre-trained computing power model corresponding to the computing system. Based on the computing power precision preset by the computing system, the running computing power of the server is obtained. The computing power model uses a neural network algorithm and is constructed based on the mathematical function relationship between running computing power and processor utilization.
[0267] The computing power of the computing system is obtained by summing the computing power of all the servers included in the computing system.
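The two operations of the first computing module can be sketched as below. A linear function is used as a stand-in for the pre-trained neural network model, and the per-precision peak values are invented; both are assumptions, since the disclosure states only that the model maps processor utilization to running computing power at a preset precision.

```python
from typing import Callable, Dict, Iterable

# A computing power model maps processor utilization in [0, 1] to running computing power.
ComputingPowerModel = Callable[[float], float]


def linear_model(peak_computing_power: float) -> ComputingPowerModel:
    """Stand-in for a trained model: running power scales linearly with utilization."""
    def model(utilization: float) -> float:
        if not 0.0 <= utilization <= 1.0:
            raise ValueError("utilization must be within [0, 1]")
        return peak_computing_power * utilization
    return model


def system_running_computing_power(
    utilizations: Iterable[float],
    model: ComputingPowerModel,
) -> float:
    """Sum the running computing power of every server in the computing system."""
    return sum(model(u) for u in utilizations)


# Hypothetical models, one per preset computing power precision.
models_by_precision: Dict[str, ComputingPowerModel] = {
    "FP32": linear_model(100.0),
    "FP16": linear_model(200.0),
}
total_fp32 = system_running_computing_power([0.5, 0.25, 1.0], models_by_precision["FP32"])
print(total_fp32)
```

Keeping the model behind a plain callable makes it straightforward to swap the linear stand-in for a trained network without changing the summation step.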
[0268] In one or more embodiments, the evaluation module 103 is used to:
[0269] When the computing system includes at least one sub-computing system, the computing efficiency of the multiple sub-computing systems is represented in the form of an array to obtain the total computing efficiency of the data center;
[0270] Based on the total computing efficiency of the data center, the computing efficiency evaluation result of the data center is determined.
[0271] In one or more embodiments, the evaluation module 103 is used to:
[0272] The effective computing efficiency of the computing system is obtained based on the ratio of its operational computing efficiency to its designed computing efficiency.
[0273] Based on the effective computing efficiency of the computing system, the computing efficiency evaluation result of the data center is determined.
[0274] In one or more embodiments, the evaluation module 103 is used to:
[0275] Based on the nominal computing power of the server chips included in the computing system, the design computing power of the server is calculated, wherein the design computing power is the computing power predicted to be output by the server during the design phase.
[0276] The design computing power of each computing system is obtained based on the sum of the design computing power of the servers included in the computing system.
[0277] Based on the design computing power and design power of the computing system, the design computing efficiency of the computing system is calculated.
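A minimal sketch of the design-side calculation follows. It assumes that the design computing power of a server is the nominal computing power of one chip multiplied by the number of chips; this excerpt does not give the exact formula, and the class name, field names, and values are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ServerSpec:
    chip_nominal_computing_power: float  # nominal value per chip, e.g. in TFLOPS
    chip_count: int                      # number of such chips in the server


def server_design_computing_power(server: ServerSpec) -> float:
    """Assumed rule: design computing power = nominal per-chip value * chip count."""
    return server.chip_nominal_computing_power * server.chip_count


def system_design_efficiency(servers: Iterable[ServerSpec], design_power: float) -> float:
    """Design computing efficiency = summed design computing power / design power."""
    if design_power <= 0:
        raise ValueError("design_power must be positive")
    total = sum(server_design_computing_power(s) for s in servers)
    return total / design_power


# Illustrative configuration only.
fleet = [ServerSpec(10.0, 2), ServerSpec(20.0, 4)]
design_efficiency = system_design_efficiency(fleet, design_power=50.0)
print(design_efficiency)
```

The result plays the role of the denominator in the effective computing efficiency ratio used by the evaluation module.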
[0278] The computational efficiency evaluation device and the computational efficiency evaluation method provided in this disclosure are based on the same inventive concept and have the same beneficial effects as the methods they adopt, operate or implement.
[0279] This disclosure also provides a computer device for performing the above-described computational efficiency evaluation method. Figure 6 is a schematic diagram of a computer device provided by some embodiments of this disclosure. As shown in Figure 6, the computer device 8 includes: a processor 800, a memory 801, a bus 802, and a communication interface 803. The processor 800, the communication interface 803, and the memory 801 are connected via the bus 802. The memory 801 stores a computer program that can run on the processor 800. When the processor 800 runs the computer program, it executes the computational efficiency evaluation method provided in any of the foregoing embodiments of this disclosure.
[0280] The memory 801 may include high-speed random access memory (RAM) or non-volatile memory, such as at least one disk storage device. Communication between this device network element and at least one other network element is achieved through at least one communication interface 803 (which can be wired or wireless), such as the Internet, wide area network, local area network, metropolitan area network, etc.
[0281] Bus 802 can be an ISA bus, PCI bus, or EISA bus, etc. The bus can be divided into an address bus, a data bus, a control bus, etc. The memory 801 is used to store programs. After receiving an execution instruction, the processor 800 executes the program. The computational efficiency evaluation method disclosed in any of the foregoing embodiments of this disclosure can be applied to the processor 800, or implemented by the processor 800.
[0282] The processor 800 may be an integrated circuit chip with signal processing capabilities. In implementation, each step of the above method can be completed by the integrated logic circuitry in the hardware of the processor 800 or by instructions in software form. The processor 800 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components. It can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of this disclosure. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this disclosure can be directly embodied in the execution of a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor. The software modules may reside in random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, or other storage media well established in the art. The storage medium is located in memory 801. Processor 800 reads the information in memory 801 and, in conjunction with its hardware, completes the steps of the above method.
[0283] The computer device and the computational efficiency evaluation method provided in this disclosure are based on the same inventive concept and have the same beneficial effects as the methods they employ, operate, or implement.
[0284] This disclosure also provides a computer-readable storage medium corresponding to the computational efficiency evaluation method provided in the foregoing embodiments. The computer-readable storage medium is an optical disc, on which a computer program (i.e., a computer program product) is stored. When the computer program is run by a processor, it executes the computational efficiency evaluation method provided in any of the foregoing embodiments.
[0285] It should be noted that examples of the computer-readable storage medium may also include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other optical and magnetic storage media, which will not be elaborated here.
[0286] The computer-readable storage medium provided in the above embodiments of this disclosure and the computational efficiency evaluation method provided in the embodiments of this disclosure are based on the same inventive concept and have the same beneficial effects as the methods adopted, run or implemented by the applications stored therein.
[0287] This disclosure also provides a computer program product. Referring to Figure 7, the computer program product 600 carries program code, namely computer program 601. The instructions included in the computer program 601 can be used to execute the steps of the computational efficiency evaluation method described in the above method embodiments. For details, please refer to the above method embodiments, which will not be repeated here.
[0288] The aforementioned computer program product can be implemented through hardware, software, or a combination thereof. In one optional embodiment, the computer program product is specifically embodied in a computer storage medium; in another optional embodiment, the computer program product is specifically embodied in a software product, such as a software development kit (SDK), etc.
[0289] The basic principles of this disclosure have been described above with reference to specific embodiments. However, it should be noted that the advantages, benefits, and effects mentioned in this disclosure are merely examples and not limitations, and should not be considered as essential features of each embodiment of this disclosure. Furthermore, the specific details disclosed above are for illustrative and facilitative purposes only, and are not limitations. These details do not limit the scope of this disclosure to the necessity of employing the aforementioned specific details for implementation.
[0290] The block diagrams of devices, apparatuses, equipment, and systems disclosed herein are merely illustrative examples and are not intended to require or imply that they must be connected, arranged, or configured in the manner shown in the block diagrams. As those skilled in the art will recognize, these devices, apparatuses, equipment, and systems can be connected, arranged, and configured in any manner. Words such as “comprising,” “including,” “having,” etc., are open-ended terms meaning “including but not limited to,” and are used interchangeably with them. The terms “or” and “and” as used herein refer to the terms “and / or,” and are used interchangeably with them unless the context clearly indicates otherwise. The term “such as” as used herein refers to the phrase “such as but not limited to,” and is used interchangeably with it.
[0291] Additionally, as used herein, the “or” used in a list of items beginning with “at least one” indicates a separate list, such that a list of, for example, “at least one of A, B, or C” means A or B or C, or AB or AC or BC, or ABC (i.e., A and B and C). Furthermore, the word “exemplary” does not imply that the described example is preferred or better than other examples.
[0292] It should also be noted that in the systems and methods of this disclosure, the components or steps can be decomposed and / or recombined. These decompositions and / or recombinations should be considered as equivalent solutions to this disclosure.
[0293] Various changes, substitutions, and modifications can be made to the technology described herein without departing from the teachings defined by the appended claims. Furthermore, the scope of the claims of this disclosure is not limited to the specific aspects of the processes, machines, manufactures, events, means, methods, and actions described above. Currently existing or later-developed processes, machines, manufactures, events, means, methods, or actions that perform substantially the same function or achieve substantially the same result as the corresponding aspects described herein can be utilized. Therefore, the appended claims include such processes, machines, manufactures, events, means, methods, or actions within their scope.
[0294] The above description of the disclosed aspects is provided to enable any person skilled in the art to make or use this disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other aspects without departing from the scope of this disclosure. Therefore, this disclosure is not intended to be limited to the aspects shown herein, but rather to be carried out within the widest scope consistent with the principles and novel features disclosed herein.
[0295] The above description has been given for purposes of illustration and description. Furthermore, this description is not intended to limit the embodiments of this disclosure to the forms disclosed herein. Although numerous exemplary aspects and embodiments have been discussed above, those skilled in the art will recognize certain variations, modifications, alterations, additions, and sub-combinations therein.
Claims
1. A computational efficiency evaluation method, characterized in that it comprises: calculating the operational computing power of a computing system of a data center based on the processor utilization of the servers included in the computing system, wherein the operational computing power is the computing power output by the computing system in the running state; calculating the operational computing efficiency of the computing system based on the operational computing power and the operating power of the computing system; and determining the computing efficiency evaluation result of the data center based on the operational computing efficiency of the computing system; or, obtaining the design computing efficiency of the computing system, and determining the computing efficiency evaluation result of the data center based on the design computing efficiency and the operational computing efficiency of the computing system.
2. The computational efficiency evaluation method as described in claim 1, characterized in that, After determining the computing performance evaluation results of the data center, the following steps are also included: The computing performance evaluation results and the business status data of the data center are input into a pre-trained computing performance prediction model to obtain the future resource usage trend of the data center output by the computing performance prediction model; wherein, the computing performance prediction model is constructed using a time series model; Based on the aforementioned future resource usage trends, the future computing efficiency of the computing systems included in the data center is confirmed. Based on the future computing performance and operational computing performance of the computing system, a matching optimization strategy is obtained from the strategy library; In response to the absence of an optimization strategy in the strategy library that matches the future computing performance and operational performance of the computing system, an optimization strategy is generated using a pre-trained deep learning model, and the generated optimization strategy is used as the matching optimization strategy and stored in the strategy library.
3. The computational efficiency evaluation method as described in claim 2, characterized in that, After obtaining the matching optimization strategy, the following is also included: Control the data center to execute the matched optimization strategy, and obtain the optimized computing performance of the data center after executing the matched optimization strategy; If the optimized computing performance does not meet the preset computing performance target, the optimized computing performance is taken as the latest computing performance, and the steps of determining the computing performance evaluation result of the data center and obtaining the matching optimization strategy are repeated until the optimized computing performance meets the preset computing performance target.
4. The computational efficiency evaluation method as described in claim 1, characterized in that, The computing power of a data center computing system is calculated based on the processor utilization of its servers, including: The obtained processor utilization of the server is input into the pre-trained computing power model corresponding to the computing system. Based on the computing power precision preset by the computing system, the running computing power of the server is obtained. The computing power model uses a neural network algorithm and is constructed based on the mathematical function relationship between running computing power and processor utilization. The computing power of the computing system is obtained by summing the computing power of all the servers included in the computing system.
5. The computational efficiency evaluation method as described in claim 1, characterized in that, Based on the computing efficiency of the computing system, the computing efficiency evaluation results of the data center are determined, including: When the computing system includes at least one sub-computing system, the computing efficiency of the multiple sub-computing systems is represented in the form of an array to obtain the total computing efficiency of the data center; Based on the total computing efficiency of the data center, determine the computing efficiency evaluation result of the data center; Based on the design and operational computational efficiency of the computing system, the computational efficiency evaluation results of the data center are determined, including: The effective computing efficiency of the computing system is obtained based on the ratio of its operational computing efficiency to its designed computing efficiency. Based on the effective computing efficiency of the computing system, the computing efficiency evaluation result of the data center is determined.
6. The computational efficiency evaluation method as described in claim 1, characterized in that, Obtaining the design computational efficiency of the computing system includes: Based on the nominal computing power of the server chips included in the computing system, the design computing power of the server is calculated, wherein the design computing power is the computing power predicted to be output by the server during the design phase. The design computing power of each computing system is obtained based on the sum of the design computing power of the servers included in the computing system. Based on the design computing power and design power of the computing system, the design computing efficiency of the computing system is calculated.
7. The computational efficiency evaluation method as described in any one of claims 1-6, characterized in that, The computing system includes at least one sub-computing system, which is divided according to the computing density of the server cluster in the data center; calculating the computing power of the computing system includes calculating the computing power of each of the sub-computing systems. The calculation of the computing system's operational efficiency includes calculating the operational efficiency of each of the sub-computing systems. Based on the computing efficiency of the computing system, the computing efficiency evaluation results of the data center are determined, including: Based on the operational computing efficiency of each of the sub-computing systems, the computing efficiency evaluation result of the data center is determined; Based on the design and operational computational efficiency of the computing system, the computational efficiency evaluation results of the data center are determined, including: Based on the design and operational computational efficiency of each of the sub-computing systems, the computational efficiency evaluation results of each of the sub-computing systems are determined. Based on the computational efficiency evaluation results of each of the sub-computing systems, the computational efficiency evaluation results of the data center are determined.
8. A computational efficiency evaluation system, characterized in that it comprises a server cluster and a cloud management platform; the cloud management platform performs computational efficiency evaluation on the server cluster according to the method of any one of claims 1-7.
9. A computer device, comprising a memory, a processor, and a computer program stored in the memory, characterized in that, The processor executes the computer program to implement the method according to any one of claims 1 to 7.
10. A non-transitory computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the method described in any one of claims 1 to 7.