An adaptive DVFS method and apparatus for optimizing energy efficiency of deep neural networks
By extracting global features and predicting cluster hyperparameters, the power blocks of DNNs are identified and the frequency is adjusted adaptively. This solves the problem of poor energy efficiency optimization in DNNs by the traditional DVFS strategy, and achieves more accurate frequency adjustment and energy efficiency improvement.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- UNIV OF SCI & TECH OF CHINA
- Filing Date
- 2023-12-13
- Publication Date
- 2026-06-23
AI Technical Summary
Traditional dynamic voltage and frequency scaling (DVFS) strategies struggle to adapt to dynamic computing demands in real time and accurately in deep neural networks (DNNs), resulting in poor energy efficiency optimization and challenges when migrating and optimizing across different hardware platforms.
By extracting global features and predicting cluster hyperparameters, power blocks of DNNs are identified and DVFS instrumentation points are set. Mahalanobis distance mapping and clustering algorithms are used to divide the power consumption view and adaptively adjust the frequency to optimize energy efficiency.
It achieves more precise frequency adjustment, avoids frequency ping-pong phenomenon, ensures stable and efficient operation on different hardware platforms, and improves energy efficiency.
Figure CN117875394B_ABST (abstract drawing)
Description
Technical Field
[0001] This invention relates to the technical field of deep neural network optimization, and more particularly to an adaptive DVFS method and apparatus for optimizing the energy efficiency of deep neural networks.
Background Technology
[0002] The rapid development and widespread application of deep neural networks (DNNs) have brought significant technological advancements to numerous fields, such as image processing and natural language processing. As DNN model structures become increasingly complex, their computational demands and data processing volumes grow exponentially. This trend makes energy efficiency a critical consideration when deploying DNNs. Running these complex models on a large scale leads to enormous energy consumption, resulting in economic costs and security issues.
[0003] Dynamic voltage and frequency scaling (DVFS) technology has attracted increasing attention. By adjusting the processor's voltage and frequency, DVFS can significantly reduce power consumption without altering the model structure. However, applying this technology to deep neural networks (DNNs) remains challenging because a DNN executes different operators at runtime, each with its own energy efficiency requirements and performance targets. A well-designed DVFS strategy can adjust the DVFS configuration dynamically while the DNN runs, that is, adjust the processor frequency to balance the energy consumption and performance of different components within the DNN, thereby optimizing overall energy efficiency. When handling the computational demands of DNNs, however, traditional DVFS strategies suffer from response lag and frequency fluctuation (the "frequency ping-pong" phenomenon). These strategies typically rely on historical data (such as processor utilization and task computational load) and heuristic rules, and often fail to track the dynamic changes during DNN operation accurately and in real time. The result is a mismatch between computational demand and frequency adjustment, and poor energy efficiency optimization. Furthermore, differences in performance characteristics and utilization metrics across hardware platforms make it difficult to migrate and optimize traditional DVFS strategies.
Summary of the Invention
[0004] This invention provides an adaptive DVFS method and apparatus for optimizing the energy efficiency of deep neural networks, so as to improve the energy efficiency of DNNs and ensure their efficient operation in diverse computing environments.
[0005] In a first aspect, embodiments of the present invention provide an adaptive DVFS method for optimizing the energy efficiency of deep neural networks, comprising:
[0006] S1. Extract global features of deep neural networks (DNNs) using a global feature extractor and input them into a clustering hyperparameter prediction model to predict the clustering hyperparameters of the current DNNs using the clustering hyperparameter prediction model.
[0007] S2. Based on the predicted clustering hyperparameters and the fine-grained deep features of DNNs extracted by the deep feature extractor, power consumption behavior similarity clustering is performed to divide the DNNs into multiple power consumption blocks and form a power consumption view. Then, the global features of each power consumption block are input into the decision model to obtain the target decision frequency of each power consumption block.
[0008] S3. Set DVFS instrumentation points before each power block of DNNs, and pre-set the target frequency in each power block of DNNs according to the target decision frequency.
[0009] Optionally, S2 specifically includes:
[0010] A Mahalanobis distance with an introduced distance regularization term is used to map and quantify the power consumption distance between operators;
[0011] Based on the power consumption distance, the DNNs are divided into multiple power consumption blocks using the clustering hyperparameters of the clustering algorithm, namely the neighborhood radius and the minimum number of points per cluster, to form a power consumption view of the DNNs.
[0012] The global features of each power block are extracted by a global feature extractor and used as input to the decision model, so that the target decision frequency of each power block is output by the decision model.
[0013] Optionally, the deep feature extractor is used to extract general features and specific features of DNNs.
[0014] Optionally, the global feature extractor is used to extract the macroscopic structural features of DNNs and to perform feature statistics and aggregation.
[0015] Optionally, during the training of the clustering hyperparameter prediction model, the structural features in the global features are used as the input in the initial stage of model training, and the statistical features in the global features are used as the input in the middle stage of model training.
[0016] In a second aspect, the present invention also provides an adaptive DVFS device for optimizing the energy efficiency of deep neural networks, comprising:
[0017] The clustering hyperparameter prediction module is used to extract global features of deep neural networks (DNNs) through a global feature extractor and input them into the clustering hyperparameter prediction model, so as to predict the clustering hyperparameters of the current DNNs through the clustering hyperparameter prediction model.
[0018] The target frequency decision module is used to perform power consumption behavior similarity clustering based on the predicted clustering hyperparameters and the fine-grained deep features of DNNs extracted by the deep feature extractor, so as to divide the DNNs into multiple power consumption blocks and form a power consumption view. Then, the global features of each power consumption block are input into the decision model to obtain the target decision frequency of each power consumption block.
[0019] The target frequency preset module is used to set DVFS instrumentation points before each power block of the DNNs, and to preset the target frequency in each power block within the DNNs according to the target decision frequency.
[0020] Optionally, the target frequency decision module is specifically used for:
[0021] A Mahalanobis distance with an introduced distance regularization term is used to map and quantify the power consumption distance between operators;
[0022] Based on the power consumption distance, the DNNs are divided into multiple power consumption blocks using the clustering hyperparameters of the clustering algorithm, namely the neighborhood radius and the minimum number of points per cluster, to form a power consumption view of the DNNs.
[0023] The global features of each power block are extracted by a global feature extractor and used as input to the decision model, so that the target decision frequency of each power block is output by the decision model.
[0024] The beneficial effects of this invention are:
[0025] (1) This invention proposes a power-sensitive feature extraction method and performs clustering based on power consumption behavior similarity. This method can accurately identify the key power consumption blocks in DNNs and use these blocks as the basic units of DVFS, thereby preventing frequency ping-pong. In addition, by presetting DVFS instrumentation points before each block and completing the frequency adjustment at those points, this method effectively alleviates the frequency adjustment lag problem and achieves more accurate frequency adjustment.
[0026] (2) The present invention uses a decision model to predict and adaptively preset the target frequency of each block, which solves the problem of frequency adjustment accuracy and ensures that the frequency setting meets the different performance and power consumption requirements of each power block.
[0027] (3) The technical solution of the present invention does not require manual intervention and can ensure its stable and efficient operation on different hardware platforms.
Description of the Drawings
[0028] Figure 1 is an overall framework diagram of an adaptive DVFS method for optimizing the energy efficiency of deep neural networks, provided by an embodiment of the present invention;
[0029] Figure 2 is a structural diagram of the clustering hyperparameter prediction model provided in an embodiment of the present invention;
[0030] Figure 3 is a structural diagram of the decision model provided in an embodiment of the present invention;
[0031] Figure 4 shows the experimental results for the actual task flow provided in an embodiment of the present invention.
Detailed Implementation
[0032] The present invention will now be described in further detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and not intended to limit it. Furthermore, it should be noted that, for ease of description, the accompanying drawings show only the parts relevant to the present invention, and not all of the structures.
[0033] Figure 1 is an overall framework diagram of the adaptive DVFS method for optimizing the energy efficiency of deep neural networks provided by this invention, and it covers two core aspects. First, Figure 1 illustrates the framework's workflow and basic modules, including the power-sensitive feature extraction method, clustering based on power consumption behavior similarity, and adaptive DVFS decision-making. These methods are connected through an intermediate representation. Second, the model training phase comprises a unified process from dataset generation to model training. This process produces two key predictive models: a clustering hyperparameter prediction model, used to select an appropriate clustering scheme for each DNN, and a decision model, used to determine the DVFS target frequency of each power consumption block. Finally, embodiments of the invention also systematically analyze the offline and runtime overhead introduced by this framework.
[0034] Specifically, the adaptive DVFS method for optimizing the energy efficiency of deep neural networks provided by this invention includes the following steps:
[0035] S1. Extract global features of deep neural networks (DNNs) using a global feature extractor and input them into a clustering hyperparameter prediction model to predict the clustering hyperparameters of the current DNNs using the clustering hyperparameter prediction model.
[0036] S2. Based on the predicted clustering hyperparameters and the fine-grained deep features of DNNs extracted by the deep feature extractor, power consumption behavior similarity clustering is performed to divide the DNNs into multiple power consumption blocks and form a power consumption view. Then, the global features of each power consumption block are input into the decision model to obtain the target decision frequency of each power consumption block.
[0037] The above S2 specifically includes: a Mahalanobis distance with an introduced distance regularization term is used to map and quantify the power consumption distance between operators;
[0038] Based on the power consumption distance, the DNNs are divided into multiple power consumption blocks using the clustering hyperparameters of the clustering algorithm, namely the neighborhood radius and the minimum number of points per cluster, to form a power consumption view of the DNNs.
[0039] The global features of each power block are extracted by a global feature extractor and used as input to the decision model, so that the target decision frequency of each power block is output by the decision model.
[0040] S3. Set DVFS instrumentation points before each power block of DNNs, and pre-set the target frequency in each power block of DNNs according to the target decision frequency.
[0041] The method in this embodiment performs pre-analysis for DNN tasks, including power-sensitive feature extraction methods, clustering based on power behavior similarity, and adaptive DVFS decision-making. These methods are connected through intermediate representations to achieve adaptive DVFS to optimize the energy efficiency of DNNs. The following sections will further explain and illustrate the above steps and related methods.
[0042] I. Power-Sensitive Feature Extraction Method
[0043] In this embodiment, a power-sensitive feature extraction method based on mixed granularity is used for feature extraction.
[0044] Specifically, this embodiment employs two complementary methods: a deep feature extractor that delves into each layer of the network to extract fine-grained hierarchical features; and a global feature extractor that extracts coarse-grained macroscopic features from a global perspective. These features are then effectively expressed and fused to form an intermediate representation, constructing a comprehensive representation that accurately reflects the power consumption characteristics of deep neural networks (DNNs), laying a solid foundation for subsequent clustering and prediction.
[0045] The deep feature extractor is used to extract general and specific features of DNNs.
[0046] Specifically: ① General feature extraction: for all basic components in the network, this invention extracts general features that reflect their computational and storage requirements, such as computational load, parameter count, and memory access volume. These features apply to every type of operator. ② Specific feature extraction: for each type of component, this invention extracts its unique structural parameters, which reflect the computational pattern specific to that component. For convolutional layers, for example, it extracts the kernel size, the number of kernels, and the stride. For the Transformer module, it extracts structural features such as matrix parameters, number of attention heads, fully connected layer parameters, normalization parameters, and positional encoding methods in the attention mechanism, so as to accurately describe the computational patterns and resource requirements of its internal components and thereby represent the operator's unique power-consumption-related characteristics.
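As an illustration of the general and specific features named above, the following is a minimal sketch for a single 2D convolution layer. The class name, the function name, and the choice of counting memory access as input, weight, and output elements are assumptions made for this example; the source does not specify how these quantities are computed.

```python
from dataclasses import dataclass


@dataclass
class ConvLayerFeatures:
    # General features (meaningful for any operator type)
    macs: int          # multiply-accumulate operations
    params: int        # learnable parameters
    mem_elements: int  # assumed proxy for memory access: input + params + output elements
    # Specific features (convolution only)
    kernel_size: int
    out_channels: int
    stride: int


def conv2d_features(in_ch, out_ch, kernel, stride, padding, in_h, in_w, bias=True):
    """Derive general and conv-specific features from a layer's hyperparameters."""
    out_h = (in_h + 2 * padding - kernel) // stride + 1
    out_w = (in_w + 2 * padding - kernel) // stride + 1
    weights = kernel * kernel * in_ch * out_ch
    params = weights + (out_ch if bias else 0)
    macs = weights * out_h * out_w
    mem = in_ch * in_h * in_w + params + out_ch * out_h * out_w
    return ConvLayerFeatures(macs, params, mem, kernel, out_ch, stride)


# A 7x7, stride-2 convolution on a 3x224x224 input.
feat = conv2d_features(in_ch=3, out_ch=64, kernel=7, stride=2, padding=3,
                       in_h=224, in_w=224)
print(feat)
```

A Transformer block would need a different set of specific fields (for example, the number of attention heads), while the general fields stay the same.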
[0047] Furthermore, this embodiment also requires feature integration and abstraction at the entire DNN or block level, aiming to encapsulate global power consumption features. Incorporating global features helps avoid misrepresentation that may arise due to feature locality.
[0048] The global feature extractor is used to extract the macroscopic structural features of DNNs and to perform feature statistics and aggregation.
[0049] Specifically, ① Macroscopic structural features are extracted: This includes analyzing the macroscopic parameters of DNNs, such as the number of layers, depth, type, number of residual connections, and branch structure. These parameters are used to illustrate the overall scale and complexity of the topology, providing insights into global power consumption patterns. ② Feature statistics and aggregation: This involves aggregating all fine-grained features to generate a comprehensive summary, including the total FLOPs and total number of parameters of the computational network. Furthermore, the proportions of each component can be analyzed to elucidate computational patterns.
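The statistics and aggregation step can be sketched as follows. The per-layer dictionary schema (`type`, `flops`, `params`) and the function name are hypothetical and chosen only for this illustration; the source names the aggregated quantities but not a data format.

```python
from collections import Counter


def global_features(layers):
    """Aggregate fine-grained per-layer features into a global summary.

    `layers` is a list of dicts with the assumed keys 'type', 'flops', 'params'.
    """
    total_flops = sum(layer["flops"] for layer in layers)
    total_params = sum(layer["params"] for layer in layers)
    type_counts = Counter(layer["type"] for layer in layers)

    # Share of total FLOPs contributed by each component type.
    flops_by_type = Counter()
    for layer in layers:
        flops_by_type[layer["type"]] += layer["flops"]
    flops_share = (
        {t: f / total_flops for t, f in flops_by_type.items()} if total_flops else {}
    )

    return {
        "num_layers": len(layers),
        "total_flops": total_flops,
        "total_params": total_params,
        "type_counts": dict(type_counts),
        "flops_share": flops_share,
    }


layers = [
    {"type": "conv", "flops": 600, "params": 100},
    {"type": "conv", "flops": 300, "params": 200},
    {"type": "linear", "flops": 100, "params": 700},
]
g = global_features(layers)
print(g)
```

The same function can be applied to the layers of a single power consumption block to obtain block-level global features.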
[0050] II. Clustering Methods Based on Power Consumption Feature Similarity
[0051] Step S2 in this embodiment employs a clustering method based on power consumption behavior similarity to classify operators according to their power consumption characteristics within the network. This process not only maps and quantifies the power consumption distance between operators but also divides different power consumption blocks based on this mapping. The power consumption view composed of these power consumption blocks serves as a logical intermediate representation, intuitively showing the main paths and regions of power consumption within DNNs. This clustering is crucial for subsequent energy efficiency optimization because it enables the present invention to perform more effective DVFS instrumentation for operator groups with similar power consumption characteristics.
[0052] Considering that different features may have different scales and dimensions in a multidimensional feature space, Mahalanobis distance naturally adjusts the scale of these features through the covariance matrix. Therefore, this embodiment uses Mahalanobis distance to quantify the similarity of power consumption behavior in the feature space. Next, by introducing a distance regularization term between operators in the distance calculation, it can be ensured that only physically adjacent operators are considered, avoiding the erroneous clustering of operators with similar but non-adjacent features together. Then, based on the neighborhood radius (epsilon) and minimum number of points in the cluster (minPts) hyperparameters of the DBSCAN algorithm, the network is divided into multiple power consumption blocks to form a power consumption view. This process not only simplifies the network structure but also provides an intuitive and effective basis for making DVFS target frequency decisions, greatly enhancing the operability and accuracy of adaptive DVFS.
[0053] Finally, post-processing of the clustering results primarily focuses on adjusting and optimizing the clustering outcomes to ensure that the generated power blocks are continuous and feasible within the network's hierarchical structure. Post-processing not only addresses outliers to ensure each block is properly processed but also involves adjusting the size, shape, or membership of clusters to achieve a better view of power consumption.
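One possible form of this post-processing is sketched below. The policy is an assumption made for illustration: a cluster label that reappears after a different cluster starts a new block, and an outlier (label -1, the usual DBSCAN noise label) is absorbed into the block that precedes it. The source does not state which rules it actually applies.

```python
def contiguous_blocks(labels):
    """Turn per-layer cluster labels into contiguous, non-overlapping blocks.

    labels[i] is the cluster label of layer i; -1 marks an outlier.
    Returns a list of (start, end) layer-index pairs, with `end` inclusive.
    """
    blocks = []
    prev = None  # label of the most recent non-outlier layer
    for i, lab in enumerate(labels):
        if blocks and (lab == -1 or lab == prev):
            # Extend the current block: same cluster, or an outlier to absorb.
            start, _ = blocks[-1]
            blocks[-1] = (start, i)
        else:
            # A different cluster (or the very first layer) opens a new block.
            blocks.append((i, i))
        if lab != -1:
            prev = lab
    return blocks


labels = [0, 0, -1, 0, 1, 1, 0, -1]
blocks = contiguous_blocks(labels)
print(blocks)
```

Every layer ends up in exactly one block, so the resulting power consumption view covers the network without gaps or overlaps.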
[0054] Specifically, the algorithm for clustering based on power consumption behavior similarity includes the following steps:
[0055] Input:
[0056] - Scaled power-sensitive deep features X;
[0057] - Clustering hyperparameters: neighborhood radius ε and minimum number of points per cluster minPts; parameters α and λ used for distance calculation.
[0058] Output:
[0059] - Power consumption view.
[0060] Procedure:
[0061] 1. n ← the number of layers in X
[0062] 2. C ← the covariance matrix of X
[0063] 3. P ← the pseudo-inverse of C
[0064] 4. Initialize the distance matrix D ← zero matrix
[0065] 5. For each pair of layers i, j in X:
[0066] - D[i,j] ← sqrt((X[i] − X[j])ᵀ · P · (X[i] − X[j]))
[0067] Note: This is Mahalanobis distance.
[0068] 6. Initialize the operator-interval regularization matrix R ← zero matrix
[0069] 7. For each pair of layers i, j in X:
[0070] - R[i,j] ← exp(−λ·|i − j|)
[0071] Note: This determines the operator distance.
[0072] 8. Distance[i,j] ← α·D[i,j] + (1−α)·R[i,j]
[0073] 9. clusters ← apply DBSCAN to Distance with parameters ε and minPts
[0074] 10. newClusters ← post-process clusters to ensure that each cluster is internally continuous and that clusters do not overlap
[0075] Return: newClusters.
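Steps 1 to 8 of the procedure can be sketched with NumPy as follows. This is an illustration under stated assumptions rather than the patented implementation: the Mahalanobis term is assumed to take the standard square-root form, the regularization term is implemented exactly as written in step 7, and the function name and sample data are hypothetical.

```python
import numpy as np


def power_distance(X, alpha, lam):
    """Combined distance matrix following steps 1-8 of the listed procedure.

    X has one row of scaled power-sensitive features per layer.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]                                    # step 1
    C = np.atleast_2d(np.cov(X, rowvar=False))        # step 2: feature covariance
    P = np.linalg.pinv(C)                             # step 3: pseudo-inverse

    # Steps 4-5: pairwise Mahalanobis distance (assumed square-root form).
    diff = X[:, None, :] - X[None, :, :]              # shape (n, n, d)
    squared = np.einsum("ijk,kl,ijl->ij", diff, P, diff)
    D = np.sqrt(np.maximum(squared, 0.0))             # clip tiny negative round-off

    # Steps 6-7: operator-interval term, exactly as written in step 7.
    idx = np.arange(n)
    R = np.exp(-lam * np.abs(idx[:, None] - idx[None, :]))

    return alpha * D + (1.0 - alpha) * R              # step 8


X = [[1.0, 10.0], [1.1, 10.5], [5.0, 2.0], [5.2, 1.8]]
dist = power_distance(X, alpha=0.7, lam=0.5)
print(dist.round(3))
```

For step 9, the resulting matrix could be handed to any DBSCAN implementation that accepts precomputed distances, for example scikit-learn's `DBSCAN(eps=..., min_samples=..., metric="precomputed")`.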
[0076] This embodiment introduces clustering based on an understanding of basic power consumption behavior and relationships to enhance the understanding of DNN power consumption patterns. This clustering facilitates the implementation of adaptive DVFS instrumentation granularity.
[0077] III. Adaptive Target Frequency Decision
[0078] This invention utilizes a power consumption view constructed through clustering to identify key power consumption blocks, guiding precise energy efficiency control. In this stage, based on the established power consumption view, DVFS instrumentation points are set before each block in the DNNs. During this process, a target frequency is adaptively preset based on the characteristics and requirements of each power consumption block. Specifically, this invention employs a decision model whose input is the global features of each power consumption block, and whose output is the target frequency for that block. For example, in the case of compute-intensive blocks, the decision model will choose to increase the target frequency to alleviate computational load. Conversely, for memory-intensive blocks, considering energy efficiency requirements, it will decide to decrease the frequency. Then, during the inference process of the DNNs, adaptive DVFS is performed according to the preset instrumentation points and target frequency, thereby improving energy efficiency.
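The instrumentation mechanism described above can be sketched as follows. The `set_frequency` callback stands in for a platform-specific frequency-setting call and is hypothetical, as are the function names; skipping the call when the target frequency is unchanged is an assumption of this sketch, not something the source states.

```python
def build_instrumentation_points(blocks, target_freqs_mhz):
    """Map the first layer index of each block to its preset target frequency."""
    if len(blocks) != len(target_freqs_mhz):
        raise ValueError("need exactly one target frequency per block")
    return {start: freq for (start, _end), freq in zip(blocks, target_freqs_mhz)}


def run_with_dvfs(layers, x, points, set_frequency):
    """Run layers in order, applying DVFS at each instrumentation point.

    `set_frequency` is a hypothetical platform hook; it is only called when
    the target differs from the frequency currently in effect.
    """
    current = None
    for index, layer in enumerate(layers):
        target = points.get(index)
        if target is not None and target != current:
            set_frequency(target)
            current = target
        x = layer(x)
    return x


# Five toy "layers" grouped into three blocks; frequencies are illustrative.
layers = [lambda v: v + 1 for _ in range(5)]
blocks = [(0, 1), (2, 3), (4, 4)]
points = build_instrumentation_points(blocks, [1300, 600, 600])

calls = []
result = run_with_dvfs(layers, 0, points, calls.append)
print(points, calls, result)
```

Because the frequency is changed only at block boundaries, the number of DVFS calls is bounded by the number of blocks rather than the number of operators.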
[0079] IV. Model Training Phase
[0080] In this embodiment, model training covers both the clustering hyperparameter prediction model and the decision model.
[0081] Specifically, model training includes multiple steps such as dataset generation and model training. In this stage, a key dataset for prediction is constructed through a data generator and training process, and a clustering hyperparameter prediction model and a decision model are trained.
[0082] As shown in Figure 1, the dataset generator produces two datasets. First, a deep neural network (DNN) generator produces a variety of neural networks by randomly combining their features. These networks are then processed by clustering algorithms with different hyperparameters to form the corresponding power consumption views. Each block in a power consumption view is deployed at every available frequency, and the test data that achieve the best energy efficiency are selected. Finally, two datasets are output: dataset A contains the global features of the neural networks and their corresponding clustering hyperparameters; dataset B contains the global features of each block and its optimal frequency.
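The labelling of one dataset B entry can be sketched as a frequency sweep. The measurement format (average power in watts and run time in seconds per frequency) and the sample values are hypothetical; energy efficiency is taken as images processed per joule, with energy equal to average power multiplied by time.

```python
def best_frequency(measurements, images):
    """Return the frequency whose measurement gives the most images per joule.

    `measurements` maps frequency (MHz) to (average power in W, time in s)
    for a single power consumption block; this format is assumed.
    """
    def efficiency(item):
        _freq, (power_w, time_s) = item
        return images / (power_w * time_s)

    freq, _ = max(measurements.items(), key=efficiency)
    return freq


# Illustrative numbers only: energies are 20 J, 12.5 J and 18 J respectively.
sweep = {114: (2.0, 10.0), 675: (5.0, 2.5), 1300: (12.0, 1.5)}
label = best_frequency(sweep, images=50)
print(label)
```

Pairing `label` with the block's global features would give one training sample for the decision model.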
[0083] The clustering hyperparameter prediction model is trained using dataset A, and its network structure is shown in Figure 2. The global features of a DNN are divided into macroscopic structural features and statistical features. Structural features are input at the initial stage of the model to establish a basic understanding of the DNN's structure. Statistical features are input at the middle stage of the network to further improve prediction accuracy on top of that structural understanding.
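The two-stage input scheme can be illustrated with a small untrained forward pass. The layer sizes, the use of fully connected layers with ReLU, and the two raw outputs (standing for the neighborhood radius and the minimum number of points) are all assumptions for this sketch; the actual structure is the one shown in Figure 2.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


class TwoStageMLP:
    """Toy network: structural features enter first, statistical features mid-way."""

    def __init__(self, n_struct, n_stat, hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.1, size=(hidden, n_struct))
        # The second layer sees the hidden state plus the statistical features.
        self.W2 = rng.normal(scale=0.1, size=(hidden, hidden + n_stat))
        self.W3 = rng.normal(scale=0.1, size=(n_out, hidden))

    def forward(self, structural, statistical):
        h1 = relu(self.W1 @ structural)                          # initial stage
        h2 = relu(self.W2 @ np.concatenate([h1, statistical]))   # middle stage
        return self.W3 @ h2                                      # raw predictions


model = TwoStageMLP(n_struct=5, n_stat=3, hidden=8, n_out=2)
pred = model.forward(np.ones(5), np.ones(3))
print(pred)
```

The weights here are random, so the printed values carry no meaning; the sketch only shows where each feature group enters the network.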
[0084] The decision model is trained using dataset B, and its network structure is shown in Figure 3. It predicts the target frequency setting from the global features of each power consumption block. This is essentially a classification task: choosing among the frequency levels supported by the hardware. Furthermore, even when the prediction is biased, the predicted target frequency deviates from the actual optimal frequency by only one or two levels, and therefore does not significantly affect overall performance.
[0085] V. Framework Overhead Analysis
[0086] Offline Overhead Analysis: The entire framework operates in an offline environment, with its main offline overhead stemming from model training and the workflow. Training the clustering hyperparameter prediction model and the decision model requires data collection and training until convergence. Furthermore, this approach eliminates the need for manual intervention, as migrating it to a new hardware platform involves only automatically generating the dataset and training. The workflow overhead primarily involves feature extraction, hyperparameter prediction, clustering, predicting target frequencies, and generating a power consumption view. These steps are all completed before inference, thus not impacting runtime performance.
[0087] Runtime overhead analysis: The runtime overhead brought by the framework mainly comes from two aspects:
[0088] ①DVFS instructions consume processor resources, thus incurring overhead. However, the framework implements adaptive DVFS instrumentation granularity through clustering, thus targeting only critical power blocks and reducing performance degradation caused by frequent frequency adjustments. Therefore, this precise control effectively manages and balances this overhead, ensuring it does not significantly impact overall runtime performance.
[0089] ② The framework adaptively sets the frequency for each block, including frequency reduction operations for some blocks. Frequency reduction may increase the time overhead of these blocks. However, this performance sacrifice is to achieve a higher energy efficiency ratio, i.e., reducing overall energy consumption by reducing power consumption. This trade-off is generally worthwhile because it maintains or improves overall energy efficiency while sacrificing only a small fraction of performance. Importantly, this approach does not change the structure or computational intensity of the neural network, and therefore does not affect the accuracy of inference.
[0090] Experimental verification
[0091] To verify the effectiveness of the proposed method for energy efficiency optimization in real-world scenarios, the experiment was conducted on two platforms: NVIDIA Jetson AGX and NVIDIA Jetson TX2. Necessary runtime environments were installed, including Jetpack 4.6.2, Ubuntu 16.04, CUDA 10.2, torchvision 0.12, and PyTorch 1.12.0. The MAXN operating mode was configured on both the AGX and TX2. Different GPU frequencies and batch sizes were configured on the two platforms: on the AGX, the frequency range was from 114MHz to 1370MHz, with 14 levels; on the TX2, the frequency range was from 114MHz to 1300MHz, with 13 levels. For energy efficiency calculations, the Jetson platform's performance management tool (tegrastats) was used to monitor power consumption in real time.
[0092] As shown in Tables 1(a) and (b), this invention used 12 DNNs of different sizes and computational complexities from torchvision in the comparison experiments of prediction accuracy and energy efficiency. The image data used in the test inference process came from ImageNet. Each energy efficiency test required 50 runs to calculate the average result under different random inputs.
[0093] To demonstrate the effectiveness and superiority of the proposed adaptive DVFS framework (PowerLens), this invention is compared with three benchmark methods: ① the built-in method (BiM), the on-demand frequency adjustment method built into each of the two hardware platforms, which relies on historical hardware data; ② FPG, a recent heuristic DVFS method, referred to in this section as FPG-C+G, which dynamically adjusts CPU and GPU frequencies at runtime based on performance, power consumption, energy-delay product, and CPU/GPU utilization; ③ FPG-G, a variant of FPG-C+G that retains the CPU's on-demand behavior and adjusts only the GPU frequency. To quantitatively evaluate energy efficiency, this invention uses the following formula:
Energy efficiency = images / E = images / (P average × t) = FPS / P average
[0094] Energy efficiency is a positive indicator; a higher value indicates better energy utilization performance. "images" represents the number of images processed by the model. "E" represents energy consumption, "P average" is the average power, and "t" is the inference time. "FPS" represents frames per second.
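Using the quantities defined above, the metric and a relative gain between two methods can be computed as in this short sketch. The function names and the sample numbers are illustrative only and are not measurements from the experiments.

```python
def energy_efficiency(images, avg_power_w, time_s):
    """Images processed per joule, with E = average power x inference time."""
    energy_j = avg_power_w * time_s
    return images / energy_j


def relative_gain(ee_new, ee_base):
    """Percentage improvement of one method's energy efficiency over a baseline."""
    return (ee_new - ee_base) / ee_base * 100.0


# 50 images in 2 s at an average of 10 W: E = 20 J.
ee = energy_efficiency(images=50, avg_power_w=10.0, time_s=2.0)

# Equivalent form: frames per second divided by average power.
fps = 50 / 2.0
ee_from_fps = fps / 10.0

print(ee, ee_from_fps, relative_gain(3.0, ee))
```

Both forms give the same value because FPS is images divided by time, so dividing by average power yields images per joule in either case.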
[0095] 1. Energy efficiency optimization effect
[0096] Tables 1(a) and (b) present the experimental results of energy efficiency improvement. The Block column gives the number of power consumption blocks obtained through clustering for each deep neural network. The BiM, FPG-G, and FPG-C+G columns show the energy efficiency gains of the proposed solution relative to each benchmark method. Compared to BiM, the proposed solution achieves an average improvement of 57.85% on TX2 and 119.42% on AGX. Compared to the FPG-G method, it achieves an average energy efficiency improvement of 18.39% on TX2 and 27.31% on AGX. Compared to the FPG-C+G method, which configures both CPU and GPU frequencies, the proposed solution configures only the GPU frequency yet still achieves an average energy efficiency improvement of 13.53% on TX2 and 15.97% on AGX.
[0097] Table 1(a) Energy efficiency optimization data for TX2
[0098]
[0099] Table 1(b) Energy efficiency optimization data on AGX
[0100]
[0101] From the table above, we can also draw the following observations:
[0102] (1) Smaller networks, such as AlexNet and MobileNetV3, lack a sufficient number of operators for clustering. This limitation hinders the effectiveness of adaptive DVFS, thus limiting its energy efficiency improvement.
[0103] (2) Complex networks can be clustered into more power consumption blocks, and the number of blocks is positively correlated with the energy efficiency improvement. This can be seen, for example, by comparing ResNet34 with ResNet152, and RegNet_X_32GF with RegNet_Y_128GF.
[0104] (3) For networks composed of repetitive components, the framework treats consecutively repeating components as a single block and makes decisions on the optimal frequency for them. This is achieved by considering the similarity of power consumption behavior in clustering. For example, the framework treats the connections of repeating transformer modules in the ViT model as a large power block.
[0105] 2. Testing in actual task flow
[0106] This invention randomly combines the DNNs listed in the table to form 100 inference tasks. Each task contains 50 three-channel 224×224 pixel images. As shown in Figure 4, in the experimental results of task flow processing, the solution of this invention exhibits the lowest energy consumption and the highest energy efficiency among the four methods. Compared with FPG-G, FPG-C+G, and BiM, the solution of this invention achieves energy reductions of 26.60%, 22.18%, and 48.58% on TX2, and 28.95%, 18.45%, and 50.64% on AGX, respectively. In terms of time, the solution of this invention changes task flow processing time by 6.13%, -0.54%, and 9.91% on TX2, and by 14.03%, -2.30%, and 16.82% on AGX, respectively, where a negative value means a shorter processing time. Furthermore, the solution of this invention achieves energy efficiency improvements of 36.24%, 28.49%, and 94.48% on TX2, and 40.75%, 22.62%, and 102.60% on AGX, respectively.
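The reported energy reductions and energy efficiency improvements can be related with a short calculation. Assuming every method processes the same number of images, energy efficiency is inversely proportional to energy, so an energy reduction of r corresponds to a gain of 1/(1 − r) − 1. The function name is chosen for this example.

```python
def ee_gain_from_energy_reduction(reduction_pct):
    """Energy efficiency gain (%) implied by an energy reduction (%),
    assuming the same number of images is processed in both cases."""
    r = reduction_pct / 100.0
    return (1.0 / (1.0 - r) - 1.0) * 100.0


reported = {
    "TX2": [(26.60, 36.24), (22.18, 28.49), (48.58, 94.48)],
    "AGX": [(28.95, 40.75), (18.45, 22.62), (50.64, 102.60)],
}
for platform, pairs in reported.items():
    for reduction, stated_gain in pairs:
        implied = ee_gain_from_energy_reduction(reduction)
        print(platform, reduction, stated_gain, round(implied, 2))
```

The implied gains agree with the stated improvements to within about 0.01 percentage points, which is consistent with rounding of the reported energy reductions.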
[0107] 3. Ablation test
[0108] To demonstrate the effectiveness of power consumption behavior similarity clustering in the algorithm, the scheme of this invention was compared with two variants using different clustering algorithms. Specifically, PR represents a method that replaces the clustering algorithm with random block partitioning, while PN represents a method that makes frequency decisions directly for the entire network without using any clustering algorithm. The energy efficiency losses of these two methods compared to the scheme of this invention are shown in Table 2. As can be seen from Table 2, the scheme using power consumption behavior similarity clustering outperforms the other methods.
[0109] Table 2 Energy efficiency loss of different clustering methods
[0110]
[0111] 4. Cost Analysis
[0112] The time overhead of the framework can be divided into offline stage overhead and runtime stage overhead. The offline overhead is shown in Table 3. Furthermore, to understand the runtime overhead, this invention changed the DVFS frequency 100 times on the device used in the experiment and measured its average time overhead to be 50 milliseconds. The time overhead caused by frequency reduction depends on the specific scenario.
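A measurement of this kind can be sketched as follows. The frequency setter is platform specific and is not described here, so `set_frequency` is a hypothetical callable supplied by the caller, and the frequency values are placeholders.

```python
import time


def average_switch_overhead(set_frequency, frequencies, repeats=100):
    """Return the mean wall-clock time in seconds of one frequency change."""
    total = 0.0
    for i in range(repeats):
        target = frequencies[i % len(frequencies)]  # cycle through the given frequencies
        start = time.perf_counter()
        set_frequency(target)
        total += time.perf_counter() - start
    return total / repeats


def fake_set_frequency(freq_hz):
    """Stand-in for a real DVFS call; it only sleeps briefly."""
    time.sleep(0.001)


overhead = average_switch_overhead(
    fake_set_frequency, [500_000_000, 1_000_000_000], repeats=10
)
print(f"average overhead: {overhead * 1000:.2f} ms")
```

Passing at least two distinct frequencies matters, because repeatedly requesting the frequency that is already active may not trigger a real change on some platforms.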
[0113] Table 3 Offline overhead analysis of the framework
[0114]
[0115] Example 1:
[0116] In the field of autonomous driving, energy efficiency optimization is crucial because autonomous vehicles rely on battery power and must process large amounts of complex data in real time. These vehicles use deep neural networks (DNNs) to perform various tasks, such as object detection and decision-making, which place high demands on computing power and energy consumption. Therefore, improving energy efficiency not only helps extend battery life, ensuring that vehicles can operate independently for longer periods, but also reduces the thermal burden on computing systems, improving overall reliability and safety.
[0117] In this scenario, the framework's application leverages its adaptive DVFS mechanism to optimize different power consumption blocks within the DNN model, adaptively adjusting the frequency of the computing hardware. This approach enables the vehicle's computing system to dynamically optimize energy usage based on real-time demands during task execution, thereby reducing overall power consumption without sacrificing critical task performance. This optimization not only enhances the vehicle's range but also improves system efficiency, making autonomous vehicles more efficient and reliable during long-distance travel.
[0118] Example 2:
[0119] In scenarios involving large-scale model inference in cloud computing centers, the energy efficiency optimization framework proposed in this invention can significantly optimize energy efficiency and performance. Cloud computing centers typically need to process massive amounts of data and run complex deep learning models, such as large neural networks used for language translation, image recognition, or complex data analysis. These tasks have extremely high demands on computing resources, accompanied by huge energy consumption. In this environment, energy efficiency optimization becomes crucial. On the one hand, high energy consumption leads to huge operating costs; on the other hand, excessive energy consumption also increases heat dissipation requirements, potentially causing equipment overheating or even damage. Furthermore, for enterprises pursuing environmental sustainability, reducing energy consumption is also a key step in achieving this goal.
[0120] The framework's role here is to optimize the energy efficiency of deep learning models through an adaptive DVFS mechanism. Specifically, it analyzes different parts of the model, identifies the areas with the highest power consumption, and dynamically adjusts the allocation and frequency of computing resources for these areas. For example, some stages of model inference may require higher computing power to handle complex tasks, while in other stages, the frequency can be reduced to save energy. In this way, it ensures that overall energy consumption is minimized while maintaining inference performance.
[0121] In terms of practical results, cloud computing centers utilizing this framework can expect significantly reduced energy costs and higher energy efficiency. This not only alleviates dependence on electricity resources but also helps reduce environmental impact. Experimental results have also shown that the framework performs better when optimizing larger and more complex models. More efficient energy management also translates to better system stability and longer hardware lifespan, which is crucial for maintaining the long-term operation of cloud computing centers. Furthermore, by optimizing energy efficiency, cloud computing centers can provide more cost-effective services, enhancing their competitiveness in a highly competitive market.
[0122] Furthermore, embodiments of the present invention also provide an adaptive DVFS device for optimizing the energy efficiency of deep neural networks, comprising:
[0123] The clustering hyperparameter prediction module is used to extract global features of deep neural networks (DNNs) through a global feature extractor and input them into the clustering hyperparameter prediction model, so as to predict the clustering hyperparameters of the current DNNs through the clustering hyperparameter prediction model.
[0124] The target frequency decision module is used to perform power consumption behavior similarity clustering based on the predicted clustering hyperparameters and the fine-grained deep features of DNNs extracted by the deep feature extractor, so as to obtain the target decision frequency for each power consumption block.
[0125] The target frequency preset module is used to set DVFS instrumentation points before each power block of the DNNs, and to preset the target frequency in each power block within the DNNs according to the target decision frequency.
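The role of an instrumentation point can be sketched as a frequency change issued just before a power block runs. In this sketch the blocks are plain callables, and `set_frequency` is a hypothetical platform hook. Skipping the call when the target frequency is unchanged is a design choice of the sketch, not something stated in the text; it is motivated by the per-change overhead reported in the cost analysis.

```python
def run_with_dvfs(blocks, target_freqs, set_frequency, x):
    """Run blocks in order, applying each block's preset frequency just before it."""
    current = None
    for block, freq in zip(blocks, target_freqs):
        if freq != current:  # instrumentation point: switch only when the target changes
            set_frequency(freq)
            current = freq
        x = block(x)
    return x


# Toy usage: three "power blocks" and a setter that records requested frequencies.
applied = []
blocks = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]
result = run_with_dvfs(blocks, [800, 800, 1200], applied.append, 5)
print(result, applied)
```

Here the first two blocks share a target frequency, so only two frequency changes are issued for three blocks.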
[0126] Specifically, the target frequency decision module is used for:
[0127] Quantifying the power distance between operators using a Mahalanobis distance mapping that introduces a distance regularization term;
[0128] Based on the power distance, dividing the DNNs into multiple power blocks using the neighborhood radius and the minimum number of points from the clustering hyperparameters of the clustering algorithm, so as to form a power view of the DNNs;
[0129] Extracting the global features of each power block with the global feature extractor and inputting them into the decision model, which outputs the target decision frequency of each power block.
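The distance and clustering steps can be sketched as below, using NumPy. Several points are assumptions rather than details from the text: the regularization is taken to be a ridge term added to the covariance matrix, the clustering is taken to be DBSCAN-style because a neighborhood radius and a minimum point count are its two usual hyperparameters, and the function names and feature values are illustrative. The sketch also does not force clusters to be contiguous in layer order.

```python
import numpy as np


def power_distance_matrix(features, reg=1e-3):
    """Pairwise Mahalanobis distances with a ridge term added to the covariance."""
    x = np.asarray(features, dtype=float)
    cov = np.cov(x, rowvar=False) + reg * np.eye(x.shape[1])  # ridge keeps cov invertible
    inv = np.linalg.inv(cov)
    diff = x[:, None, :] - x[None, :, :]
    sq = np.einsum("ijk,kl,ijl->ij", diff, inv, diff)
    return np.sqrt(np.maximum(sq, 0.0))


def dbscan(dist, eps, min_pts):
    """Minimal DBSCAN on a precomputed distance matrix; label -1 marks noise."""
    n = dist.shape[0]
    labels = np.full(n, -1)
    visited = np.zeros(n, dtype=bool)
    cluster = 0
    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        queue = list(np.flatnonzero(dist[i] <= eps))
        if len(queue) < min_pts:
            continue  # not a core point; stays noise unless reached from a core point
        labels[i] = cluster
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # claim unlabelled or noise points
            if visited[j]:
                continue
            visited[j] = True
            neighbors = np.flatnonzero(dist[j] <= eps)
            if len(neighbors) >= min_pts:
                queue.extend(neighbors)  # j is a core point, so keep expanding
        cluster += 1
    return labels


# Hypothetical per-operator features, e.g. (normalized FLOPs, normalized memory traffic).
features = [[1.0, 0.2], [1.1, 0.1], [0.9, 0.3], [5.0, 2.1], [5.2, 1.9], [4.9, 2.0]]
dist = power_distance_matrix(features)
labels = dbscan(dist, eps=1.0, min_pts=2)
print(labels)
```

In the framework described above, `eps` and `min_pts` would come from the clustering hyperparameter prediction model rather than being fixed by hand.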
[0130] Furthermore, the aforementioned deep feature extractor is used to extract general and specific features of DNNs; the aforementioned global feature extractor is used to extract macroscopic structural features of DNNs and to perform feature statistics and aggregation.
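The statistics and aggregation performed by the global feature extractor can be sketched as below. The `LayerInfo` record and its field names are hypothetical; the aggregated quantities follow those listed in the claims (number of layers, types, residual connections, total FLOPs, total parameters).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class LayerInfo:
    """Hypothetical per-layer record produced by an earlier analysis pass."""
    kind: str  # e.g. "conv", "attention", "linear"
    flops: int
    params: int
    has_residual: bool = False


def global_features(layers):
    """Aggregate per-layer records into structural and statistical features."""
    return {
        "num_layers": len(layers),
        "type_counts": dict(Counter(layer.kind for layer in layers)),
        "num_residual": sum(layer.has_residual for layer in layers),
        "total_flops": sum(layer.flops for layer in layers),
        "total_params": sum(layer.params for layer in layers),
    }


layers = [
    LayerInfo("conv", flops=100, params=10),
    LayerInfo("conv", flops=200, params=20, has_residual=True),
    LayerInfo("linear", flops=50, params=5),
]
feats = global_features(layers)
print(feats)
```

The same function can be applied to the layers of a single power block to obtain the per-block global features that are fed to the decision model.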
[0131] In this embodiment, during the training process of the clustering hyperparameter prediction model, the structural features in the global features are used as the input in the initial stage of model training, and the statistical features in the global features are used as the input in the middle stage of model training.
[0132] The adaptive DVFS device for optimizing the energy efficiency of deep neural networks provided in this embodiment of the invention can execute the adaptive DVFS method for optimizing the energy efficiency of deep neural networks provided in any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the method execution, which will not be described in detail here.
[0133] Note that the above description is merely a preferred embodiment of the present invention and the technical principles employed. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described herein, and various obvious changes, readjustments, and substitutions can be made without departing from the scope of protection of the present invention. Therefore, although the present invention has been described in detail through the above embodiments, the present invention is not limited to the above embodiments, and may include many other equivalent embodiments without departing from the concept of the present invention, the scope of which is determined by the scope of the appended claims.
Claims
1. An adaptive DVFS method for optimizing the energy efficiency of deep neural networks, characterized in that it comprises:
S1. Extracting global features of deep neural networks (DNNs) using a global feature extractor and inputting them into a clustering hyperparameter prediction model, so as to predict the clustering hyperparameters of the current DNNs using the clustering hyperparameter prediction model;
S2. Based on the predicted clustering hyperparameters and the fine-grained deep features of the DNNs extracted by a deep feature extractor, performing power consumption behavior similarity clustering to divide the DNNs into multiple power consumption blocks and form a power consumption view, and then inputting the global features of each power consumption block into a decision model to obtain the target decision frequency of each power consumption block;
S3. Setting DVFS instrumentation points before each power consumption block of the DNNs, and presetting the target frequency in each power consumption block of the DNNs according to the target decision frequency;
wherein the deep feature extractor is used to extract general features and specific features of the DNNs;
the general features include computational complexity, number of parameters, and memory access complexity;
the specific features include the kernel size, number, and stride of the convolutional layer, as well as the matrix parameters, number of attention heads, fully connected layer parameters, normalization parameters, and positional encoding method in the attention mechanism of the Transformer module;
the global feature extractor is used to extract the macroscopic structural features of the DNNs and to perform feature statistics and aggregation;
the macroscopic structural features include the number of layers, depth, type, number of residual connections, and branching structure;
the statistics and aggregation of features include calculating the total FLOPs and the total number of parameters of the network;
in the training process of the clustering hyperparameter prediction model, the structural features in the global features are used as the input in the initial stage of model training, and the statistical features in the global features are used as the input in the middle stage of model training.
2. The method according to claim 1, characterized in that S2 specifically includes:
quantifying the power distance between operators using a Mahalanobis distance mapping that introduces a distance regularization term;
based on the power distance, dividing the DNNs into multiple power consumption blocks using the neighborhood radius and the minimum number of points from the clustering hyperparameters of the clustering algorithm, so as to form a power consumption view of the DNNs;
extracting the global features of each power consumption block with the global feature extractor and using them as input to the decision model, so that the target decision frequency of each power consumption block is output by the decision model.
3. An adaptive DVFS device for optimizing the energy efficiency of deep neural networks, characterized in that it comprises:
a clustering hyperparameter prediction module, used to extract global features of deep neural networks (DNNs) through a global feature extractor and input them into a clustering hyperparameter prediction model, so as to predict the clustering hyperparameters of the current DNNs through the clustering hyperparameter prediction model;
a target frequency decision module, used to perform power consumption behavior similarity clustering based on the predicted clustering hyperparameters and the fine-grained deep features of the DNNs extracted by a deep feature extractor, so as to divide the DNNs into multiple power consumption blocks and form a power consumption view, and then to input the global features of each power consumption block into a decision model to obtain the target decision frequency of each power consumption block;
a target frequency preset module, used to set DVFS instrumentation points before each power consumption block of the DNNs, and to preset the target frequency in each power consumption block of the DNNs according to the target decision frequency;
wherein the deep feature extractor is used to extract general features and specific features of the DNNs;
the general features include computational complexity, number of parameters, and memory access complexity;
the specific features include the kernel size, number, and stride of the convolutional layer, as well as the matrix parameters, number of attention heads, fully connected layer parameters, normalization parameters, and positional encoding method in the attention mechanism of the Transformer module;
the global feature extractor is used to extract the macroscopic structural features of the DNNs and to perform feature statistics and aggregation;
the macroscopic structural features include the number of layers, depth, type, number of residual connections, and branching structure;
the statistics and aggregation of features include calculating the total FLOPs and the total number of parameters of the network;
in the training process of the clustering hyperparameter prediction model, the structural features in the global features are used as the input in the initial stage of model training, and the statistical features in the global features are used as the input in the middle stage of model training.
4. The device according to claim 3, characterized in that the target frequency decision module is specifically used for:
quantifying the power distance between operators using a Mahalanobis distance mapping that introduces a distance regularization term;
based on the power distance, dividing the DNNs into multiple power consumption blocks using the neighborhood radius and the minimum number of points from the clustering hyperparameters of the clustering algorithm, so as to form a power consumption view of the DNNs;
extracting the global features of each power consumption block with the global feature extractor and using them as input to the decision model, so that the target decision frequency of each power consumption block is output by the decision model.