Design space exploration method, system, and computer program product for microarchitecture
By modeling the microarchitecture design configuration as a Markov decision process and using sequential decision generation to generate design points, the problem of missing parameter combination logic and local optima in existing methods is solved. This enables the generation of high-quality Pareto fronts within a limited simulation budget, improving the rationality and interpretability of the design configuration.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- INST OF COMPUTING TECH CHINESE ACAD OF SCI
- Filing Date
- 2026-02-12
- Publication Date
- 2026-06-23
AI Technical Summary
Existing microarchitecture design space exploration methods suffer from problems such as missing parameter combination logic, difficulty in credit allocation, susceptibility to local optima, and high simulation costs in processor design, making it difficult to generate high-quality and highly reasonable Pareto fronts within a limited simulation budget.
The design configuration generation process of microarchitecture parameters is modeled as a Markov decision process. Design points are generated by sequential decision-making. The design configuration is generated step by step through a state encoder, a coding layer network, a preference encoder, and a sequential decision action head module. The cascading dependencies and combinatorial design rules between parameters are captured to achieve an approximation of the Pareto front.
The model rapidly generates high-quality, highly rational Pareto fronts with very few simulation samples, improving the interpretability and decision-oriented nature of the model, avoiding structural contradictions and ineffective configurations, and enhancing the rationality and interpretability of the design configuration.
Figure CN122263584A_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of computer hardware design technology, specifically to design space exploration methods, systems, and computer program products for microarchitecture.
Background Technology
[0002] The statements in this section are merely to provide background information in relation to this application to aid in understanding it, and such background information does not necessarily constitute prior art.
[0003] In the design flow of processors and hardware systems, Design Space Exploration (DSE) is a crucial step in determining a product's core competitiveness. Its core objective is to search a high-dimensional parameter space for a set of design configurations that achieve the best trade-offs among multiple competing objectives such as performance, power consumption, and area (PPA). This provides the core basis for determining the final hardware solution. Essentially, this is a typical multi-objective optimization problem, and its solution relies on two core concepts: Pareto optimality and the Pareto front. One configuration dominates another if it is not inferior in any optimization objective and is strictly superior in at least one. A Pareto-optimal solution is a design configuration that is not dominated by any other configuration, which means that no objective of this configuration can be improved further without degrading at least one other objective. The Pareto front is the surface or curve formed by all Pareto-optimal solutions in the objective space. It represents all possible optimal trade-offs within the design space and serves as a core reference for designers when choosing a solution.
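The two concepts can be stated compactly. The following is a sketch under assumptions not stated in the source: the design space is written as X, and all m objectives f_1, ..., f_m are expressed as quantities to be minimized (an objective to be maximized can be negated).

```latex
x \prec y \iff \bigl(\forall i \in \{1,\dots,m\}:\ f_i(x) \le f_i(y)\bigr)
          \ \wedge\ \bigl(\exists j \in \{1,\dots,m\}:\ f_j(x) < f_j(y)\bigr)

\mathcal{P}^{*} = \{\, x \in \mathcal{X} \;:\; \nexists\, y \in \mathcal{X},\ y \prec x \,\},
\qquad
\mathcal{F}^{*} = \{\, (f_1(x),\dots,f_m(x)) \;:\; x \in \mathcal{P}^{*} \,\}
```

Here x ≺ y reads "x dominates y", P* is the set of Pareto-optimal configurations, and F* is its image in the objective space, that is, the Pareto front.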
[0004] For processor microarchitecture design, DSE faces extremely high design space complexity. This design space typically contains dozens of adjustable parameters, including clock frequency, issue width, cache size, branch predictor type, number of execution units, pipeline depth, etc. The total number of possible parameter combinations can reach hundreds of billions or even trillions, forming an extremely large high-dimensional search space. Evaluating the performance, power, and area (PPA) of each design configuration requires specialized simulators, and a single evaluation often takes several hours, making exhaustive enumeration of the entire design space infeasible in terms of both computational resources and time cost. Therefore, there is an urgent need for a design space exploration method that can rapidly generate high-quality, highly reasonable, and interpretable Pareto fronts within a limited simulation budget, providing more efficient technical support for processor microarchitecture design.
Summary of the Invention
[0005] During their research on microarchitecture design space exploration, the inventors discovered that existing design space exploration methods have many inherent flaws in practical applications, which severely restrict the quality and efficiency of microarchitecture design.
[0006] After in-depth research and analysis, the inventors believe that the root cause of the aforementioned defects is the evaluation and optimization of the design configuration as an atomic whole. This approach ignores the fact that microarchitecture design is essentially a structured and sequential process, leading to problems such as difficulty in credit allocation, missing parameter dependencies, and susceptibility to local optima. This application aims to propose a solution for microarchitecture design space exploration based on sequential decision-making, modeling the entire design configuration generation process as a Markov decision process, and achieving rapid exploration of the Pareto front approximation of the microarchitecture design within a very small number of simulation samples.
[0007] The objective of this application is achieved through the following technical solution: According to a first aspect of this application, a method for exploring the design space of a microarchitecture is provided, comprising: inputting a set of microarchitecture parameters of a target hardware, the value range of the parameters, and multiple preset condition regions into a trained design point generation model, wherein the condition regions are preference regions partitioned from a predefined target space; using the trained design point generation model, generating multiple design points for each preset condition region based on the microarchitecture parameters and their value ranges using a sequential decision-making approach, wherein a design point includes all parameters and a set of values; merging all generated design points and performing Pareto dominance comparison to filter out non-dominated solution sets, thereby constructing a Pareto front approximation.
[0008] Preferably, the trained design point generation model uses a sequential decision-making approach to generate multiple design points for each preset condition region based on the microarchitecture parameters and their value ranges. This includes: selecting a parameter from the set of input microarchitecture parameters and assigning it a value according to its value range; using the currently assigned parameter and its value as the current state; selecting an unassigned parameter based on the current state and the preset condition region and assigning it a value according to its value range; updating the current state and repeating the subsequent steps until all input parameters have been assigned values, and using all parameters and their values as a single design point.
[0009] Preferably, the design point generation model includes: a state encoder module configured to obtain the current configuration embedding sequence based on the input parameters and their value range; an encoding layer network module configured to obtain the current configuration state representation based on the embedding sequence; a preference encoder module configured to obtain a preference vector based on the condition labels in the condition region; a gating fusion module configured to fuse the current configuration state representation and the preference vector to obtain a condition-aware state representation; and a sequence decision action head module configured to output the next configuration action or termination action based on the condition-aware state representation.
[0010] Preferably, the state encoder module is further configured to: generate parameter assignment embedding vectors based on the assigned parameters and their values; assign trainable vectors to unassigned parameters, the trainable vectors being used to identify their identity and unconfigured state; generate parameter type embedding vectors based on the type of the unassigned parameters; the parameter assignment embedding vectors, the trainable vectors, and the parameter type embedding vectors constitute the currently configured embedding sequence.
[0011] Preferably, the design point generation model is trained according to the following steps: based on the set of microarchitecture parameters of the target hardware and the value range of the parameters, multiple first design points are generated using random sampling; the multiple first design points are simulated and evaluated to obtain corresponding index values, and multiple second design points are selected from the multiple first design points according to the index values; an initial Pareto front is constructed based on the multiple second design points; a reward band region is divided in the target space based on the initial Pareto front; a reward function is set according to the reward band region; the reward values of the multiple second design points are calculated according to the index values and the reward function, and the multiple second design points are converted into corresponding generation trajectories; multiple condition regions are divided from the reward band region according to preset preferences, and the multiple second design points, their reward values, and generation trajectories are stored as sample points in the buffers of the corresponding condition regions according to the index values; the design point generation model is trained according to the set of microarchitecture parameters of the target hardware, the value range of the parameters, the condition regions and their corresponding buffers, and the parameters of the design point generation model are updated using trajectory balance loss according to the condition regions and the reward function until the model converges, thus obtaining the trained design point generation model; wherein, while the parameters of the design point generation model are being updated, the buffers of the condition regions are updated according to the new design points generated by the design point generation model.
[0012] Preferably, setting the reward function according to the reward band region includes: rewarding new design points that dominate or outperform the current Pareto front in the target space, and the greater the distance from the new design point to the current Pareto front in the target space, the higher the reward; dividing the reward band region into multiple grid cells, and determining the reward value for the new design point according to the number of historical design points in the grid cell, wherein the reward value is inversely proportional to the number of historical design points.
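The two reward terms described above can be sketched as follows. This is a minimal illustration, not the application's reward function: it assumes every objective is minimized, uses Euclidean distance to the nearest front point as the "distance to the front", and uses 1 / (1 + n) for the density term so that empty cells do not divide by zero. The function names and the `cell_size` parameter are hypothetical.

```python
import math
from collections import Counter

def dominates(a, b):
    """True if objective vector a dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def improvement_reward(point, front):
    """Zero if the current front dominates the point; otherwise the distance
    to the nearest front point, so points further beyond the front score higher."""
    if any(dominates(f, point) for f in front):
        return 0.0
    return min(math.dist(point, f) for f in front)

def cell_of(point, cell_size):
    """Index of the grid cell that contains the point (uniform grid, assumed)."""
    return tuple(math.floor(v / cell_size) for v in point)

def density_reward(point, history_counts, cell_size):
    """Decreases as the number of historical points in the same cell grows."""
    return 1.0 / (1 + history_counts[cell_of(point, cell_size)])

# Usage with a toy two-objective front and a history of visited cells.
front = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
history = Counter({(0, 0): 3})
print(improvement_reward((1.0, 1.0), front), density_reward((0.2, 0.3), history, 1.0))
```

How the two terms are combined (sum, product, or weighting) is not specified in this passage, so the sketch leaves them separate.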
[0013] Preferably, training the design point generation model based on the set of microarchitecture parameters of the target hardware, the value range of the parameters, the condition region and its corresponding buffer, and updating the parameters of the design point generation model using trajectory balance loss based on the condition region and the reward function, includes: generating, by the design point generation model, multiple third design points for the condition region based on the set of microarchitecture parameters of the target hardware and the value range of the parameters; performing simulation evaluation on the multiple third design points to obtain corresponding index values; obtaining reward values for the multiple third design points based on the index values, the condition region, and the reward function; constructing batch data from the third design points and the design points in the buffer of the condition region according to a preset design point quantity ratio, wherein the proportion of design points from the buffer in the batch data decreases as training progresses; and calculating the trajectory balance loss from the reward values and generation trajectories of the design points in the batch data, and updating the parameters of the design point generation model accordingly.
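The batch construction and loss computation can be sketched as below. The loss follows the standard trajectory balance objective for generative flow networks: the squared difference between log Z plus the summed forward log-probabilities and log R plus the summed backward log-probabilities. In a condition-aware model, log Z and the policies would additionally depend on the condition region. The linear decay schedule, its 0.5 and 0.1 endpoints, and all function names are assumptions for illustration only.

```python
import math
import random

def trajectory_balance_loss(log_z, log_pf_steps, log_pb_steps, reward):
    """Squared mismatch between forward flow and reward-weighted backward flow."""
    forward = log_z + sum(log_pf_steps)
    backward = math.log(reward) + sum(log_pb_steps)
    return (forward - backward) ** 2

def buffer_fraction(step, total_steps, start=0.5, end=0.1):
    """Share of the batch drawn from the buffer; decays linearly (assumed schedule)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * progress

def build_batch(new_points, buffer, batch_size, step, total_steps, rng=random):
    """Mix freshly generated points with replayed buffer points."""
    n_replay = min(len(buffer), round(batch_size * buffer_fraction(step, total_steps)))
    n_new = min(len(new_points), batch_size - n_replay)
    return rng.sample(buffer, n_replay) + rng.sample(new_points, n_new)

# Usage: early in training half of the batch is replayed from the buffer.
batch = build_batch(list(range(100)), list(range(100, 200)), 10, 0, 100, random.Random(0))
print(len(batch), trajectory_balance_loss(0.0, [math.log(0.5)], [0.0], 0.5))
```

In practice the loss would be computed with an automatic differentiation framework so that gradients flow to the model parameters and to log Z; the pure-Python version only shows the arithmetic.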
[0014] Preferably, while updating the parameters of the design point generation model, updating the buffer of the condition region based on the new design point generated by the design point generation model includes: performing simulation evaluation on the new design point to obtain the corresponding index value; if the index value falls into the condition region corresponding to the design point, then calculating the reward value of the new design point based on the index value and the reward function; if the reward value of the new design point exceeds a preset reward threshold, then adding the new design point, its corresponding reward value, and the generation trajectory to the buffer of the condition region.
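The buffer update rule above amounts to two checks followed by an append. A minimal sketch, in which `in_region` and `reward_fn` are hypothetical callables standing in for the region membership test and the reward function, and the record layout is an assumption:

```python
def maybe_add_to_buffer(buffer, region, design_point, metrics, trajectory,
                        in_region, reward_fn, reward_threshold):
    """Append the new design point only if its simulated metrics fall inside
    the condition region and its reward exceeds the preset threshold."""
    if not in_region(metrics, region):
        return False
    reward = reward_fn(metrics)
    if reward <= reward_threshold:
        return False
    buffer.append({"design": design_point, "reward": reward, "trajectory": trajectory})
    return True
```

Returning a boolean makes it easy for a training loop to count how many newly generated points were actually admitted to each region's buffer.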
[0015] According to a second aspect of this application, a design space exploration system for microarchitecture is provided, comprising: an input module, a design point generation module, and an output module; wherein the input module is configured to input a set of microarchitecture parameters of the target hardware, the value range of the parameters, and multiple preset condition regions into a trained design point generation model, wherein the condition regions are preference regions divided from a predefined target space; the design point generation module is configured to use a sequential decision-making method to generate multiple design points for each preset condition region based on the microarchitecture parameters and their value ranges, wherein a design point includes all parameters and a set of values; the output module is configured to merge all generated design points and perform Pareto dominance comparison to filter out non-dominated solution sets, forming a Pareto front approximation.
[0016] According to a third aspect of this application, a computer-readable storage medium is provided, wherein the computer-readable storage medium stores a computer program that, when executed by a processor, implements the method of the first aspect of this application.
[0017] According to a fourth aspect of this application, a computer program product is provided, comprising a computer program that, when executed by a processor, implements the method of the first aspect of this application.
[0018] Compared with existing technologies, the advantages of this application's solution are mainly as follows: It models the microarchitecture parameter design and configuration generation process as a Markov decision process, using the partially configured design as the state and the selection and assignment of parameters to be configured as actions, transforming the traditional one-time global prediction into a step-by-step, progressive sequence generation mode. This modeling approach naturally captures the cascading dependencies and combinatorial design rules between microarchitecture parameters, ensuring the architectural rationality and parameter consistency of the output design configuration from the generation logic perspective. It effectively avoids structural contradictions and invalid configurations that cannot be implemented in engineering, providing a solid model foundation for the credit allocation problem in the optimization process. It can quickly generate high-quality, highly rational, and interpretable Pareto fronts within a limited simulation budget. By adopting a design point generation model as the core learning and execution framework for sequence decision-making, its inherent flow consistency constraints enable implicit credit allocation from the final multi-objective optimization reward to each step of parameter decision-making in sequence generation. This accurately identifies parameter selection behaviors that play a crucial role in the quality of configurations, significantly improving the model's interpretability and decision-oriented nature.
Attached Figure Description
[0019] To more clearly illustrate the technical solutions in the embodiments of this application, the accompanying drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort. In the drawings: Figure 1 is a flowchart illustrating a design space exploration method for microarchitecture according to an embodiment of this application; Figure 2 is a schematic diagram of the design point generation model according to an embodiment of this application; Figure 3 is a schematic diagram illustrating the construction of an initial Pareto front and the setting of a reward band region according to an embodiment of this application; Figure 4 is a schematic diagram of training the design point generation model according to an embodiment of this application.
Detailed Implementation
[0020] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided through specific embodiments. It should be understood that the specific embodiments described herein are for illustrative purposes only and are not intended to limit the scope of this application.
[0021] As mentioned in the background section, the core challenge of efficient Design Space Exploration (DSE) is how to quickly approximate a realistic and complete Pareto front under a limited simulation budget. To address this challenge, the industry has developed several DSE methods, forming three main technical paths. The first is population-based evolutionary methods, represented by the NSGA-II algorithm. These methods maintain population diversity through crossover, mutation, and selection, enabling global exploration of the design space. However, they rely on a large number of random simulation samples, resulting in slow convergence. They typically require hundreds of thousands of evaluations to approach a relatively good Pareto front, leading to extremely high simulation costs and making them unsuitable for budget-constrained scenarios. The second is bottleneck-analysis-based methods, which start from an initial design configuration and optimize iteratively by gradually identifying and eliminating performance bottlenecks. These methods are highly targeted and efficient in the early stage of optimization, but they are prone to local optimum traps: they are easily constrained by the initial configuration and locked into a fixed optimization trajectory, making it difficult to escape local regions and discover a globally better solution. The third is optimization methods based on surrogate models, with Bayesian optimization as a typical example. These methods predict the objective function using models such as Gaussian processes and guide the direction of simulation sampling based on the predictions, significantly improving sample efficiency and reducing simulation costs. Thanks to this efficiency, this type of method has become the mainstream framework for current DSE, and most efficient exploration methods are derived from and improve upon it.
[0022] However, existing surrogate-model-based optimization methods and other mainstream DSE methods still have many shortcomings in practical processor microarchitecture design, making it difficult to meet the design requirements of high precision, high efficiency, and high rationality. Specifically, these shortcomings are as follows: First, parameter combination logic is missing. Existing methods generally feed design configurations into the model as atomic units and lack the ability to model the inherent dependencies between parameters and the rules for combining them. However, parameters in a processor microarchitecture are strongly coupled. For example, increasing the number of execution units requires a simultaneous increase in issue width to realize the performance advantage, and the match between cache capacity and associativity directly affects memory access efficiency. Ignoring such dependencies and combination rules not only produces design configurations that are structurally unreasonable and cannot be implemented in engineering practice, but also makes it difficult to find the best synergistic combination of parameters, which degrades the quality of the Pareto front.
[0023] Secondly, credit allocation is difficult. Existing methods can only provide an end-to-end mapping from design configuration to PPA performance, and cannot quantify the contribution of individual parameter decisions or parameter combinations to the final multi-objective optimization result. This leads to a lack of interpretability in the optimization process, making it impossible to clearly identify the core reasons for performance improvement or decline. It is also difficult to guide subsequent parameter adjustment and optimization decisions based on existing results, thus limiting the efficiency of iterative optimization.
[0024] Third, the sequential design process is not modeled. Hardware microarchitecture design is essentially a step-by-step decision-making process with clear causal dependencies. For example, the issue width is decided first, and then the number of execution units and the cache access strategy are determined based on the issue width. Existing methods treat all parameters as parallel dimensions, which fails to reflect the causal relationships in this step-by-step decision-making. The generated design configuration may not conform to the actual design process, increasing the difficulty of implementing the solution.
[0025] Furthermore, existing methods are prone to getting trapped in local optima. Because they lack an effective understanding of the global structure of the design space and a targeted global exploration mechanism, the search process often converges prematurely to a suboptimal region, making it difficult to discover globally better Pareto solutions distributed in other regions of the design space and leaving the final Pareto front incomplete. At the same time, the high cost of exploring a high-quality front is a prominent issue. When the quality requirement for the Pareto front is raised beyond "good," the number of simulations required increases superlinearly, and the optimization cost rises sharply.
[0026] The inventors, in their in-depth research on DSE methods, recognized that the fundamental limitation of existing methods lies in evaluating and optimizing design configurations as atomic wholes. This approach ignores the fact that microarchitecture design is essentially a structured, sequential decision-making process, with cascading dependencies and combinatorial rules among parameter choices (e.g., increasing the number of execution units constrains and relates to the decision on issue width). It is precisely because this inherent sequential characteristic is not modeled that traditional methods encounter problems such as difficult credit allocation, missing parameter dependencies, and superlinear increases in simulation cost when approaching high-quality Pareto fronts.
[0027] Based on this, the inventors propose a solution that models the entire design configuration generation process as a Markov Decision Process (MDP). In the embodiments of this application, the process of generating a design configuration containing N parameters is defined as a trajectory of states s_0, s_1, ..., s_N. The state s_t represents a partial design configuration and records the parameters determined in the first t steps and their values. The action a_t, taken in state s_t, selects one of the remaining unconfigured parameters and assigns it a valid value. The final state s_N indicates that all parameters have been configured, forming a complete design point. This modeling approach transforms one-time prediction into step-by-step generation, allowing each decision to be based on the parameters that have already been assigned values. In this way, the design configuration is restructured into a series of context-dependent sequential decision steps, ensuring that each choice is based on the existing partial configuration and estimates its contribution to the target space. This makes it possible to learn the combined effects of parameters from a very small number of simulation samples, while simultaneously achieving efficient credit allocation.
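The state, action, and termination condition of this Markov decision process can be sketched with plain data structures. This is an illustrative sketch, not the application's implementation: a state is represented as a dictionary of already assigned parameters, an action as a (parameter, value) pair, and the parameter names and value ranges in the usage lines are hypothetical.

```python
from typing import Dict, List, Tuple

State = Dict[str, int]      # parameters assigned so far and their values
Action = Tuple[str, int]    # (unassigned parameter, valid value)

def valid_actions(state: State, domains: Dict[str, List[int]]) -> List[Action]:
    """All (parameter, value) pairs for parameters that are still unassigned."""
    return [(name, value)
            for name, values in domains.items() if name not in state
            for value in values]

def step(state: State, action: Action) -> State:
    """Apply an action and return the next partial configuration."""
    name, value = action
    if name in state:
        raise ValueError(f"parameter {name!r} is already assigned")
    next_state = dict(state)
    next_state[name] = value
    return next_state

def is_terminal(state: State, domains: Dict[str, List[int]]) -> bool:
    """A state is final once every parameter has a value."""
    return len(state) == len(domains)

# Usage: two hypothetical parameters, configured in two steps.
domains = {"issue_width": [2, 4], "alu_count": [1, 2, 3]}
s0: State = {}
s1 = step(s0, ("issue_width", 4))
s2 = step(s1, ("alu_count", 2))
print(valid_actions(s1, domains), is_terminal(s2, domains))
```

Because each transition copies the state, earlier partial configurations remain available, which is convenient when a trajectory has to be stored alongside its final design point.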
[0028] Figure 1 is a flowchart illustrating a design space exploration method for microarchitecture according to an embodiment of this application. As shown in Figure 1, the method includes: Step S101, inputting the set of microarchitecture parameters of the target hardware, the value ranges of the parameters, and multiple preset condition regions into a trained design point generation model, wherein each condition region is a preference region divided from a predefined target space (e.g., a target space composed of performance, power consumption, and area). Step S102, using the trained design point generation model, generating multiple design points for each preset condition region based on the microarchitecture parameters and their value ranges using a sequential decision-making approach, where each design point includes all parameters and a set of values. Step S103, merging all generated design points and performing Pareto dominance comparison to filter out the non-dominated solution set, forming a Pareto front approximation.
[0029] This application's embodiments model the microarchitecture parameter design and configuration generation process as a Markov decision process. It uses the partially configured design as the state and the selection and assignment of parameters to be configured as actions, transforming the traditional one-time global prediction into a step-by-step, progressive sequence generation mode. This modeling approach naturally captures the cascading dependencies and combinatorial design rules between microarchitecture parameters, ensuring the architectural rationality and parameter consistency of the output design configuration from a logical perspective. It effectively avoids structural contradictions and ineffective configurations that cannot be implemented in engineering, while providing a solid model foundation for the credit allocation problem in the optimization process. Employing a design point generation model as the core learning and execution framework for sequence decision-making, its inherent flow consistency constraint enables implicit credit allocation from the final multi-objective optimization reward to each step of parameter decision-making in sequence generation. This accurately identifies parameter selection behaviors that play a crucial role in determining configuration quality, significantly improving the model's interpretability and decision-oriented nature.
[0030] More specifically, in step S101, the set of microarchitectural parameters of the target hardware (e.g., a CPU core) and their value ranges (e.g., fixed-point arithmetic units: 1~4, clock frequency: 1GHz~2.5GHz, etc.), as well as the preference regions (i.e., the preset condition regions) divided from the predefined target space, are input into the trained design point generation model.
[0031] The design point generation model can employ Generative Flow Networks (GFlowNet). As mentioned above, in the embodiments of this application, the generation of a complete design configuration is modeled as a Markov decision process, and the trained design point generation model serves as the learning and execution framework for this process, ultimately outputting design configurations that meet the optimization objectives. These objectives can be expressed as a multi-objective optimization function, typically maximizing performance (for example, measured by cycles per instruction, CPI, where a lower value is better), minimizing power consumption (Power), and minimizing area (Area). The process of generating a design configuration containing N parameters is defined as a trajectory of states s_0, s_1, ..., s_N. The state s_t represents a partial design configuration and records the parameters determined in the first t steps and their values. The action a_t, taken in state s_t, selects one of the remaining unconfigured parameters and assigns it a valid value. The final state s_N indicates that all parameters have been configured, forming a complete design point. The trained design point generation model then generates the decision for each step in turn, ensuring that each decision is based on the parameters that have already been assigned values.
[0032] In some embodiments, the design point generation model mainly comprises three key components: a state encoder, an encoding layer network, and a sequence decision action head. The state encoder is used to learn representations of the input partial design configurations. For example, it generates specific embedding vectors for assigned parameters and their specific values; provides learnable identifier embeddings for unconfigured parameters; and introduces parameter category information (such as computational, storage, etc.) to distinguish architectural parameters of different dimensions. The encoding layer network can be composed of stacked deep neural network modules, used to perform context-aware fusion and enhancement of the embedding sequence output by the state encoder. This encoding layer network can model high-order, non-linear dependencies between parameters in the partial configuration state, generating a high-dimensional state representation rich in global semantic information. Typical implementations include, but are not limited to, Transformer structures based on self-attention mechanisms. The sequence decision action head is configured to generate two types of action logits based on the high-dimensional state representation output by the encoding layer network. One is to score all feasible "parameter-value" combination actions to guide the next configuration selection; the other is to predict the "termination configuration" action to determine whether the current trajectory should end. The specific structure and training process of the design point generation model will be discussed in detail below.
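The state encoder's token construction can be sketched as follows. This is only an illustration under assumptions: embeddings are small lists of floats initialized at random (standing in for trainable tables), the type embedding is added to every parameter token, and the value or placeholder embedding is combined with the type embedding by element-wise addition. None of these choices are specified in the source, and all names are hypothetical.

```python
import random

def make_table(keys, dim, rng):
    """Toy stand-in for a trainable embedding table."""
    return {k: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for k in keys}

def encode_state(state, domains, param_types, value_emb, unset_emb, type_emb):
    """Return one embedding per parameter: a value embedding if the parameter
    is assigned, otherwise its learnable 'unconfigured' placeholder, each
    offset by the embedding of the parameter's category."""
    sequence = []
    for name in domains:
        if name in state:
            base = value_emb[(name, state[name])]
        else:
            base = unset_emb[name]
        category = type_emb[param_types[name]]
        sequence.append([b + c for b, c in zip(base, category)])
    return sequence

# Usage with two hypothetical parameters of different categories.
rng = random.Random(0)
domains = {"issue_width": [2, 4], "l1_kb": [16, 32]}
param_types = {"issue_width": "compute", "l1_kb": "storage"}
value_emb = make_table([(n, v) for n, vs in domains.items() for v in vs], 4, rng)
unset_emb = make_table(domains, 4, rng)
type_emb = make_table(["compute", "storage"], 4, rng)
tokens = encode_state({"issue_width": 4}, domains, param_types,
                      value_emb, unset_emb, type_emb)
print(len(tokens), len(tokens[0]))
```

The resulting sequence has one token per parameter regardless of how many are assigned, which is what allows a self-attention encoder to relate configured and unconfigured parameters in a single pass.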
[0033] Continuing to refer to Figure 1, in step S102, the trained design point generation model uses a sequential decision-making approach to generate multiple design points for each preset condition region based on the microarchitecture parameters and their value ranges. Each design point includes all parameters and a set of values. The generation process may include: selecting a parameter from the set of input microarchitecture parameters and assigning it a value according to its value range; using the currently assigned parameter and its value as the current state; based on the current state and the preset condition region, selecting an unassigned parameter and assigning it a value according to its value range; updating the current state and repeating the selection and assignment steps until all input parameters have been assigned values, whereupon all parameters and their values together form one design point.
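The generation loop in step S102 can be sketched as a rollout that samples one (parameter, value) action at a time. In this sketch, `score_fn` is a hypothetical stand-in for the trained model's action logits; it receives the current state, a candidate action, and the condition region. Softmax sampling over the logits is an assumption about how an action is drawn.

```python
import math
import random

def sample_design_point(domains, region, score_fn, rng):
    """Assign parameters one at a time until every parameter has a value."""
    state = {}
    trajectory = []
    while len(state) < len(domains):
        # Only unassigned parameters with in-range values are candidate actions.
        actions = [(name, value)
                   for name, values in domains.items() if name not in state
                   for value in values]
        logits = [score_fn(state, action, region) for action in actions]
        top = max(logits)  # subtract the maximum for numerical stability
        weights = [math.exp(l - top) for l in logits]
        name, value = rng.choices(actions, weights=weights, k=1)[0]
        state[name] = value
        trajectory.append((name, value))
    return state, trajectory

# Usage: a uniform scorer stands in for the trained model.
domains = {"issue_width": [2, 4, 8], "alu_count": [1, 2, 3, 4], "l1_kb": [16, 32, 64]}
uniform = lambda state, action, region: 0.0
point, trajectory = sample_design_point(domains, "low_power", uniform, random.Random(0))
print(point, trajectory)
```

Calling the function repeatedly for each condition region yields the multiple design points per region that the step describes.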
[0034] In step S103, all generated design points are merged, and Pareto dominance relationships are compared to filter out non-dominated solutions, forming the Pareto front approximation. The non-dominated solution set can be obtained using the Fast Non-dominated Sorting algorithm. First, all generated design points are traversed, and the performance, power consumption, area, and other multi-objective optimization metrics corresponding to each design point are obtained using a simulator. Based on these metrics, the dominance relationship between any two design points is determined: one design point dominates another if it is not inferior to the other in all optimization objectives and is strictly superior in at least one optimization objective. After the dominance relationships of all design points have been determined, solutions not dominated by any other design point are retained, while dominated solutions are eliminated. The remaining non-dominated solutions constitute the Pareto front approximation of the target hardware. Since this algorithm is known prior art, it will not be elaborated upon here.
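The dominance test and filtering described above can be sketched as follows. This is a plain pairwise filter that extracts only the non-dominated set, not the full Fast Non-dominated Sorting algorithm, and it assumes every objective is expressed so that smaller is better (for example, performance negated).

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one.

    Assumption: all objectives are to be minimized.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Keep the points that no other point dominates (O(n^2) comparisons)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

For example, with points `(1.0, 5.0)`, `(2.0, 3.0)`, `(3.0, 4.0)`, `(4.0, 1.0)`, and `(2.5, 3.5)`, the points `(3.0, 4.0)` and `(2.5, 3.5)` are both dominated by `(2.0, 3.0)` and are removed.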
[0035] To better understand the solution of this application, examples will be given below regarding the model structure, training method, and training data of the design point generation model.
[0036] I. Model Structure. Figure 2 is a schematic diagram of the design point generation model according to an embodiment of this application. The model employs a forward policy network. As shown in Figure 2, its structure includes: a state encoder module 201, an encoding layer network module 202, a preference encoder module 203, a gating fusion module 204, and a sequence decision action head module 205. The state encoder module 201 is configured to obtain the embedding sequence of the current configuration based on the input parameters and their value ranges. In some embodiments, the state encoder module 201 generates parameter assignment embedding vectors based on the assigned parameters and their values; assigns trainable vectors to unassigned parameters, which identify each such parameter and its unconfigured state; generates parameter type embedding vectors based on the types of the unassigned parameters; and finally combines the parameter assignment embedding vectors, the trainable vectors, and the parameter type embedding vectors to form the embedding sequence of the current configuration. The encoding layer network module 202 is configured to obtain the state representation of the current configuration based on the embedding sequence. The preference encoder module 203 is configured to obtain a preference vector based on the condition labels of the condition region. The gating fusion module 204 is configured to fuse the state representation of the current configuration and the preference vector to obtain a condition-aware state representation. The sequence decision action head module 205 is configured to output the next configuration action or the termination action based on the condition-aware state representation.
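The application does not give the internal form of the gating fusion module 204. One common form, assumed here purely for illustration, computes a per-dimension gate from the concatenated state representation and preference vector and uses it to interpolate between the two. The function and argument names are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(state_repr, pref_vec, gate_weights, gate_bias):
    """Assumed gate form: g = sigmoid(W [h; p] + b), fused = g * h + (1 - g) * p.

    state_repr and pref_vec must have the same length d;
    gate_weights is a d x 2d matrix given as a list of rows.
    """
    concat = list(state_repr) + list(pref_vec)
    fused = []
    for i, (h, p) in enumerate(zip(state_repr, pref_vec)):
        z = sum(w * x for w, x in zip(gate_weights[i], concat)) + gate_bias[i]
        g = sigmoid(z)  # gate value in (0, 1) for dimension i
        fused.append(g * h + (1.0 - g) * p)
    return fused
```

With all-zero gate weights and bias the gate is 0.5 everywhere and the output is the average of the two inputs; a large positive bias makes the output follow the state representation instead.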
[0037] II. Training Method. To enable the constructed design point generation model to efficiently and accurately generate processor microarchitecture design configurations covering the complete Pareto front, this application embodiment adopts a training method of global contour recognition followed by local region refinement. The specific training process is as follows: Step S1-1: Generate multiple first design points based on the set of microarchitecture parameters of the target hardware and the range of parameter values; construct an initial Pareto front based on the multiple first design points; and set a reward function based on the initial Pareto front.
[0038] As shown in Figure 3, based on the set of microarchitecture parameters of the target hardware and the range of parameter values, multiple first design points X are generated using random sampling. The performance, power consumption, and area of the first design points X are simulated and evaluated using a PPA simulator to obtain corresponding index values. Based on these index values, multiple second design points are selected from the first design points X, and an initial Pareto front (curve -X-) is constructed based on these second design points. Although this front has limited accuracy, it reflects the optimal trade-off trend among the targets in the design space. Using the constructed initial Pareto front as a benchmark, a reward band region (the red zone in the middle of Figure 3) is delineated around this front in the target space (such as a two-dimensional performance-power consumption space or a three-dimensional performance-power consumption-area space). Rewards are given only to design configurations that fall within or exceed this zone, and the reward function is set according to the following reward strategy based on the reward band region: new design points that dominate or outperform the current Pareto front in the target space are rewarded, and the reward value is positively correlated with the distance between the new design point and the current Pareto front, thus incentivizing the generator to continuously explore the outer regions of the front and push the Pareto front outwards.
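The distance-based part of this reward strategy could look like the sketch below. The application states only that the reward grows with the distance beyond the front; the Euclidean distance, the `base` and `gain` constants, the band test, and the minimization convention are all assumptions made for illustration.

```python
import math

def dominates(a, b):
    """Assumption: all objectives are to be minimized."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def front_reward(point, front, band_width, base=1.0, gain=1.0):
    """Reward points inside the band; reward points beyond the front more."""
    dist = min(math.dist(point, f) for f in front)
    if any(dominates(point, f) for f in front):
        # Beyond the current front: the reward increases with the distance.
        return base + gain * dist
    if dist <= band_width:
        # Inside the reward band but not beyond the front.
        return base
    return 0.0  # outside the reward band
```

Under these assumptions, a point that dominates a front point by a larger margin receives a strictly larger reward than one that only just exceeds the front.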
[0039] The reward zone is divided into multiple grid cells, and the reward value for a new design point is determined based on the number of historical design points within each grid cell. In some embodiments, the reward value is inversely proportional to the number of historical design points. A preferred embodiment is that the reward value can be the reciprocal of the square of the number of historical design points, i.e., the reward value is inversely proportional to the square of the number of historical design points. Additional rewards are given to sparsely distributed areas, while the reward weight is reduced for densely distributed areas. This prevents the generator from oversampling in locally dense areas, guides it to explore blank areas, and ensures global coverage of the design space.
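The grid-based density weighting can be sketched as follows, using the preferred reciprocal-of-the-square form. The uniform grid, the `origin` and `cell_size` arguments, and the treatment of an empty cell as having a count of one (to avoid division by zero) are assumptions for illustration.

```python
from collections import Counter

def cell_of(point, origin, cell_size):
    """Index of the grid cell containing a point in the target space."""
    return tuple(int((x - o) // s) for x, o, s in zip(point, origin, cell_size))

def count_history(history, origin, cell_size):
    """Number of historical design points in each grid cell."""
    return Counter(cell_of(p, origin, cell_size) for p in history)

def density_weight(point, counts, origin, cell_size):
    """Reward weight 1 / n^2, where n is the historical count in the point's cell."""
    n = counts[cell_of(point, origin, cell_size)]
    # Assumption: an empty cell is treated as n = 1.
    return 1.0 / max(n, 1) ** 2
```

A cell that already holds three historical points yields a weight of 1/9, while a sparsely populated or empty cell yields a weight of 1, which steers sampling toward unexplored parts of the reward band.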
[0040] In this embodiment, with extremely low simulation costs, the design point generation model can gain a preliminary understanding of the global shape of the Pareto front, clarify the optimization direction and exploration range of the design space, and quickly grasp the core logic of moving towards the front and exploring the sparse region, providing directional guidance for subsequent training.
[0041] Step S1-2: Calculate the reward values of multiple second design points based on the indicator values and reward functions of the design points, and convert the multiple second design points into corresponding generated trajectories.
[0042] Step S1-3: As shown in Figure 4, the reward band region is divided into multiple conditional regions based on preset preferences. Taking the two-dimensional performance-power consumption space as an example, it can be divided into upper (low performance, high power consumption), upper-middle, middle (balanced zone), lower-middle, and lower (high performance, low power consumption) regions. According to the index values of the design points, the second design points (i.e., the initial design points shown in Figure 4), their reward values, and their generated trajectories are stored as sample points in the buffers of the corresponding conditional regions. Specifically, each conditional region maintains an independent high-reward design point buffer to store design points that fall within that region and receive high rewards, along with their reward values and generated trajectories (including the input state, parameter decision actions, and output results of the forward policy network), forming a high-quality sample experience base (high-reward design point buffer). In some embodiments, high-reward design points are selected in descending order of reward value, or are design points whose reward value exceeds a preset threshold.
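The per-region high-reward buffers can be sketched as below. The class name, the use of thresholds on a single index value to decide the region, and the fixed buffer capacity are assumptions for illustration; the application only requires that each conditional region keep its own buffer of high-reward design points, reward values, and trajectories.

```python
import bisect
import heapq

class RegionBuffers:
    """One bounded buffer per conditional region, keeping the highest rewards."""

    def __init__(self, boundaries, capacity):
        self.boundaries = boundaries  # ascending thresholds on one index value
        self.capacity = capacity
        self.buffers = [[] for _ in range(len(boundaries) + 1)]
        self._counter = 0  # tie-breaker so stored tuples never compare payloads

    def region_of(self, metric):
        """Index of the conditional region that the index value falls into."""
        return bisect.bisect_right(self.boundaries, metric)

    def add(self, metric, reward, design_point, trajectory):
        heap = self.buffers[self.region_of(metric)]
        self._counter += 1
        item = (reward, self._counter, design_point, trajectory)
        if len(heap) < self.capacity:
            heapq.heappush(heap, item)
        else:
            # Min-heap: insert the new item, then drop the lowest reward.
            heapq.heappushpop(heap, item)
```

Keeping a min-heap per region means the lowest-reward sample is the one evicted when the buffer is full, which corresponds to selecting high-reward design points in descending order of reward value.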
[0043] Step S1-4: Train the design point generation model based on the set of microarchitecture parameters of the target hardware, the range of parameter values, the conditional regions, and their corresponding buffers. Update the parameters of the design point generation model using the trajectory balance loss based on the conditional region and the reward function until the model converges, obtaining the trained design point generation model. While the parameters of the design point generation model are being updated, the buffer of the conditional region is updated based on the new design points generated by the model.
[0044] Because parameter preferences conflict across different design point regions (e.g., high-performance regions require a wide issue width and a large cache, while low-power regions require a narrow issue width and a small cache), a conditional generation mechanism needs to be introduced in addition to considering the set of microarchitecture parameters of the target hardware and their value ranges. This mechanism inputs a specified conditional region into the design point generation model and gives a reward only when the generated design point falls within that region. This addresses the inability of the design point generation model to cover the extreme regions of the Pareto front. Since generative models lack traditional training data, this application's embodiments efficiently reuse the high-quality samples discovered during training to accelerate the convergence of the design point generation model in potentially optimal regions and to improve the quality and stability of the generated configurations, thus achieving a transition from global exploration to local refinement.
[0045] Continuing to refer to Figure 4, in some embodiments, step S1-4 may specifically include: Step S1-4-1: The design point generation model (forward policy network) generates multiple third design points (i.e., new design points) for the conditional regions based on the set of microarchitecture parameters of the target hardware and the range of parameter values. In some embodiments, to reduce the amount of training data, coarse-grained fusion processing can also be performed on the conditional regions, including: merging the fine-grained vertical conditional regions divided in step S1-3 into several coarse-grained conditional regions and clarifying the optimization preference orientation of each conditional region. Taking the five conditional regions divided in step S1-3 as an example, they can be merged into three types of conditional regions: conditional region A (upper + upper-middle, low power preference), conditional region B (upper-middle + middle, balanced preference), and conditional region C (lower-middle + lower, high performance preference). This keeps the conditional orientation clear while preserving exploration space within each region, ensuring the continuity of the front.
[0046] Step S1-4-2: Simulate and evaluate multiple third design points (using a simulator) to obtain corresponding index values. Based on the index values, conditional regions, and reward functions (using a reward function), obtain the reward values for multiple third design points respectively.
[0047] Step S1-4-3: According to a preset design point ratio, construct batch data (design point replayer) from the third design points and the design points in the buffer of the condition region. As training progresses, the proportion of design points from the buffer in the batch data decreases. In the early stages of training, the batch data includes a higher proportion of trajectory replays from the high-quality sample experience base (e.g., 80% from the buffer and 20% from the third design points). This allows the design point generation model to quickly learn the parameter combination logic and generation paths of high-quality designs, rapidly guiding the generation strategy towards the high-reward intervals of each condition region and shortening the convergence period. As training iterates and the buffer is updated, the proportion of replayed trajectories from the high-quality sample experience base is gradually reduced, progressively releasing the autonomous exploration capability of the design point generation model and encouraging it to explore new high-reward design points starting from high-quality regions, which avoids getting trapped in local optima.
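The decreasing replay proportion can be sketched as follows. The 80% starting value comes from the example above; the linear schedule, the 20% final value, and the function names are assumptions for illustration.

```python
import random

def replay_ratio(step, total_steps, start=0.8, end=0.2):
    """Fraction of a batch drawn from the buffer, decayed linearly (assumed)."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * frac

def build_batch(new_points, buffer_points, batch_size, ratio, rng):
    """Mix replayed buffer samples with newly generated design points."""
    n_replay = min(round(batch_size * ratio), len(buffer_points))
    n_new = min(batch_size - n_replay, len(new_points))
    return rng.sample(buffer_points, n_replay) + rng.sample(new_points, n_new)
```

With a batch size of 10, the first batch under this schedule holds 8 replayed samples and 2 new ones, and the mix shifts toward new design points as `step` approaches `total_steps`.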
[0048] Step S1-4-4: Based on the reward value of the design point in the batch data and the generated trajectory, calculate the trajectory balance loss and update the parameters of the design point generation model.
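The application names the trajectory balance loss without giving its formula. The sketch below assumes the form commonly used for generative flow networks, in which a learned log-partition term plus the forward log-probabilities along the trajectory should match the log-reward plus the backward log-probabilities. The argument names and the `eps` smoothing term are assumptions.

```python
import math

def trajectory_balance_loss(log_z, log_pf_steps, log_pb_steps, reward, eps=1e-8):
    """Squared residual of (log Z + sum log P_F) - (log R + sum log P_B).

    log_z: learned log-partition estimate (assumed to be per condition region).
    log_pf_steps / log_pb_steps: forward / backward log-probabilities per step.
    """
    residual = (log_z + sum(log_pf_steps)
                - math.log(reward + eps) - sum(log_pb_steps))
    return residual ** 2
```

Because the residual couples the final reward to the sum of per-step log-probabilities, minimizing it distributes the reward signal across every parameter decision in the trajectory.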
[0049] In some embodiments, in step S1-4, updating the buffer of the condition region based on the new design points generated by the design point generation model while updating the parameters of the model includes: performing simulation evaluation on a new design point to obtain the corresponding index values. If the index values fall into the condition region corresponding to the design point, the reward value of the new design point is calculated based on the index values and the reward function; that is, a reward is given to the new design point only if it falls into the specified condition region. If the reward value of the new design point exceeds a preset reward threshold, the new design point, its corresponding reward value, and the generated trajectory are added to the buffer of the condition region.
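The buffer update rule can be sketched as follows. Representing a condition region as per-metric `(low, high)` bounds and the buffer as a plain list are assumptions for illustration, as are the function names.

```python
def in_region(metrics, bounds):
    """True if every index value lies inside the region's (low, high) bounds."""
    return all(lo <= m < hi for m, (lo, hi) in zip(metrics, bounds))

def update_buffer(buffer, design_point, trajectory, metrics, bounds, reward_fn, threshold):
    """Reward a new design point only inside its region; store it if above threshold."""
    if not in_region(metrics, bounds):
        return 0.0  # outside the specified condition region: no reward
    reward = reward_fn(metrics)
    if reward > threshold:
        buffer.append((design_point, reward, trajectory))
    return reward
```

A design point that lands outside its target region therefore never enters that region's buffer, regardless of how good its index values are.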
[0050] Through the above training, the design point generation model can gradually form a complete understanding of the Pareto front, and can not only stably generate high-quality design configurations in various regions, but also cover extreme boundary regions such as high performance and low power consumption. Ultimately, it achieves the goal of efficiently generating a complete and high-quality set of Pareto front design configurations with a limited simulation budget.
[0051] III. Training Data. In the training process of this application embodiment, batch data needs to be constructed. The batch data is constructed according to a set ratio from the new design points generated by the design point generation model during training and the design points stored in the buffer of the specified condition region, and includes the design points, their reward values, and the generated trajectories. As the training process progresses, the proportion of design points from the buffer in the batch data becomes smaller and smaller.
[0052] The design points stored in the initial buffer are selected from randomly generated first design points. The specific process is as follows: Simulation evaluation is performed on the first design points to obtain corresponding index values. Based on these index values, multiple second design points are selected from the first design points, and an initial Pareto front is constructed based on these second design points. Using the constructed initial Pareto front as a benchmark, a reward band region is delineated in the target space. Multiple condition regions are then divided from the reward band region. According to the index values of the design points, the second design points, their reward values, and the generated trajectories are stored as sample points in the buffers of the corresponding condition regions.
[0053] Subsequently, while updating the parameters of the design point generation model, the buffer of the condition region is updated based on the new design points generated by the model. Specifically, the process involves: performing simulation evaluation on the new design points to obtain their corresponding index values. If the index value falls within the condition region corresponding to the design point, the reward value of the new design point is calculated based on the index value and the reward function. If the reward value of the new design point exceeds a preset reward threshold, the new design point, its corresponding reward value, and the generated trajectory are added to the buffer of the condition region.
[0054] According to another embodiment of this application, a design space exploration system for microarchitecture is provided, comprising: an input module, a design point generation module, and an output module. The input module is configured to input a set of microarchitecture parameters of the target hardware, the parameter value ranges, and multiple preset condition regions into a trained design point generation model, wherein the condition regions are preference regions partitioned from a predefined target space. The design point generation module is configured to use a sequential decision-making approach to generate multiple design points for each preset condition region based on the microarchitecture parameters and their value ranges, wherein each design point includes all parameters and a set of values. The output module is configured to merge all generated design points and perform Pareto dominance comparisons to filter out non-dominated solution sets, forming a Pareto front approximation.
[0055] The embodiments of this application model the generation of microarchitecture parameter design configurations as a Markov decision process. The partially configured design serves as the state, and the selection and assignment of a parameter to be configured serves as the action, transforming the traditional one-shot global prediction into step-by-step, progressive sequence generation. This modeling approach naturally captures the cascading dependencies and combinatorial design rules between microarchitecture parameters, ensuring the architectural rationality and parameter consistency of the output design configuration at the logical level. It effectively avoids structural contradictions and invalid configurations that cannot be implemented in engineering, while providing a solid modeling foundation for the credit assignment problem in the subsequent optimization process. The design point generation model serves as the core learning and execution framework for sequential decision-making, and its inherent flow consistency constraint enables implicit credit assignment from the final multi-objective optimization reward to each parameter decision step in sequence generation. This identifies the parameter selection behaviors that play a crucial role in determining configuration quality, improving the interpretability and decision orientation of the model.
[0056] Those skilled in the art will understand that, for the sake of convenience and brevity, the specific working process of the system and modules described above can be referred to the corresponding process in the foregoing method embodiments, and will not be repeated here.
[0057] The various embodiments in this application are described in a progressive manner, with each embodiment focusing on the differences from other embodiments or implementation methods. Similar or identical parts between the various embodiments of this application can be referred to mutually. The implementation principles and technical effects of the inventive concept can be mutually referenced, and will not be repeated here. Where there is no conflict, the various embodiments or implementation methods in this application can be combined with each other.
[0058] It should be noted that although the steps are described in a specific order above, it does not mean that the steps must be executed in the above specific order. In fact, some of these steps can be executed concurrently, or even in a different order, as long as the required function can be achieved.
[0059] This application may be a system, method, and/or computer program product. A computer program product may include a computer-readable storage medium having computer-readable program instructions loaded thereon for causing a processor to implement various aspects of this application.
[0060] A computer-readable storage medium can be a tangible device that holds and stores instructions for use by an instruction execution device. A computer-readable storage medium can be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination thereof. More specific examples (a non-exhaustive list) of computer-readable storage media include: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile discs (DVD), memory sticks, floppy disks, mechanically encoded devices such as punch cards or raised structures in a groove having instructions stored thereon, and any suitable combination thereof.
[0061] This application uses specific embodiments to illustrate the principles and implementation methods of this application. The description of the above embodiments is only for the purpose of helping to understand the solution and core ideas of this application. At the same time, for those skilled in the art, there will be changes in the specific implementation methods and application scope based on the ideas of this application. Therefore, the content of this specification should not be construed as a limitation of this application.
Claims
1. A design space exploration method for microarchitecture, comprising: inputting a set of microarchitecture parameters of target hardware, value ranges of the parameters, and multiple preset condition regions into a trained design point generation model, wherein the condition regions are preference regions divided from a predefined target space; using the trained design point generation model, generating, in a sequential decision-making manner, multiple design points for each preset condition region based on the microarchitecture parameters and their value ranges, wherein each design point includes all parameters and a set of values; and merging all generated design points and comparing Pareto dominance relationships to select a non-dominated solution set, thereby forming a Pareto front approximation.
2. The method according to claim 1, wherein using the trained design point generation model to generate, in a sequential decision-making manner, multiple design points for each preset condition region based on the microarchitecture parameters and their value ranges includes: selecting a parameter from the set of input microarchitecture parameters and assigning it a value according to its value range; using the currently assigned parameters and their values as the current state; based on the current state and the preset condition region, selecting an unassigned parameter and assigning it a value according to its value range; and updating the current state and repeating the selection step until all input parameters have been assigned values, with all parameters and their values constituting one design point.
3. The method according to claim 1, wherein the design point generation model includes: a state encoder module configured to obtain an embedding sequence of the current configuration based on the input parameters and their value ranges; an encoding layer network module configured to obtain a state representation of the current configuration based on the embedding sequence; a preference encoder module configured to obtain a preference vector based on the condition labels of the condition region; a gating fusion module configured to fuse the state representation of the current configuration and the preference vector to obtain a condition-aware state representation; and a sequence decision action head module configured to output the next configuration action or a termination action based on the condition-aware state representation.
4. The method according to claim 3, wherein the state encoder module is further configured to: generate parameter assignment embedding vectors based on the assigned parameters and their values; assign trainable vectors to unassigned parameters, the trainable vectors being used to identify each unassigned parameter and its unconfigured state; and generate parameter type embedding vectors based on the types of the unassigned parameters; wherein the parameter assignment embedding vectors, the trainable vectors, and the parameter type embedding vectors constitute the embedding sequence of the current configuration.
5. The method according to claim 1, wherein the design point generation model is trained according to the following steps: generating multiple first design points by random sampling based on the set of microarchitecture parameters of the target hardware and the value ranges of the parameters; performing simulation evaluation on the multiple first design points to obtain corresponding index values, and selecting multiple second design points from the multiple first design points according to the index values; constructing an initial Pareto front based on the multiple second design points; dividing a reward band region in the target space based on the initial Pareto front; setting a reward function according to the reward band region; calculating reward values of the multiple second design points according to the index values and the reward function, and converting the multiple second design points into corresponding generated trajectories; dividing multiple condition regions from the reward band region according to preset preferences, and storing the multiple second design points, their reward values, and the generated trajectories as sample points in the buffers of the corresponding condition regions according to the index values; and training the design point generation model based on the set of microarchitecture parameters of the target hardware, the value ranges of the parameters, the condition regions, and their corresponding buffers, wherein the parameters of the design point generation model are updated using a trajectory balance loss based on the condition region and the reward function until the model converges, thereby obtaining the trained design point generation model; wherein, while the parameters of the design point generation model are being updated, the buffer of the condition region is updated based on the new design points generated by the design point generation model.
6. The method according to claim 5, wherein setting the reward function according to the reward band region includes: rewarding new design points that dominate or outperform the current Pareto front in the target space, wherein the greater the distance between the new design point and the current Pareto front in the target space, the higher the reward; and dividing the reward band region into multiple grid cells and determining the reward value for a new design point based on the number of historical design points within the grid cell, wherein the reward value is inversely proportional to the number of historical design points.
7. The method according to claim 5, wherein training the design point generation model based on the set of microarchitecture parameters of the target hardware, the value ranges of the parameters, the condition regions, and their corresponding buffers, and updating the parameters of the design point generation model using the trajectory balance loss based on the condition region and the reward function, includes: generating, by the design point generation model, multiple third design points for the condition region based on the set of microarchitecture parameters of the target hardware and the value ranges of the parameters; performing simulation evaluation on the multiple third design points to obtain corresponding index values, and obtaining reward values for the multiple third design points based on the index values, the condition region, and the reward function; constructing batch data, according to a preset design point ratio, from the third design points and the design points in the buffer of the condition region, wherein the proportion of design points from the buffer in the batch data decreases as the training process progresses; and calculating the trajectory balance loss from the reward values and generated trajectories of the design points in the batch data, and updating the parameters of the design point generation model accordingly.
8. The method according to claim 5, wherein updating the buffer of the condition region based on the new design points generated by the design point generation model while updating the parameters of the design point generation model includes: performing simulation evaluation on a new design point to obtain the corresponding index values; if the index values fall within the condition region corresponding to the design point, calculating the reward value of the new design point based on the index values and the reward function; and if the reward value of the new design point exceeds a preset reward threshold, adding the new design point, its corresponding reward value, and the generated trajectory to the buffer of the condition region.
9. A design space exploration system for microarchitecture, comprising an input module, a design point generation module, and an output module, wherein: the input module is configured to input a set of microarchitecture parameters of target hardware, value ranges of the parameters, and multiple preset condition regions into a trained design point generation model, wherein the condition regions are preference regions divided from a predefined target space; the design point generation module is configured to generate, in a sequential decision-making manner, multiple design points for each preset condition region based on the microarchitecture parameters and their value ranges, wherein each design point includes all parameters and a set of values; and the output module is configured to merge all generated design points and compare Pareto dominance relationships to select a non-dominated solution set, thereby forming a Pareto front approximation.
10. A computer program product comprising a computer program that, when executed by a processor, implements the method as described in any one of claims 1-8.