Chip architecture, chip architecture parameter determination method and related device

By combining process data with machine learning surrogate models, the topology of the storage unit modules and the size parameters of the computing array are automatically optimized, solving the problem of inaccurate parameter determination in NPU chip architecture design and achieving precise energy management and energy-efficiency optimization.

CN122242429A · Pending · Publication Date: 2026-06-19 · VIVO MOBILE COMM CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: VIVO MOBILE COMM CO LTD
Filing Date: 2026-03-03
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

In NPU chip architecture design, existing approaches lack physical awareness. As a result, architecture parameters are determined inaccurately, which produces redundancy or energy-efficiency bottlenecks and prevents chip power consumption from being reduced effectively.

Method used

By using process data to determine the topology parameters of the memory cell module and using a machine learning surrogate model to predict the power consumption of the computing array, the chip architecture parameters can be automatically optimized, and energy bottlenecks on the data transport path can be accurately located and eliminated.

Benefits of technology

It improves the accuracy and efficiency of determining chip architecture parameters, reduces chip power consumption, and ensures the physical feasibility and energy efficiency optimization of the design.

✦ Generated by Eureka AI based on patent content.

Figure CN122242429A_ABST

Abstract

This application discloses a chip architecture, a method for determining chip architecture parameters, and related apparatus, belonging to the field of electronic design technology. The chip architecture includes the following components:

- a system bus interface;
- a data access controller;
- a storage subsystem comprising multiple storage unit modules. The topology parameters of these modules are determined based on the energy consumption of storage unit modules with different topologies. The topology parameters include at least one of the number, capacity, and physical splicing topology of the storage unit modules. The energy consumption of the storage unit modules is determined based on process data;
- a computing engine including a computing array. The size parameters of the computing array are determined based on the power consumption of computing arrays with different configurations, meaning computing arrays that differ in at least one of size parameters, computing precision, and toggle rate. The power consumption of the computing arrays is predicted by a machine learning surrogate model;
- a global control unit; and
- a vector processing unit.


Description

Technical Field

[0001] This application belongs to the field of electronic design technology, specifically relating to a chip architecture, a method for determining chip architecture parameters, and related devices.

Background Technology

[0002] In the smartphone and mobile terminal chip R&D industry, edge AI (Artificial Intelligence) applications have grown explosively. Examples include computational photography, real-time voice translation, and on-device large models. As a result, NPU (Neural Processing Unit) chips have become core modules of the SoC (System on a Chip), and they account for a very large share of its area and power consumption.

[0003] Unlike server-side chips, mobile phone chips are strictly limited by battery capacity and heat dissipation. In the early stages of defining the NPU chip architecture, designers therefore need to do more than evaluate power consumption figures. They must also determine key hardware architecture parameters:

- the physical partitioning strategy of the on-chip storage subsystem;
- the geometry of the computing array; and
- the bit-width precision of the data path.

The choice of these architecture parameters directly determines the final PPA (Power, Performance, and Area) of the NPU chip.

[0004] However, chip design manufacturers currently determine NPU architecture parameters mainly with the help of commercial EDA (Electronic Design Automation) tools or internal scripts. This process lacks physical awareness and cannot identify the optimal hardware specifications. The final NPU hardware architecture can therefore end up with redundancy or energy-efficiency bottlenecks.

Summary of the Invention

[0005] The purpose of this application is to provide a chip architecture, a method for determining chip architecture parameters, and related apparatus, which can realize automatic optimization of chip architecture parameters, improve the accuracy and efficiency of chip architecture parameter determination, accurately locate and eliminate energy consumption bottlenecks on the data transport path, and reduce chip power consumption.

[0006] In a first aspect, embodiments of this application provide a chip architecture, including the following components:

- a system bus interface;
- a data access controller connected to the system bus interface;
- a storage subsystem connected to the data access controller. The storage subsystem includes multiple storage unit modules. Their topology parameters are determined based on the energy consumption of storage unit modules with different topologies. The topology parameters include at least one of the number, capacity, and physical splicing topology of the storage unit modules. The energy consumption of the storage unit modules is determined based on process data;
- a computing engine connected to the storage subsystem. The computing engine includes a computing array. The size parameters of the computing array are determined based on the power consumption of computing arrays with different configurations, meaning computing arrays that differ in at least one of size parameters, computing precision, and toggle rate. The power consumption of the computing array is predicted by a machine learning surrogate model;
- a global control unit connected to both the system bus interface and the storage subsystem; and
- a vector processing unit connected to both the computing engine and the storage subsystem.

[0007] In a second aspect, embodiments of this application provide a method for determining chip architecture parameters. The method includes the following steps:

1. Obtain boundary constraint information between the design goals and the architecture search space. The boundary constraint information includes process data and definition information of the architecture search space. The definition information constrains the adjustment range of the chip architecture parameters. The chip architecture parameters include the topology parameters of the storage unit modules of the storage subsystem in the chip and the size parameters of the computing array in the chip.
2. Based on the process data, determine the energy consumption of storage unit modules with different topologies within the adjustment range. Then determine the topology parameters of the storage unit modules of the storage subsystem in the chip from that energy consumption. The topology parameters include at least one of the number, capacity, and physical splicing topology of the storage unit modules.
3. Using a machine learning surrogate model, predict the power consumption of computing arrays with different configurations within the adjustment range. Then determine the size parameters of the computing array in the chip from that power consumption. Computing arrays with different configurations are computing arrays that differ in at least one of size parameters, computing precision, and toggle rate.

[0008] In a third aspect, embodiments of this application provide a device for determining chip architecture parameters. The device comprises an acquisition unit and a processing unit.

- The acquisition unit is configured to acquire boundary constraint information between the design target and the architecture search space. The boundary constraint information includes process data and definition information of the architecture search space. The definition information constrains the adjustment range of the chip architecture parameters. The chip architecture parameters include the topology parameters of the storage unit modules of the storage subsystem in the chip and the size parameters of the computing array in the chip.
- The processing unit is configured to determine, based on the process data, the energy consumption of storage unit modules with different topologies within the adjustment range. It then determines the topology parameters of the storage unit modules of the storage subsystem in the chip from that energy consumption. The topology parameters include the number, capacity, and physical splicing topology of the storage unit modules.
- The processing unit is further configured to predict, using a machine learning surrogate model, the power consumption of computing arrays with different configurations within the adjustment range. It then determines the size parameters of the computing array in the chip from that power consumption. Computing arrays with different configurations are computing arrays that differ in at least one of size parameters, computing precision, and toggle rate.

[0009] In a fourth aspect, embodiments of this application provide an electronic device, including the chip architecture described in the first aspect.

[0010] In a fifth aspect, embodiments of this application provide another electronic device, including a processor and a memory. The memory stores a program or instructions executable on the processor. When executed by the processor, the program or instructions implement the steps of the method for determining chip architecture parameters described in the second aspect.

[0011] In a sixth aspect, embodiments of this application provide a readable storage medium storing a program or instructions, which, when executed by a processor, implement the steps of the chip architecture parameter determination method as described in the second aspect.

[0012] In a seventh aspect, embodiments of this application provide another chip, including a processor and a communication interface, wherein the communication interface and the processor are coupled, and the processor is used to run programs or instructions to implement the steps of the chip architecture parameter determination method as described in the second aspect.

[0013] In an eighth aspect, embodiments of this application provide a computer program product. The product is stored in a storage medium and executed by at least one processor to implement the steps of the chip architecture parameter determination method described in the second aspect.

[0014] The chip architecture provided in this application includes a system bus interface, a data access controller, a storage subsystem, a computing engine, a global control unit, and a vector processing unit. The storage subsystem includes multiple storage unit modules, and the computing engine includes a computing array. The components are connected as follows:

- the data access controller is connected to the system bus interface;
- the storage subsystem is connected to the data access controller;
- the computing engine is connected to the storage subsystem;
- the global control unit is connected to both the system bus interface and the storage subsystem; and
- the vector processing unit is connected to both the computing engine and the storage subsystem.

Among the chip architecture parameters, the topology parameters of the storage unit modules and the size parameters of the computing array are determined independently of each other.

On the storage side, the energy consumption of storage unit modules with different topologies is first determined from process data. The topology parameters of the storage unit modules are then determined at multiple levels, including their number, capacity, and physical splicing topology. This enables fine-grained evaluation and feedback on the topology parameters. It improves the accuracy with which they are determined and reduces chip energy consumption.

On the computing side, a machine learning surrogate model predicts the power consumption of computing arrays that differ in at least one of size parameters, computing precision, and toggle rate. The size parameters of the computing array are determined from these predictions. This improves the accuracy of the size parameters and helps reduce chip power consumption.

In this way, the chip architecture parameters are optimized automatically at both levels during chip design: the topology parameters of the storage unit modules and the size parameters of the computing array. This improves the accuracy of the chip architecture parameters and makes it possible to precisely locate and eliminate energy bottlenecks on the data transport path, thereby reducing chip power consumption.

Attached Figure Description

[0015] Figure 1 is a schematic diagram of the chip architecture provided in the embodiments of this application;

[0016] Figure 2 is a flowchart of the method for determining chip architecture parameters provided in the embodiments of this application;

[0017] Figure 3 is a schematic diagram of the energy consumption calculation principle of the storage unit module provided in the embodiments of this application;

[0018] Figure 4 is a schematic diagram of the working principle of the machine learning surrogate model provided in the embodiments of this application;

[0019] Figure 5 is a structural block diagram of the device for determining chip architecture parameters provided in the embodiments of this application;

[0020] Figure 6 is a first structural block diagram of the electronic device provided in the embodiments of this application;

[0021] Figure 7 is a second structural block diagram of the electronic device provided in the embodiments of this application;

[0022] Figure 8 is a schematic diagram of the hardware structure of the electronic device provided in the embodiments of this application.

[0023] Figure label:

[0024] 100 Chip architecture, 102 System bus interface, 104 Data access controller, 106 Storage subsystem, 108 Storage unit module, 110 Computing engine, 112 Computing array, 114 Global control unit, 116 Vector processing unit.

Detailed Implementation

[0025] The technical solutions of the embodiments of this application will be clearly described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by those skilled in the art based on the embodiments of this application are within the scope of protection of this application.

[0026] The terms "first," "second," etc., used in the specification and claims of this application distinguish similar objects; they do not describe a specific order or sequence. Such terms are interchangeable where appropriate, so embodiments of this application can be implemented in orders other than those illustrated or described herein. The objects distinguished by "first," "second," etc., are generally of the same class, and their number is not limited; for example, a first object may be one or more. In the specification and claims, "and/or" indicates at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the preceding and following objects.

[0027] The chip architecture, chip architecture parameter determination method, and related apparatus provided in this application will be described in detail below with reference to the accompanying drawings and through specific embodiments and application scenarios.

[0028] As shown in Figure 1, an embodiment of this application provides a chip architecture 100. The chip architecture 100 includes a system bus interface 102, a data access controller 104, a storage subsystem 106, a computing engine 110, a global control unit 114, and a vector processing unit 116.

[0029] The storage subsystem 106 includes multiple storage unit modules 108, and the computing engine 110 includes a computing array 112.

[0030] Specifically, chip architecture 100 can be an NPU chip architecture.

[0031] Optionally, the system bus interface 102 handles communication between the NPU chip architecture and other master modules in the SoC. The system bus interface 102 typically connects to an AXI (Advanced eXtensible Interface) or AHB (Advanced High-performance Bus) bus. It receives configuration instructions from the CPU (Central Processing Unit) and performs bulk data exchange with external DDR (Double Data Rate) memory.

[0032] Optionally, the data access controller 104 is connected to the system bus interface 102.

[0033] Optionally, the data access controller 104 may specifically be a DMA (Direct Memory Access) controller.

[0034] Optionally, the DMA controller performs efficient data transfer between external memory and the storage subsystem 106. The DMA controller is configured with dedicated read/write channels so that memory access latency is hidden behind computation. Its data transfer granularity and burst length are matched to the bit width of the optimized storage unit modules 108 in the storage subsystem 106 to maximize bus utilization.
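One rough way to see why the transfer granularity is tied to the bank width is to count how many bus bytes are actually useful. The sketch below is a simplified model and not the application's DMA design. It assumes that every transfer is padded up to whole bursts. The function name, bus width, and transfer sizes are hypothetical.

```python
import math


def bus_utilization(transfer_bytes: int, bus_bytes: int, burst_len: int) -> float:
    """Fraction of bus bytes moved that carry useful data.

    Hypothetical model: one burst moves `burst_len` beats of `bus_bytes` each,
    and every transfer is rounded up to a whole number of bursts.
    """
    burst_bytes = bus_bytes * burst_len
    bursts = math.ceil(transfer_bytes / burst_bytes)
    return transfer_bytes / (bursts * burst_bytes)


# 16-byte (128-bit) bus with 4-beat bursts, i.e. 64 bytes per burst.
print(bus_utilization(64, 16, 4))  # transfer matched to burst size: 1.0
print(bus_utilization(72, 16, 4))  # slightly larger transfer: 72 / 128 = 0.5625
```

In this model, a transfer just 8 bytes larger than one burst wastes 56 of the 128 bytes it moves. Aligning transfers with the bank width keeps utilization near 1.0.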

[0035] Optionally, the storage subsystem 106 is connected to the data access controller 104.

[0036] Optionally, the storage subsystem 106 may specifically be a ROCM (Reconfigurable On-Chip Memory).

[0037] Optionally, the ROCM is the core storage unit of the NPU chip architecture and temporarily stores input feature maps, weights, and output results. The ROCM is not a single large-capacity SRAM macrocell. Instead, it consists of multiple independently addressable physical banks, i.e., storage unit modules 108. An example is the storage array consisting of Bank 0 to Bank M shown in Figure 1.

[0038] The topology parameters of the multiple storage unit modules 108 are determined based on the energy consumption of the storage unit modules 108 with different topologies.

[0039] Optionally, the topology parameters include at least one of the number, capacity, and physical splicing topology of the storage unit modules 108.

[0040] For example, suppose the target algorithm frequently performs random accesses to small amounts of data. The ROCM then uses a topology with many banks, each of small capacity, to reduce the activation power of a single access. Conversely, when such accesses do not dominate, it uses a topology with fewer banks, each of larger capacity, to improve area efficiency.
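A toy cost model makes the bank-granularity trade-off in this example concrete. The square-root energy scaling, the per-bank periphery overhead, and all coefficients are assumptions chosen for illustration. In the application, these figures come from process data rather than from formulas.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class BankTopology:
    num_banks: int
    bank_kib: int  # capacity of each bank in KiB


def access_energy_pj(t: BankTopology, e_fixed: float = 1.0, e_scale: float = 0.5) -> float:
    # Hypothetical model: one access activates a single bank, and its energy
    # grows with the square root of that bank's capacity (longer word/bit lines).
    return e_fixed + e_scale * math.sqrt(t.bank_kib)


def area_mm2(t: BankTopology, periphery: float = 0.01, per_kib: float = 0.002) -> float:
    # Hypothetical model: every bank pays a fixed periphery overhead plus cell area.
    return t.num_banks * (periphery + per_kib * t.bank_kib)


many_small = BankTopology(num_banks=16, bank_kib=16)  # 256 KiB total
few_large = BankTopology(num_banks=4, bank_kib=64)    # 256 KiB total

for name, t in (("many small", many_small), ("few large", few_large)):
    print(name, access_energy_pj(t), round(area_mm2(t), 3))
```

Both layouts hold the same 256 KiB. The 16-bank layout costs 3.0 pJ per access against 5.0 pJ for the 4-bank layout. However, it occupies more area because of the extra periphery.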

[0041] Optionally, the energy consumption of the storage unit modules 108 is determined based on process data such as real process library data.

[0042] Here, real process library data are data files provided by manufacturers for chip design at specific process nodes. An example is the physical library files of a memory editor.

[0043] Optionally, the computing engine 110 is connected to the storage subsystem 106.

[0044] Optionally, the computing engine 110 may be a Scalable Matrix Compute Engine (SMCE), and the computing array 112 may be a Multiply-Accumulate (MAC) array, which is a basic operation unit array containing multiply-accumulate units arranged in a two-dimensional grid.

[0045] Optionally, SMCE is the core computing unit for performing convolution and matrix multiplication.

[0046] The computing array optimizer performs a full design-space scan for a specific algorithm workload. This scan determines the specific numbers of rows and columns in the MAC array, which together give the optimal aspect ratio. The optimal aspect ratio maximizes the reuse of input data and thereby minimizes the energy cost of data transfer between the SMCE and the ROCM.
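A simple traffic model can illustrate the link between aspect ratio and data reuse. The sketch below makes the following assumptions:

- the dataflow is output-stationary and computes a single GEMM;
- operand words read from on-chip memory stand in for transport energy;
- only power-of-two shapes under a fixed MAC budget are scanned.

`operand_reads` and `best_shape` are hypothetical names. The application does not specify its optimizer's cost function.

```python
import math
from typing import List, Tuple


def operand_reads(rows: int, cols: int, m: int, n: int, k: int) -> int:
    """Operand words read from on-chip memory for an (m x k) @ (k x n) GEMM.

    Hypothetical output-stationary model: each rows x cols output tile streams
    k input vectors (one element per array row) and k weight vectors (one
    element per array column). Summing over all tiles gives
    k * (m * ceil(n / cols) + n * ceil(m / rows)).
    """
    return k * (m * math.ceil(n / cols) + n * math.ceil(m / rows))


def best_shape(mac_budget: int, m: int, n: int, k: int) -> Tuple[int, int]:
    """Full scan over power-of-two row counts with rows * cols == mac_budget."""
    shapes: List[Tuple[int, int]] = []
    rows = 1
    while rows <= mac_budget:
        if mac_budget % rows == 0:
            shapes.append((rows, mac_budget // rows))
        rows *= 2
    return min(shapes, key=lambda s: operand_reads(s[0], s[1], m, n, k))


print(best_shape(64, m=4, n=256, k=64))    # skinny workload → (4, 16)
print(best_shape(64, m=256, n=256, k=64))  # square workload → (8, 8)
```

Both runs use the same 64 MACs. A workload with only 4 output rows favors a wide 4 x 16 array, while a square workload favors 8 x 8. This is why the scan is performed per algorithm workload.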

[0047] Specifically, the size parameters of the computing array 112 are determined based on the power consumption of the computing array 112 in different configurations.

[0048] The different configurations of the computing array 112 are manifested in the different key characteristic parameters that affect the power consumption of the computing array 112.

[0049] Optionally, computing arrays 112 with different configurations may specifically be computing arrays that differ in at least one of size parameters, computing precision, and toggle rate.

[0050] The size parameter of the computing array 112 represents the scale of the computing array, specifically the number of rows and columns of the computing array 112, such as 4×4, 8×8, and 16×16. The size of the computing array 112 reflects the number of parallel computing units and the scale of internal interconnection.

[0051] Optionally, the computing precision refers to the bit width of the operands in the multiply-accumulate operation. Examples are 4-bit × 4-bit, 8-bit × 8-bit, 16-bit × 16-bit, and 32-bit × 32-bit. The computing precision determines how much capacitance is switched and how much current the computing array 112 draws in a single operation.

[0052] Optionally, the toggle rate characterizes the proportion of circuit nodes in the computing array 112 that switch, i.e., the signal toggling caused by changes in the input data of the computing array 112. The toggle rate is the primary driver of the dynamic power consumption of the computing array 112.
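The toggle (flip) rate described above can be measured from an operand stream. It is the average fraction of bits that change between consecutive words. It enters the standard CMOS dynamic power estimate P = αCV²f, where α is the toggle rate. The sketch below illustrates both calculations. The stream, effective capacitance, voltage, and frequency are made-up values.

```python
from typing import Sequence


def toggle_rate(words: Sequence[int], width: int) -> float:
    """Average fraction of the `width` bits that flip between consecutive input words."""
    if len(words) < 2:
        return 0.0
    mask = (1 << width) - 1
    flips = sum(bin((a ^ b) & mask).count("1") for a, b in zip(words, words[1:]))
    return flips / (width * (len(words) - 1))


def dynamic_power_w(alpha: float, c_eff_f: float, vdd_v: float, freq_hz: float) -> float:
    # Standard CMOS switching-power estimate: P = alpha * C * V^2 * f.
    return alpha * c_eff_f * vdd_v ** 2 * freq_hz


stream = [0x00, 0xFF, 0xFF, 0x0F]       # hypothetical 8-bit operand stream
alpha = toggle_rate(stream, width=8)    # (8 + 0 + 4) / (8 * 3) = 0.5
print(alpha, dynamic_power_w(alpha, c_eff_f=1e-9, vdd_v=0.8, freq_hz=1e9))
```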

[0053] Optionally, the power consumption of the computing array 112 is predicted by a machine learning surrogate model.

[0054] The machine learning surrogate model takes the specific configuration of the computing array 112 as input and outputs a prediction of its power consumption.

[0055] In practical applications, the machine learning surrogate model may be one of the following, without specific limitation:

- a polynomial regression model;
- a random forest regression model;
- a gradient boosting tree model; or
- a multilayer perceptron neural network model.

[0056] In practical applications, the preferred machine learning surrogate model is the multilayer perceptron neural network model. It can effectively capture nonlinear relationships between features and adaptively model how sensitive power consumption is to each input feature.
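The application prefers a multilayer perceptron as the surrogate (proxy) model but also allows simpler regression models. To stay dependency-free, the sketch below swaps in ordinary least squares on three hand-picked features instead of an MLP. It trains on synthetic data generated from made-up weights. The feature set, weights, and function names are all assumptions for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[int, int, int, float, float]  # rows, cols, bits, toggle, power


def features(rows: int, cols: int, bits: int, toggle: float) -> List[float]:
    # Hypothetical feature set: a constant, the MAC count (clock/leakage-like
    # term), and MAC count x bits^2 x toggle rate (switching-activity term).
    n_mac = rows * cols
    return [1.0, float(n_mac), n_mac * bits * bits * toggle]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_surrogate(samples: Sequence[Sample]) -> Callable[[int, int, int, float], float]:
    """Least-squares fit of power against the features via the normal equations."""
    xs = [features(*s[:4]) for s in samples]
    ys = [s[4] for s in samples]
    d = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(d)] for i in range(d)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(d)]
    w = solve(xtx, xty)
    return lambda rows, cols, bits, toggle: sum(
        wi * fi for wi, fi in zip(w, features(rows, cols, bits, toggle)))


# Synthetic training data from made-up weights, standing in for gate-level
# power simulations of sampled configurations.
TRUE_W = (2.0, 0.05, 0.001)
samples = [
    (s, s, b, t, sum(w * f for w, f in zip(TRUE_W, features(s, s, b, t))))
    for s in (4, 8, 16) for b in (4, 8, 16) for t in (0.1, 0.3, 0.5)
]
predict = fit_surrogate(samples)
print(predict(8, 16, 8, 0.25))  # an unseen 8 x 16, 8-bit configuration
```

Once fitted, a surrogate like this can score many candidate configurations cheaply, which is what makes a full design-space scan practical. An MLP would take the place of `features` and `fit_surrogate` when the power response is strongly nonlinear.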

[0057] Optionally, the global control unit 114 is connected to the system bus interface 102 and the storage subsystem 106, respectively.

[0058] The Global Control Unit 114 is also known as the GCU (Global Control Unit).

[0059] Optionally, the GCU serves as the instruction scheduling center of the entire NPU chip architecture and is connected to each submodule via a control bus. The GCU parses the instruction queue and generates enable and address control signals for each module. Based on pre-assessed module idle time, the GCU can independently gate off the clock network of specific SRAM banks or MAC arrays, thereby significantly reducing dynamic power consumption.
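Whether gating a block's clock actually saves energy depends on the length of its idle window. The sketch below applies a simple break-even rule: gate only when the clock energy saved exceeds a fixed overhead for entering and leaving the gated state. This rule is an assumption, not the GCU's disclosed policy. Block names and energy figures are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def should_gate(idle_cycles: int, gating_overhead_pj: float, clock_pj_per_cycle: float) -> bool:
    """Gate only if the clock energy saved over the idle window exceeds the
    energy spent entering and leaving the gated state (hypothetical model)."""
    return idle_cycles * clock_pj_per_cycle > gating_overhead_pj


def gating_plan(idle_windows: Iterable[Tuple[str, int]],
                gating_overhead_pj: float,
                clock_pj_per_cycle: float) -> Dict[str, bool]:
    # Map each block's pre-assessed idle window to a gate / keep-running decision.
    return {name: should_gate(cycles, gating_overhead_pj, clock_pj_per_cycle)
            for name, cycles in idle_windows}


windows = [("bank0", 10), ("bank3", 400), ("mac_array", 30)]  # hypothetical schedule
print(gating_plan(windows, gating_overhead_pj=50.0, clock_pj_per_cycle=2.0))
# → {'bank0': False, 'bank3': True, 'mac_array': True}
```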

[0060] Optionally, the vector processing unit 116 is connected to the computing engine 110 and the storage subsystem 106, respectively.

[0061] The vector processing unit 116 is also known as the VPU (Vector Processing Unit).

[0062] Optionally, the VPU is directly connected to the output of the SMCE and performs vector operations after convolution, such as nonlinear activation, pooling, and normalization. The VPU adopts a near-memory computing design: processed data is written directly back to the ROCM, which shortens the data transport path.

[0063] The chip architecture 100 provided in this embodiment includes a system bus interface 102, a data access controller 104, a storage subsystem 106, a computing engine 110, a global control unit 114, and a vector processing unit 116. The storage subsystem 106 further includes multiple storage unit modules 108, and the computing engine 110 includes a computing array 112. The components are connected as follows:

- the data access controller 104 is connected to the system bus interface 102;
- the storage subsystem 106 is connected to the data access controller 104;
- the computing engine 110 is connected to the storage subsystem 106;
- the global control unit 114 is connected to both the system bus interface 102 and the storage subsystem 106; and
- the vector processing unit 116 is connected to both the computing engine 110 and the storage subsystem 106.

Notably, among the chip architecture parameters, the topology parameters of the storage unit modules 108 and the size parameters of the computing array 112 are determined independently of each other.

Specifically, the energy consumption of storage unit modules 108 with different topologies is first determined from process data. The topology parameters of the storage unit modules 108 are then determined at multiple levels, including their number, capacity, and physical splicing topology. This enables fine-grained evaluation and feedback on these topology parameters. It improves the accuracy of topology parameter determination and reduces the energy consumption of chip architecture 100.

Furthermore, a machine learning surrogate model predicts the power consumption of computing arrays 112 that differ in at least one of size parameters, computing precision, and toggle rate. The size parameters of computing array 112 are determined from these predictions. This improves the accuracy of size parameter determination and helps reduce the power consumption of chip architecture 100.

Thus, the chip architecture parameters are optimized automatically at both levels during the design of chip architecture 100: the topology parameters of storage unit modules 108 and the size parameters of computing array 112. This improves the accuracy of chip architecture parameter determination. It also allows energy bottlenecks on data transport paths to be precisely located and eliminated, which reduces the power consumption of chip architecture 100.

[0064] As shown in Figure 2, an embodiment of this application further provides a method for determining chip architecture parameters, which may specifically include the following steps S202 to S206:

[0065] S202: Obtain boundary constraint information between design goals and the architecture search space.

[0066] The method for determining chip architecture parameters proposed in this application is executed by an electronic device, which may be a smart electronic device such as a smartphone, tablet computer, laptop computer, or desktop computer, and is not specifically limited to these devices.

[0067] The boundary constraint information includes process data and the definition information of the architecture search space.

[0068] Optionally, the definition information is used to constrain the adjustment range of chip architecture parameters.

[0069] The chip architecture parameters may include the topology parameters of the storage unit modules of the storage subsystem in the chip, and may also include the size parameters of the computing array in the chip.

[0070] Based on this, the above-mentioned definition information may specifically include at least one of the following: the search range of the MAC array, the upper limit of the total capacity of the SRAM on the chip, the total bandwidth requirement, and the area constraint.

[0071] The search range for the MAC array can be the range of the number of rows and columns of the MAC array, such as traversing from 4×4 to 16×16. The search range for the MAC array can also be the range of the aspect ratio of the MAC array, without specific restrictions.
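The definition information can be viewed as a small configuration object that bounds the search. The sketch below is one hypothetical encoding. The field names, the default limits, and the per-MAC area figure used for pruning are illustrative assumptions, not values from the application.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple


@dataclass(frozen=True)
class SearchSpace:
    # Hypothetical encoding of the definition information obtained in S202.
    mac_dims: Tuple[int, ...] = (4, 8, 16)  # candidate row and column counts
    sram_cap_kib: int = 2048                # upper limit of total on-chip SRAM
    bandwidth_bits: int = 512               # total bandwidth requirement per cycle
    area_limit_mm2: float = 4.0             # area constraint


def mac_candidates(space: SearchSpace, mm2_per_mac: float) -> Iterator[Tuple[int, int]]:
    """Yield (rows, cols) MAC-array shapes in the search range that fit the area limit."""
    for rows in space.mac_dims:
        for cols in space.mac_dims:
            if rows * cols * mm2_per_mac <= space.area_limit_mm2:
                yield rows, cols


shapes = list(mac_candidates(SearchSpace(), mm2_per_mac=0.02))
print(len(shapes), shapes)  # 16 x 16 (5.12 mm^2) is pruned by the 4.0 mm^2 limit
```

This snippet does not use the SRAM capacity and bandwidth fields. They would bound the storage-side topology search in the same way.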

[0072] In practical applications, the aforementioned boundary constraint information may also include algorithm load feature information, which includes at least one of the following: the hierarchical structure, operator type, tensor dimension, and sparsity distribution of the DNN (Deep Neural Network). The algorithm load feature information can be used as a test benchmark for architecture optimization.

[0073] In practical applications, the aforementioned boundary constraint information may also include process and environmental constraint information, which includes at least one of the following: the wafer fab's process node, voltage domain, and expected performance targets.

[0074] S204: Determine the energy consumption of memory cell modules with different topologies within the adjustment range based on the process data, and determine the topology parameters of the memory cell modules in the memory subsystem of the chip based on the energy consumption of memory cell modules with different topologies.

[0075] The topology parameters include at least one of the following: the number of storage unit modules, capacity, and physical splicing topology.

[0076] Specifically, in the chip architecture parameter determination method provided in this application embodiment, during the chip architecture parameter determination process, boundary constraint information between the design target and the architecture search space is obtained. This boundary constraint information includes process data and the definition information of the architecture search space used to constrain the adjustment range of the chip architecture parameters. Based on this, for the topology parameters of the memory cell modules in the chip's memory subsystem, the energy consumption of memory cell modules with different topologies within the adjustment range is determined according to the process data. Then, based on the energy consumption of memory cell modules with different topologies, the topology parameters of the memory cell modules in the chip's memory subsystem are determined, such as the number, capacity, and physical splicing topology of the memory cell modules.

[0077] Understandably, when determining the architecture parameters of NPU chips, chip design manufacturers typically treat the NPU memory as an idealized, monolithic "black box." They focus only on read/write cycles and ignore the characteristics of macrocells in the physical implementation. As a result, NPU architecture parameter design offers no guidance for the physical partitioning of the storage subsystem. Designers then struggle to judge which option better balances the dynamic read/write power and static leakage power of the chip architecture: "a few large-capacity macrocells" or "many small-capacity macrocells." Without fine-grained evaluation feedback, an improper partitioning strategy can cause serious static leakage waste or localized hotspots in the storage unit modules of the storage subsystem.

[0078] Furthermore, in related technologies, early chip architecture exploration tools only used general theoretical power consumption parameters without integrating specific process node data interfaces provided by wafer foundries. This caused the NPU chip architecture parameter design to deviate from the actual process library, making the architecture parameters unworkable. Consequently, the evaluation results of the architecture parameters could not reflect the physical characteristics under specific processes, resulting in the architecture parameters determined in the design phase failing to meet timing constraints or area requirements during backend physical implementation, causing repeated design iterations and delays.

[0079] In the chip architecture parameter determination method provided in this embodiment of the application, an automatic topology generation mechanism for the storage subsystem, based on a real process library, is proposed for the topology parameters of the storage cell modules. Specifically, for the design of the storage cell modules of the storage subsystem, i.e., for memory design, this embodiment replaces abstract modeling that considers only capacity with an architecture generation engine built on the physical library of the memory compiler. Users only need to input boundary constraint information such as total bandwidth and total capacity; the electronic device then uses a built-in splicing algorithm to traverse hundreds or thousands of SRAM macrocell combinations automatically. Based on the energy consumption of the SRAM macrocells in the different combinations, it outputs the optimal physical topology of the SRAM macrocells, so that the chip reaches the global minimum of combined static leakage and dynamic toggle power at the specific process node. In this way, the optimal physical splicing topology and partitioning strategy of the memory macrocells are generated automatically from the boundary constraint information and the process data, solving the problem that manual selection can hardly balance leakage and dynamic power.

[0080] Furthermore, the topology parameters of the memory cell module are determined by parsing the process data provided by the wafer fab, such as the physical library files of the memory compiler. Because the data source is anchored directly to the physical implementation, the output topology parameters can directly guide back-end physical implementation and can be checked directly against back-end physical design rules, which ensures their physical feasibility. This greatly reduces the iteration risk from architecture definition to physical implementation and avoids tape-out failures caused by unrealistic architecture definitions.

[0081] S206: Predict the power consumption of computing arrays with different configurations within the adjustment range using a machine learning proxy model, and determine the size parameters of the computing array in the chip based on these power consumption values.

[0082] Optionally, different configurations of the computing array differ in the key characteristic parameters that affect its power consumption. Specifically, two configurations of the computing array differ in at least one parameter, such as size, computing accuracy, or toggle rate.

[0083] Optionally, the power consumption predicted by the machine learning surrogate model is the dynamic power consumption value of the computing array at the reference frequency. In determining the size parameters of the computing array in the chip, for different configurations of the computing array, in addition to the dynamic power consumption value predicted by the machine learning surrogate model, the static power consumption term of the computing array can also be added to obtain the total power consumption of the computing array.
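The combination described above can be sketched as follows, assuming power values in mW. The function name `total_array_power_mw` is hypothetical, and the optional `target_freq_mhz` argument is an added assumption not stated in the source: it scales the dynamic term linearly from the reference frequency, which holds for switching power at a fixed supply voltage.

```python
from typing import Optional


def total_array_power_mw(
    dynamic_mw_at_ref: float,
    static_mw: float,
    ref_freq_mhz: float,
    target_freq_mhz: Optional[float] = None,
) -> float:
    """Total power of one computing-array configuration.

    dynamic_mw_at_ref: dynamic power predicted by the surrogate model at the
        reference frequency.
    static_mw: static power term added separately.
    target_freq_mhz: hypothetical option; if given, the dynamic term is scaled
        linearly from the reference frequency (assumes fixed voltage).
    """
    if dynamic_mw_at_ref < 0 or static_mw < 0:
        raise ValueError("power terms must be non-negative")
    scale = 1.0 if target_freq_mhz is None else target_freq_mhz / ref_freq_mhz
    return dynamic_mw_at_ref * scale + static_mw


print(total_array_power_mw(120.0, 8.0, ref_freq_mhz=1000.0))  # 128.0
print(total_array_power_mw(120.0, 8.0, ref_freq_mhz=1000.0, target_freq_mhz=500.0))  # 68.0
```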

[0084] Specifically, in the chip architecture parameter determination method provided in the embodiments of this application, during the process of determining chip architecture parameters, for the size parameter of the computing array, after obtaining the boundary constraint information of the design target and the architecture search space, the power consumption of computing arrays with different configurations within the adjustment range is predicted by a machine learning proxy model, and then the size parameter of the computing array in the chip is determined based on the power consumption of computing arrays with different configurations.

[0085] Understandably, the size of the MAC array directly affects the data reuse rate and on-chip bus load balancing. However, existing methods for determining NPU chip architecture parameters often select the MAC array size blindly, relying on designer experience and estimates based on linear interpolation. Such methods cannot capture the complex nonlinear relationship between MAC array size, data reuse rate, and on-chip bus load, and therefore cannot lock in the optimal computing array geometry through nonlinear prediction. The resulting MAC array may have low utilization under a specific algorithm load or may cause interconnect congestion, falling short of the theoretically optimal energy efficiency ratio.

[0086] Furthermore, to obtain high-precision data, existing methods for determining NPU chip architecture parameters rely mainly on register-transfer-level simulation. This process is extremely slow, so within a limited project timeframe design teams can verify only a very small number of architecture configurations. Designers are thus forced to give up a broad search of the vast design space and may miss the globally optimal architecture, which can only be reached with specific combinations of architecture parameters.

[0087] In the chip architecture parameter determination method provided in this embodiment of the application, a geometry configuration decision mechanism based on a machine learning surrogate model is proposed for the computing array size parameter. Specifically, this embodiment uses the machine learning surrogate model as an architecture search engine for the computing array. Based on the energy efficiency differences of the computing array under different configurations, it actively recommends the optimal computing array configuration, ensuring that the generated hardware architecture has the theoretically optimal energy efficiency ratio under the algorithm load.

[0088] Furthermore, by replacing traditional circuit-level simulation with machine learning proxy models, the speed of architecture evaluation can be increased by several orders of magnitude. This allows design teams to exhaustively scan hundreds or thousands of potential combinations of architecture parameters within a limited project cycle, greatly expanding the breadth and depth of design space exploration, facilitating the discovery of the globally optimal architecture, and thus significantly enhancing the market competitiveness of chip architecture.

[0089] Furthermore, the chip architecture parameter determination method provided in this application constructs a decoupled architecture optimization framework that supports hardware and software co-design. Specifically, this application explicitly decomposes the NPU architecture design into the generation of storage unit modules and the optimization of the computing array. Based on independent parameter configuration interfaces, designers are allowed to explore the storage level and computing scale separately. While separately calculating the energy consumption of the storage unit modules and the power consumption of the computing array, independent tuning of module-level architecture parameters is achieved. For example, while keeping the computing array unchanged, a set of optimal storage unit module configurations can be traversed and output separately, thereby assisting designers in accurately locating and eliminating energy consumption bottlenecks on the data transport path.

[0090] The chip architecture parameter determination method provided in this application no longer relies on a single fixed hardware description. Instead, it uses variable boundary constraint information to determine the chip architecture parameters, namely the topology parameters of the storage unit modules in the chip's storage subsystem and the size parameters of the computing array in the chip, and these two sets of parameters are determined independently of each other. On the one hand, the topology parameters of the storage unit modules are determined at several levels (number, capacity, and physical splicing topology) from the energy consumption of storage unit modules with different topologies, which is itself derived from process data. This provides refined evaluation and feedback on the topology parameters and reduces chip energy consumption; moreover, because the topology parameters are grounded in process data, they satisfy the constraints of back-end physical implementation, avoiding repeated redesign and improving the accuracy and efficiency of topology parameter determination. On the other hand, the size parameters of the computing array are determined from the power consumption that a machine learning surrogate model predicts for computing arrays differing in at least one of size, computational accuracy, and toggle rate, which improves the accuracy and efficiency of size parameter determination while reducing chip power consumption. Chip architecture parameters are thus optimized automatically at two levels: the topology parameters of the storage unit modules and the size parameters of the computing array. This improves the accuracy and efficiency of determining chip architecture parameters and helps locate and eliminate energy bottlenecks on the data transport path, thereby reducing chip power consumption.

[0091] In this embodiment of the application, the step of determining the energy consumption of memory cell modules with different topologies within the adjustment range based on process data may specifically include the following steps S208 to S212:

[0092] S208: Construct storage unit modules with different topologies within the adjustment range.

[0093] The storage unit module of each topology meets the capacity, bandwidth, and latency requirements defined by the boundary constraint information.

[0094] Specifically, in the chip architecture parameter determination method provided in this application embodiment, during the process of determining the topology parameters of the storage unit module, if the capacity of a single storage unit module is insufficient, multiple storage unit modules are automatically invoked and horizontally or vertically spliced within an adjustment range to construct storage unit modules with different topologies. During the splicing of storage unit modules, read/write access distribution and timing constraints must be considered to ensure that the spliced storage unit modules meet bandwidth and latency requirements, thus adapting to the storage needs of different NPU architectures.
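The splicing step can be illustrated with a minimal enumeration sketch. It assumes each candidate is one macro type from a hypothetical macro library, split into a chosen number of independent banks that each hold an equal slice of the address space; horizontal splicing widens the word and vertical splicing deepens the address space. The class and function names and the macro sizes are illustrative only, and the access-distribution and timing checks mentioned above are not modeled.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class Macro:
    name: str
    depth: int  # number of words
    width: int  # bits per word


@dataclass(frozen=True)
class Topology:
    macro: Macro
    banks: int         # independent banks (more banks allow more parallel access)
    n_horizontal: int  # macros spliced side by side per bank (wider word)
    n_vertical: int    # macros spliced top to bottom per bank (deeper address space)

    @property
    def macro_count(self) -> int:
        return self.banks * self.n_horizontal * self.n_vertical

    @property
    def capacity_bits(self) -> int:
        return self.macro_count * self.macro.depth * self.macro.width


def enumerate_topologies(library, word_width, depth, bank_options=(1, 2, 4)):
    """List every (macro type, bank count) tiling that covers word_width x depth."""
    topologies = []
    for macro in library:
        for banks in bank_options:
            bank_depth = ceil(depth / banks)
            n_h = ceil(word_width / macro.width)  # horizontal splicing
            n_v = ceil(bank_depth / macro.depth)  # vertical splicing
            topologies.append(Topology(macro, banks, n_h, n_v))
    return topologies


library = [Macro("sram_512x64", 512, 64), Macro("sram_1024x32", 1024, 32)]
for t in enumerate_topologies(library, word_width=128, depth=2048):
    print(t.macro.name, t.banks, t.n_horizontal, t.n_vertical, t.macro_count)
```

Some candidates over-provision capacity (for example, four banks of 1024-word macros for a 2048-word target); such cases are left for the energy screening step to rank rather than being removed here.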

[0095] S210: Based on the process data, calculate the read/write power consumption, clock tree power consumption, and static leakage power consumption of each memory cell module topology.

[0096] Here, the read/write power consumption depends on the voltage, current, and operating frequency of the storage unit module and on its access count.

[0097] Optionally, the read/write energy consumption of the storage unit module can be dynamically calculated by statistically analyzing the actual number of accesses during the execution of the deep neural network, thereby determining the access energy consumption of different layers and modules. The actual number of accesses is determined based on the mapping scheme and tensor scheduling of the deep neural network.

[0098] In practical applications, as shown in Figure 3, the read/write power consumption $E_{\text{read/write}}$ of the storage unit module can be determined using the following formula:

[0099] $E_{\text{read/write}} = V \times I_{\text{read/write}} \times f \times N_{\text{read/write}}$;

[0100] Here, $V$ is the operating voltage of the corresponding process corner in the process library; $I_{\text{read/write}}$ is the read/write current of the memory per unit frequency, in µA/MHz, which is determined from the process data, specifically by lookup in a memory power reference table; $f$ is the operating frequency; and $N_{\text{read/write}}$ is the number of read and write operations performed on the memory during the evaluation period, which can be determined from the memory's workload and mapping statistics.
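A minimal sketch of evaluating this formula, assuming the memory power reference table is available as a dictionary keyed by macro name and process corner. The table name `RW_CURRENT_UA_PER_MHZ`, the corner labels, and all numbers are hypothetical placeholders, not foundry data; the product is computed exactly as the formula is written, and unit scaling is left to the caller.

```python
# Hypothetical memory power reference lookup table:
# (macro, process corner) -> read/write current per unit frequency (uA/MHz).
# The numbers are placeholders, not foundry data.
RW_CURRENT_UA_PER_MHZ = {
    ("sram_128kb", "tt_0p80v_25c"): 2.0,
    ("sram_128kb", "ss_0p72v_125c"): 1.7,
}


def read_write_energy(voltage_v, macro, corner, freq_mhz, n_accesses):
    """E_read/write = V * I_read/write * f * N_read/write, evaluated as written.

    Unit scaling of the product is left to the caller.
    """
    i_rw = RW_CURRENT_UA_PER_MHZ[(macro, corner)]  # lookup from process data
    return voltage_v * i_rw * freq_mhz * n_accesses


print(read_write_energy(0.8, "sram_128kb", "tt_0p80v_25c", 1000.0, 10))
```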

[0101] Optionally, clock tree power consumption is mainly related to the switching and gating states of the memory's clock network.

[0102] Optionally, clock tree energy consumption is used to characterize the energy consumption differences brought about by clock distribution networks and gating mechanisms, and is especially suitable for scenarios with multi-level storage structures.

[0103] In practical applications, as shown in Figure 3, the clock tree power consumption $E_{\text{clock}}$ of the storage unit module can be determined using the following formula:

[0104] $E_{\text{clock}} = G_{\text{clock}} \times P_{\text{clock}} \times t$;

[0105] Here, $P_{\text{clock}}$ is the idle-state clock tree power obtained by PTPX simulation, which depends on $V$ and $f$; $G_{\text{clock}}$ is the clock gating signal, where 0 means gated off and 1 means on; and $t$ is the running time, which can be determined from the memory's workload and mapping statistics.
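Because $G_{\text{clock}}$ switches between 0 and 1 over a workload, one way to apply the formula is piecewise: evaluate it for each interval in which the gating state is constant and sum the results. The sketch below assumes such a gating trace is available; the trace format and function name are hypothetical, and $P_{\text{clock}}$ is taken as a given number in mW rather than obtained from a simulation run.

```python
def clock_tree_energy(p_clock_mw, gating_trace):
    """E_clock = G_clock * P_clock * t, summed over constant-gating intervals.

    p_clock_mw: idle clock tree power, in mW.
    gating_trace: (G_clock, duration_s) pairs; G_clock is 1 (on) or 0 (gated off).
    Returns energy in mJ (mW * s).
    """
    energy_mj = 0.0
    for g_clock, duration_s in gating_trace:
        if g_clock not in (0, 1):
            raise ValueError("G_clock must be 0 or 1")
        energy_mj += g_clock * p_clock_mw * duration_s
    return energy_mj


# Clock on for 2 s, gated off for 3 s, on again for 1 s.
print(clock_tree_energy(5.0, [(1, 2.0), (0, 3.0), (1, 1.0)]))  # 15.0
```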

[0106] Optionally, static leakage reflects the energy consumption characteristics of the memory under low utilization or in standby mode, which is particularly critical for low-power designs.

[0107] In practical applications, as shown in Figure 3, the static leakage power $E_{\text{leakage}}$ of the storage unit module can be determined using the following formula:

[0108] $E_{\text{leakage}} = V \times I_{\text{leakage}} \times t$;

[0109] Here, $I_{\text{leakage}}$ is the leakage current of the memory under the specific process corner, expressed in µA/MHz; it can be determined from the process data, specifically by lookup in the memory power reference table.
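A minimal sketch of the leakage term, assuming a hypothetical per-corner table that stores both the operating voltage and the looked-up leakage value, so that the corner selects both $V$ and $I_{\text{leakage}}$. The corner names and numbers are placeholders; the product is evaluated as the formula is written, with units left to the caller. The example shows why corner selection matters: a fast, hot corner leaks far more than a typical one.

```python
# Hypothetical process-corner table: corner -> (operating voltage in V,
# leakage value from the memory power reference table).
# Placeholder numbers only, not foundry data.
PROCESS_CORNERS = {
    "tt_0p80v_25c": (0.80, 40.0),
    "ff_0p88v_125c": (0.88, 400.0),
}


def leakage_energy(corner, t_s):
    """E_leakage = V * I_leakage * t, evaluated as written for one module."""
    voltage_v, i_leakage = PROCESS_CORNERS[corner]
    return voltage_v * i_leakage * t_s


for corner in PROCESS_CORNERS:
    print(corner, leakage_energy(corner, 2.0))
```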

[0110] S212: Determine the sum of the read/write power consumption, clock tree power consumption, and static leakage power consumption of the storage cell module of each topology as the power consumption of that module.

[0111] Specifically, in the chip architecture parameter determination method provided in this application embodiment, based on the user-supplied hardware architecture description file (i.e., the boundary constraint information) and the memory access statistics collected during memory operation, the process data is automatically invoked for splicing, filtering, and calculation to obtain the read/write power consumption, clock tree power consumption, and static leakage power consumption of the memory cell module of each topology. The sum of these three terms is then determined as the power consumption of that module.

[0112] That is, in the method for determining chip architecture parameters provided in the embodiments of this application, as shown in Figure 3, the total energy consumption of the memory unit module (i.e., the memory) of each topology can be determined by the following formula:

[0113] $E_{\text{memory}} = E_{\text{read/write}} + E_{\text{clock}} + E_{\text{leakage}}$;

[0114] Here, $E_{\text{memory}}$ is the total energy consumption of the storage unit module, i.e., the memory.
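Once the three components are known for every candidate topology, the total and the ranking follow directly. The sketch below assumes the components have already been computed in a common unit; the topology labels and values are hypothetical.

```python
def memory_energy(e_read_write, e_clock, e_leakage):
    """E_memory = E_read/write + E_clock + E_leakage."""
    return e_read_write + e_clock + e_leakage


# Hypothetical per-topology energy components (same unit for all terms).
candidates = {
    "1 bank x 8 macros": (120.0, 15.0, 30.0),
    "2 banks x 4 macros": (100.0, 20.0, 60.0),
    "4 banks x 2 macros": (140.0, 10.0, 10.0),
}
ranked = sorted(candidates, key=lambda name: memory_energy(*candidates[name]))
for name in ranked:
    print(name, memory_energy(*candidates[name]))
```

Note that the candidate with the lowest read/write energy is not the one with the lowest total here, which is the trade-off between dynamic and leakage terms that the decomposition is meant to expose.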

[0115] In this way, based on the characteristic parameters of memory macrocells in the process data, combined with the memory access behavior statistics and energy consumption decomposition formula, dynamic prediction of memory energy consumption with different capacities, bandwidths and access methods can be achieved, covering the main energy consumption sources in memory operation, and integration and summation can be performed at the clock cycle level or task level time scale.

[0116] The embodiments provided in this application construct memory cell modules with different topologies within an adjustment range, each of which meets the capacity, bandwidth, and latency requirements defined by the boundary constraint information. The read/write power consumption, clock tree power consumption, and static leakage power consumption of each memory cell module are calculated from process data, and their sum is determined as the power consumption of that module. In this way, accurate and comprehensive power consumption analysis based on process data is available in the early stages of chip architecture design, which helps reduce chip power consumption and also lowers the iteration costs in the later stages of chip physical implementation.

[0117] In this embodiment, the step of determining the topology parameters of the memory cell modules in the chip's memory subsystem based on the energy consumption of memory cell modules with different topologies may specifically include the following steps S214 to S218:

[0118] S214: Establish a multi-objective optimization model.

[0119] The optimization objectives of the multi-objective optimization model include at least the energy consumption index, area index, and latency index of the storage unit module.

[0120] S216: Using the multi-objective optimization model and a Pareto optimal selection mechanism, screen the storage unit modules with different topologies to obtain the target-topology storage unit module with the lowest energy consumption that meets the performance constraints.

[0121] Specifically, in the chip architecture parameter determination method provided in this application embodiment, for the topology parameters of the storage unit module, after determining the energy consumption of the storage unit module for each topology, a multi-objective optimization model is established, with optimization objectives including at least the energy consumption index, area index, and latency index of the storage unit module. Then, using the multi-objective optimization model, a Pareto optimal selection mechanism is used to screen storage unit modules with different topologies to obtain the storage unit module with the lowest energy consumption and that meets performance constraints. For example, the target topology is a topology consisting of two physical banks (independent and addressable physical memory array units in the storage subsystem) composed of eight 128KB SRAMs, where each physical bank is a 512KB storage block with a constant depth, composed of four 128KB SRAMs connected in parallel.
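A minimal sketch of Pareto screening over the three objectives named above (energy, area, latency, all minimized), followed by picking the lowest-energy Pareto point that satisfies area and latency limits. The candidate names, values, and constraint thresholds are hypothetical, and this is one straightforward reading of the selection mechanism rather than the application's exact procedure.

```python
from typing import List, NamedTuple, Optional


class Candidate(NamedTuple):
    name: str
    energy: float
    area: float
    latency: float


def objectives(c: Candidate):
    return (c.energy, c.area, c.latency)  # all minimized


def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if a is no worse on every objective and better on at least one."""
    pairs = list(zip(objectives(a), objectives(b)))
    return all(x <= y for x, y in pairs) and any(x < y for x, y in pairs)


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


def select_target(cands, max_area, max_latency) -> Optional[Candidate]:
    """Lowest-energy Pareto point that satisfies the area and latency constraints."""
    feasible = [c for c in pareto_front(cands)
                if c.area <= max_area and c.latency <= max_latency]
    return min(feasible, key=lambda c: c.energy, default=None)


cands = [
    Candidate("A", 10.0, 5.0, 3.0),
    Candidate("B", 12.0, 4.0, 3.0),
    Candidate("C", 11.0, 6.0, 4.0),  # dominated by A
    Candidate("D", 9.0, 8.0, 5.0),
]
print([c.name for c in pareto_front(cands)])  # ['A', 'B', 'D']
print(select_target(cands, max_area=6.0, max_latency=4.0).name)  # A
```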

[0122] In other words, in the method for determining chip architecture parameters provided in this application, the topology parameters of the memory cell module are determined in three stages: physical-level splicing generation, energy consumption screening, and optimum lock-in. In the physical-level splicing generation stage, various SRAM macrocell splicing combinations are generated automatically from a real process library, i.e., memory cell modules with different topologies are constructed. In the energy consumption screening stage, the read/write energy consumption, clock tree energy consumption, and static leakage power consumption of each topology's memory cell module under a specific algorithm load are calculated to obtain its total energy consumption. In the optimum lock-in stage, based on the access frequency and address distribution of each topology's memory cell module, the combination of number, capacity, and physical splicing topology of memory cell modules that minimizes total energy consumption is selected automatically as the recommended target topology.

[0123] In the optimum lock-in stage, energy efficiency prediction and bottleneck identification are performed for the storage unit modules of the different topologies. Specifically, a PPA (power, performance, area) prediction is calculated for the storage unit module of each topology. Module-level power ratio analysis then determines whether the energy efficiency bottleneck of the current topology is limited by memory access bandwidth or by computing density. By checking whether each topology meets the energy consumption and performance constraints, the topology of the storage unit module is iteratively optimized until the target-topology storage unit module with the lowest energy consumption that meets the performance constraints is obtained.

[0124] S218: The number, capacity, and physical splicing topology of storage unit modules in the target topology are determined as the topology parameters of the storage unit modules.

[0125] Specifically, in the chip architecture parameter determination method provided in this application embodiment, after determining the memory cell modules of the target topology structure that have the lowest energy consumption and meet performance constraints, the number, capacity, and physical splicing topology of the memory cell modules in the target topology structure are determined as the topology parameters of the memory cell modules. This enables the rapid evaluation and selection of the optimal memory configuration scheme based on different constraints.

[0126] In addition, in practical applications, after screening storage unit modules with different topologies, Pareto optimal curves can be output and displayed. The Pareto optimal curves show the best performance points of storage unit modules under different energy consumption budgets, thereby providing visual design guidance and assisting designers in making final decisions.

[0127] The embodiments provided in this application establish a multi-objective optimization model whose optimization objectives include at least the energy consumption, area, and latency indicators of the storage unit module. Using this model and a Pareto optimal selection mechanism, storage unit modules with different topologies are screened to obtain the target-topology storage unit module with the lowest energy consumption, and the number, capacity, and physical splicing topology of the storage unit modules in the target topology are determined as the topology parameters of the storage unit module. Because the optimal topology is selected across energy consumption, area, and latency, the determined topology parameters meet the design requirements and the designed chip balances energy consumption, area cost, and latency.

[0128] In this embodiment of the application, S206 may specifically include the following S206a to S206d:

[0129] S206a: Construct computing arrays with different size parameters, computational accuracy, and toggle rates within the adjustment range.

[0130] The size of the computing array represents the scale of the computing array, which can be the number of rows and columns of the computing array, such as 4×4, 8×8, and 16×16.

[0131] S206b: Use the machine learning proxy model to output the power consumption of computing arrays with different configurations, based on the mapping from the size parameters, computing accuracy, and toggle rate of the computing array to its power consumption.

[0132] Specifically, in the chip architecture parameter determination method provided in the embodiments of this application, for the size parameter of the computing array, computing arrays with different size parameters, computing precision, and toggle rates are constructed within the adjustment range. A machine learning proxy model then outputs the power consumption of each configuration based on the mapping from size parameter, computing precision, and toggle rate to power consumption.

[0133] Here, the machine learning surrogate model outputs the power consumption corresponding to the input size parameters, computational accuracy, and toggle rate of a computing array within milliseconds, which significantly speeds up power consumption assessment.

[0134] S206c: Based on the power consumption of computing arrays with different configurations, determine the energy efficiency of computing arrays with different configurations, and identify the target computing array with the highest energy efficiency.

[0135] Specifically, in the chip architecture parameter determination method provided in this application embodiment, after the machine learning surrogate model outputs the power consumption of computing arrays with different configurations, the energy efficiency of each configuration is determined from its power consumption, and the target computing array with the highest energy efficiency is identified. For example, the target computing array may be a rectangular array of 32 rows × 64 columns, adapted to the convolution kernel characteristics of deep neural networks.
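The source does not define the energy efficiency metric. The sketch below assumes a common one, peak operations per second divided by power (TOPS/W), counting one MAC as two operations; the optional `utilization` factor is a hypothetical stand-in for the algorithm load. The predicted power values are placeholders for surrogate-model outputs.

```python
def tops_per_watt(rows, cols, freq_mhz, power_mw, utilization=1.0):
    """Energy efficiency in TOPS/W.

    Assumes one MAC = 2 operations and that `utilization` (hypothetical input)
    is the fraction of MAC units kept busy by the algorithm load.
    """
    ops_per_s = 2 * rows * cols * freq_mhz * 1e6 * utilization
    return ops_per_s / (power_mw * 1e-3) / 1e12


# Hypothetical surrogate predictions: (rows, cols) -> power in mW at 1000 MHz.
predicted_power_mw = {(8, 8): 80.0, (16, 16): 256.0, (32, 64): 2000.0}
best = max(predicted_power_mw,
           key=lambda rc: tops_per_watt(*rc, 1000.0, predicted_power_mw[rc]))
print(best)  # (32, 64)
```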

[0136] That is, in the chip architecture parameter determination method provided in this application embodiment, the size parameters of the computing array are determined in three stages: nonlinear feature fitting, configuration optimization, and hotspot identification. In the nonlinear feature fitting stage, a pre-trained machine learning proxy model serves as the architecture search engine to capture the nonlinear power consumption characteristics of the computing array under different configurations. In the configuration optimization stage, the number of rows, number of columns, and computational accuracy of the computing array are input into the machine learning proxy model as variables to scan the entire search space quickly. In the hotspot identification stage, with the toggle rate as an additional input, the machine learning proxy model outputs the power consumption of computing arrays with different configurations, identifies high-power hotspots under specific configurations, and eliminates energy-inefficient array size options.
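The scan and hotspot stages can be sketched as an exhaustive sweep over the configuration grid, with any configuration whose predicted power exceeds a budget flagged as a hotspot. The power budget criterion is an assumption (the source does not say how hotspots are thresholded), and `toy_surrogate` is a hypothetical closed-form stand-in for a trained model, used only so the example runs.

```python
from itertools import product


def scan_search_space(surrogate, rows_opts, cols_opts, bits_opts, toggle_opts,
                      power_budget_mw):
    """Evaluate every configuration with the surrogate and split off hotspots.

    surrogate(rows, cols, bits, toggle) -> predicted power in mW.
    Configurations above power_budget_mw are reported as high-power hotspots.
    """
    kept, hotspots = [], []
    for cfg in product(rows_opts, cols_opts, bits_opts, toggle_opts):
        power = surrogate(*cfg)
        if power > power_budget_mw:
            hotspots.append((cfg, power))
        else:
            kept.append((cfg, power))
    return kept, hotspots


def toy_surrogate(rows, cols, bits, toggle):
    # Hypothetical stand-in for a trained model: grows with array area,
    # quadratically with operand width, and linearly with toggle rate.
    return 0.01 * rows * cols * bits * bits * toggle


kept, hotspots = scan_search_space(toy_surrogate, [8, 16], [8, 16], [8], [0.5],
                                   power_budget_mw=50.0)
print(len(kept), [cfg for cfg, _ in hotspots])
```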

[0137] In the hotspot identification stage, the configuration of the computing array is iteratively optimized based on the energy efficiency prediction and bottleneck location of different configurations of computing arrays until the target computing array with the highest energy efficiency is obtained.

[0138] S206d: Determine the size of the target computing array as the size parameter of the computing array in the chip.

[0139] Specifically, in the chip architecture parameter determination method provided in this application embodiment, after the target computing array with the highest energy efficiency is determined, its size is determined as the size parameter of the computing array in the chip. In this way, by using the machine learning surrogate model as an architecture search engine, the optimal number of rows and columns and the computing precision configuration are actively recommended, ensuring that the generated hardware architecture has the theoretically optimal energy efficiency ratio under the algorithm load.

[0140] The embodiments provided in this application construct computing arrays with different size parameters, computational precision, and toggle rates within an adjustment range. A machine learning proxy model outputs the power consumption of each configuration based on the mapping from size parameters, computational precision, and toggle rate to power consumption; the energy efficiency of each configuration is derived from its power consumption; the target computing array with the highest energy efficiency is identified; and its size is determined as the size parameter of the computing array in the chip. Evaluating energy efficiency from the power consumption predicted by the machine learning proxy model in this way helps reduce chip power consumption and improve chip energy efficiency. Furthermore, because the machine learning proxy model predicts the power consumption of one configuration within milliseconds, design space exploration is extremely fast and the size parameters of the computing array are determined more efficiently.

[0141] In this embodiment of the application, before predicting the power consumption of computing arrays with different configurations within the adjustment range using a machine learning proxy model, the method for determining the chip architecture parameters may further include the following steps S220 to S226:

[0142] S220: Obtain the training dataset.

[0143] The training dataset includes the size parameters, computational accuracy, toggle rate, and simulated power consumption of computing arrays with different configurations.

[0144] The dimensions of the computing array include, but are not limited to: 4×4, 8×8, 12×12, and 16×16.

[0145] Optionally, the computational precision of the computing array can range from 4 bit × 4 bit to 32 bit × 32 bit.

[0146] Optionally, the toggle rate of the computing array can range from 10% to 100% in steps of 10%.

[0147] Optionally, the simulated power consumption of the computing array can be the average of the total of switching power, internal power, and leakage power when the computing array operates under each configuration.

[0148] In practical applications, the training dataset can be stored as a table. Its columns include, but are not limited to: the size parameters of the computing array, computing accuracy, process node, supply voltage, temperature, frequency, clock gating, input, toggle rate, switching power, internal power, leakage power, total power, and area. No specific restrictions are imposed here.

[0149] The training dataset obtained in the above manner not only covers typical array configurations, but also allows machine learning proxy models to capture multi-dimensional nonlinear relationships.
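The configuration grid behind such a dataset can be enumerated as follows. This sketch assumes square arrays and, since the source gives only the 4 bit × 4 bit to 32 bit × 32 bit range, assumes the intermediate precision steps 8 and 16 bits; the constant names are hypothetical. Each configuration would then be run through power simulation to obtain its power label, which is not shown.

```python
from itertools import product

ARRAY_SIZES = [4, 8, 12, 16]                 # square arrays, n x n
PRECISION_BITS = [4, 8, 16, 32]              # assumed steps from 4x4 to 32x32 bit
TOGGLE_RATES_PCT = list(range(10, 101, 10))  # 10% to 100% in 10% steps


def training_configurations():
    """Every (rows, cols, bits, toggle_pct) point to be simulated for the dataset."""
    return [(n, n, bits, pct)
            for n, bits, pct in product(ARRAY_SIZES, PRECISION_BITS, TOGGLE_RATES_PCT)]


configs = training_configurations()
print(len(configs), configs[0], configs[-1])  # 160 (4, 4, 4, 10) (16, 16, 32, 100)
```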

[0150] Specifically, in the method for determining chip architecture parameters provided in the embodiments of this application, as shown in Figure 4, the working framework of the machine learning surrogate model may include a dataset construction module, which builds a training dataset from the size parameters, computational accuracy, toggle rate, and simulated power consumption of computing arrays with different configurations.

[0151] In addition, in practical applications, as shown in Figure 4, the working framework of the machine learning surrogate model may also include a feature processing and normalization module, which preprocesses the training dataset once it is obtained. Specifically, it normalizes the size parameters, computational accuracy, and toggle rate of the computing array to eliminate differences in order of magnitude, and applies a logarithmic transformation to the simulated power consumption of the computing array to smooth its distribution.
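A minimal preprocessing sketch, assuming min-max scaling to [0, 1] as the normalization (the source does not name a specific scheme) and the natural logarithm as the transformation of the power target. The sample tuple layout and function names are hypothetical.

```python
import math


def min_max(values):
    """Scale values to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def preprocess(samples):
    """samples: (array_size, precision_bits, toggle_rate, power_mw) tuples.

    Returns normalized feature rows and log-transformed power targets.
    """
    columns = list(zip(*samples))
    features = [min_max(list(col)) for col in columns[:3]]
    X = [list(row) for row in zip(*features)]
    y = [math.log(p) for p in columns[3]]
    return X, y


X, y = preprocess([(4, 8, 0.1, 2.0), (8, 16, 0.5, 20.0), (16, 32, 1.0, 200.0)])
print(X)
print(y)
```

The log transform turns power values spanning two orders of magnitude into evenly spaced targets, which is what "smoothing the distribution" buys for a squared-error loss.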

[0152] S222: Divide the training dataset into a training sample set and a validation sample set.

[0153] The number of samples in the training sample set can account for 80% of the total number of samples in the training dataset, and the number of samples in the validation sample set can account for 20% of the total number of samples in the training dataset. Those skilled in the art can set the proportion of the number of samples in the training sample set and the validation sample set according to the actual situation, and no specific restrictions are imposed here.
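An 80/20 split as described can be sketched as below. Shuffling before splitting and the fixed seed are assumptions, added so that the validation set is not biased by the order in which configurations were simulated.

```python
import random


def split_dataset(samples, train_fraction=0.8, seed=0):
    """Shuffle, then split into (training sample set, validation sample set)."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # fixed seed makes the split reproducible
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


train_set, val_set = split_dataset(range(100))
print(len(train_set), len(val_set))  # 80 20
```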

[0154] S224: Iteratively train the machine learning proxy model based on the training sample set and the target loss function.

[0155] The target loss function may be the mean squared error function (that is, training minimizes the mean squared error) or another loss function; no specific restriction is imposed here.

[0156] Specifically, as shown in Figure 4, the working framework of the machine learning proxy model may further include a training module. The training module takes the training sample set as input, outputs the fitted model parameters, and trains a machine learning proxy model of a specified model type based on the target loss function.
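The text leaves the model type open. As a stand-in, the sketch below fits a linear model by batch gradient descent on the mean squared error; the learning rate, epoch count, and toy data are assumptions, and a surrogate for genuinely nonlinear power behavior would more likely be a tree ensemble or a small neural network trained on the same loss.

```python
def train_linear_mse(X, y, lr=0.1, epochs=2000):
    """Fit y ~ w.x + b by batch gradient descent on the mean squared error.

    X: list of (already normalized) feature vectors; y: targets, e.g. log power.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        # Residual of the current model on every training sample.
        err = [sum(wj * xj for wj, xj in zip(w, x)) + b - t for x, t in zip(X, y)]
        # Gradient of MSE = (1/n) * sum(err^2) with respect to w and b.
        grad_w = [2.0 / n * sum(e * x[j] for e, x in zip(err, X)) for j in range(d)]
        grad_b = 2.0 / n * sum(err)
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def mse(X, y, w, b):
    """Mean squared error of the linear model (w, b) on (X, y)."""
    return sum((sum(wj * xj for wj, xj in zip(w, x)) + b - t) ** 2
               for x, t in zip(X, y)) / len(X)


# Toy data generated from y = 2*x0 - x1 + 0.5, so training should recover it.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
y = [2 * a - c + 0.5 for a, c in X]
w, b = train_linear_mse(X, y)
```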

[0157] S226: Determine the prediction error of the trained machine learning proxy model based on the validation sample set, and determine that the trained machine learning proxy model passes validation if the prediction error is less than the error threshold.

[0158] The error threshold is a relatively small value. Those skilled in the art can set the specific value of the error threshold according to the actual situation, and no specific restrictions are imposed here.

[0159] Specifically, in the chip architecture parameter determination method provided in this embodiment, a training dataset for the machine learning proxy model is obtained that includes the size parameters, computational accuracy, toggle rate, and simulated power consumption of computing arrays with different configurations, and the training dataset is divided into a training sample set and a validation sample set. The machine learning proxy model is then iteratively trained according to the training sample set and the target loss function, and the prediction error of the trained model is determined according to the validation sample set. If the prediction error is less than the error threshold, the trained machine learning proxy model is determined to have passed validation; otherwise, the machine learning proxy model is retrained.
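The train, validate, retrain loop can be sketched as below. What changes between retraining rounds is not specified in the text; doubling a training budget is an assumption, as are the callback `train(train_set, budget)` and the one-parameter toy model used to exercise it. The validation error is the mean squared error, matching the loss suggested in [0155].

```python
def validation_error(predict, val_set):
    """Mean squared prediction error over held-out (x, target) pairs."""
    return sum((predict(x) - t) ** 2 for x, t in val_set) / len(val_set)


def train_until_valid(train, train_set, val_set, threshold, max_rounds=5):
    """Train, check the validation error, and retrain until it drops below threshold."""
    budget = 1
    for _ in range(max_rounds):
        predict = train(train_set, budget)
        err = validation_error(predict, val_set)
        if err < threshold:
            return predict, err, budget      # model passes validation
        budget *= 2                          # assumption: retrain with a larger budget
    raise RuntimeError("proxy model did not reach the error threshold")


def toy_train(train_set, budget):
    """Hypothetical trainer: fits t = k * x with 10 * budget gradient steps on MSE."""
    k = 0.0
    for _ in range(10 * budget):
        grad = sum(2 * (k * x - t) * x for x, t in train_set) / len(train_set)
        k -= 0.1 * grad
    return lambda x: k * x


train_set = [(x, 3.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]
val_set = [(x, 3.0 * x) for x in (0.75, 1.25)]
predict, err, budget = train_until_valid(toy_train, train_set, val_set, threshold=1e-6)
```

With these numbers the first round (budget 1) misses the threshold and the second round passes, which exercises the retraining branch.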

[0160] In this way, by learning from simulation samples with different configurations, a regression mapping relationship between the input parameters of the machine learning proxy model and the power output is established, thereby quickly predicting the power consumption of the computing array under any configuration without the need for resimulation.

[0161] In other words, by pre-training the machine learning proxy model, the machine learning proxy model can capture the nonlinear relationship between array size parameters, data flow characteristics and interconnection congestion. This makes it easier to use the machine learning proxy model to quickly scan within the design space and proactively recommend the array row and column configuration with the highest energy efficiency, thus solving the problem of low computing power utilization caused by setting array size based on experience in traditional design.

[0162] Furthermore, because the machine learning proxy model is pre-trained on calibrated simulation data whose source is directly anchored to the physical implementation, the size parameters it subsequently outputs have direct guiding value for the backend physical implementation. This ensures the physical feasibility of the size parameters, so that they can be checked directly against the backend physical design rules. This greatly reduces the iteration risk between architecture definition and physical implementation and avoids tape-out failures caused by unrealistic architecture definitions.

[0163] Based on this, as shown in Figure 4, the working framework of the machine learning proxy model may also include a prediction module, through which the machine learning proxy model outputs the corresponding power consumption for the input size parameters, computational accuracy, and toggle rate.
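A prediction step consistent with the preprocessing in [0151] must apply the same feature scaling used in training and undo the log transform on the output. The sketch below assumes a linear model over min-max scaled features; the class name, field layout, and numeric values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class PowerSurrogate:
    """Hypothetical trained proxy model: linear in min-max scaled features, fitted to log(power)."""
    lo: list[float]        # per-feature minimum seen in training (rows, cols, bits, toggle)
    hi: list[float]        # per-feature maximum seen in training; assumed > lo
    weights: list[float]
    bias: float

    def predict_mw(self, rows, cols, precision_bits, toggle_rate):
        raw = [rows, cols, precision_bits, toggle_rate]
        # Apply the same scaling as in training so the learned weights remain valid.
        x = [(v - l) / (h - l) for v, l, h in zip(raw, self.lo, self.hi)]
        log_p = sum(w * xi for w, xi in zip(self.weights, x)) + self.bias
        return math.exp(log_p)  # undo the logarithmic transform of the target


model = PowerSurrogate(lo=[8, 8, 4, 0.0], hi=[64, 64, 16, 1.0],
                       weights=[1.0, 1.0, 0.5, 2.0], bias=math.log(5.0))
print(round(model.predict_mw(8, 8, 4, 0.0), 6))  # 5.0
```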

[0164] In the embodiments provided in this application, before the power consumption of computing arrays with different configurations within the adjustment range is predicted using the machine learning proxy model, a training dataset is acquired. The training dataset includes the size parameters, computational accuracy, toggle rate, and simulated power consumption of computing arrays with different configurations. The training dataset is divided into a training sample set and a validation sample set. The machine learning proxy model is iteratively trained based on the training sample set and the target loss function. The prediction error of the trained machine learning proxy model is determined based on the validation sample set, and if the prediction error is less than an error threshold, the trained machine learning proxy model is deemed to have passed validation. In this way, training the machine learning proxy model on real simulation data of computing arrays with different configurations enables it to accurately learn the mapping relationship between the size parameters, computational accuracy, toggle rate, and power consumption of the computing array, thereby improving the accuracy with which the machine learning proxy model predicts the power consumption of computing arrays with different configurations, and the accuracy of the subsequently determined size parameters of the computing array.

[0165] In summary, this application provides an automatic generation and optimization scheme for mobile NPU architecture parameters based on a power consumption feedback closed loop. It proposes an architecture generation mechanism based on "process data + machine learning proxy model". In the early stage of chip definition, by automatically searching and optimizing in the entire design space, it directly outputs the physical block strategy of the storage subsystem and the geometric configuration of the computing array that can minimize system power consumption. This achieves a technological leap from "passive power consumption assessment" to "active architecture design" and realizes the precise definition of the NPU physical architecture.

[0166] Specifically, this application provides a refined architecture optimization method for module decoupling. It employs a decoupled modeling architecture to independently display and optimize data transfer power consumption (determined by the storage architecture) and computational power consumption (determined by the array architecture). Furthermore, through visualized energy efficiency bottleneck analysis, designers are no longer faced with a vague total power consumption value, but can accurately identify the system's energy efficiency shortcomings. This allows for adjustments to the on-chip bus width or SRAM cache level, achieving true hardware-software co-design, and ultimately outputting a design result containing detailed hardware specifications.
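The decoupled view can be illustrated with a small breakdown routine: power is grouped into data-transfer and compute categories, and the single largest contributor is reported as the bottleneck. The component names and values are made-up examples, and ranking by absolute power is an assumed, simplified definition of a bottleneck.

```python
from collections import defaultdict


def locate_bottleneck(components):
    """components: iterable of (name, category, power_mw) tuples.

    category is 'data_transfer' (set by the storage architecture) or 'compute'
    (set by the array architecture); each category is summed independently.
    """
    components = list(components)             # accept any iterable; it is scanned twice
    by_category = defaultdict(float)
    for _, category, mw in components:
        by_category[category] += mw
    total = sum(by_category.values())
    shares = {c: mw / total for c, mw in by_category.items()}
    worst_name = max(components, key=lambda item: item[2])[0]
    return dict(by_category), shares, worst_name


totals, shares, worst = locate_bottleneck([
    ("sram_read_write", "data_transfer", 30.0),
    ("on_chip_bus", "data_transfer", 45.0),
    ("mac_array", "compute", 25.0),
])
print(totals, shares, worst)
# {'data_transfer': 75.0, 'compute': 25.0} {'data_transfer': 0.75, 'compute': 0.25} on_chip_bus
```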

[0167] Based on this, the system framework for determining chip architecture parameters provided in this application embodiment specifically includes a constraint definition layer, an architecture optimization and modeling layer, and an architecture parameter generation layer. The constraint definition layer no longer receives a single, fixed hardware description, but instead receives boundary constraint information between the design objective and the architecture search space. The architecture optimization and modeling layer is the core decision-making module, used to find the energy-efficient architecture solution while satisfying constraints through virtualization modeling and iterative search mechanisms. Specifically, the architecture optimization and modeling layer includes an automatic storage architecture generation and evaluation engine and a computational array optimizer based on a machine learning proxy model. The automatic storage architecture generation and evaluation engine determines the topology parameters of the optimal storage unit module, and the computational array optimizer determines the size parameters of the optimal computational array. Optionally, the architecture parameter generation layer is responsible for outputting an optimal hardware specification list, system energy efficiency prediction and bottleneck location, and visual design guidance. Through the above system framework, a complete closed-loop design process from "constraint input" to "automatic architecture optimization" and then to "hardware specification generation" is realized.

[0168] In other words, this application provides a one-stop architecture specification generation and DSE (Design Space Exploration) platform, that is, a design system capable of automatically generating hardware specifications. By integrating the aforementioned chip architecture parameter determination mechanism into a unified automated platform, the design system possesses full design space exploration capabilities. Furthermore, the design system supports batch scanning of voltage domain, frequency, and hardware scale variables, and automatically generates the Pareto optimal frontier. The design system can not only output energy efficiency curves but also directly output a hardware design blueprint containing an optimal SRAM selection list and MAC array specifications, enabling designers to complete the entire process from requirement definition to core architecture parameter locking within milliseconds.
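Batch scanning and Pareto-front extraction can be sketched as a Cartesian sweep followed by a sort-and-sweep filter over two minimized objectives. The cost model `toy_evaluate` is a placeholder assumption (dynamic power scaling with MAC count × V² × f, latency inversely with MAC count × f), not the platform's evaluator, and the parameter grids are illustrative.

```python
from itertools import product


def toy_evaluate(voltage_v, freq_mhz, macs):
    """Placeholder cost model (assumption, not a real power model)."""
    power_mw = 1e-3 * macs * voltage_v ** 2 * freq_mhz   # dynamic power ~ C * V^2 * f
    latency_us = 1e6 / (macs * freq_mhz)                 # fixed workload / throughput
    return power_mw, latency_us


def pareto_front_2d(points):
    """Points not dominated in (power, latency), both minimized: sort by power, sweep latency."""
    front, best_latency = [], float("inf")
    for p in sorted(points, key=lambda p: (p["power_mw"], p["latency_us"])):
        if p["latency_us"] < best_latency:   # strictly faster than every cheaper point
            front.append(p)
            best_latency = p["latency_us"]
    return front


points = []
for v, f, m in product([0.6, 0.75, 0.9], [500, 800, 1000], [256, 1024, 4096]):
    power, latency = toy_evaluate(v, f, m)
    points.append({"voltage_v": v, "freq_mhz": f, "macs": m,
                   "power_mw": power, "latency_us": latency})
front = pareto_front_2d(points)
print(len(points), len(front))  # 27 9
```

Under this toy model voltage does not affect latency, so only the 0.6 V points survive; in a real flow the achievable frequency depends on the voltage domain, which is what makes the voltage axis produce genuine trade-offs.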

[0169] In other words, in this embodiment, during the chip design phase, the deep neural network model parameters and process constraints are input into the aforementioned design system. Then, the optimal ROCM block parameters are generated using real process data, and a machine learning surrogate model is used to predict the MAC array specifications with the highest energy efficiency. Based on this, the hardware architecture of the NPU chip is generated using the ROCM block parameters and MAC array specifications generated by the design system.

[0170] The chip architecture parameter determination method provided in this application can be executed by a chip architecture parameter determination device. This application uses an example of a chip architecture parameter determination device executing the above-described chip architecture parameter determination method to illustrate the chip architecture parameter determination device provided in this application.

[0171] As shown in Figure 5, this embodiment of the application provides a chip architecture parameter determination device 500, which may specifically include the acquisition unit 502 and the processing unit 504 described below.

[0172] The acquisition unit 502 is used to acquire the boundary constraint information between the design target and the architecture search space. The boundary constraint information includes process data and the definition information of the architecture search space. The definition information is used to constrain the adjustment range of the chip architecture parameters. The chip architecture parameters include the topology parameters of the storage cell module of the storage subsystem in the chip and the size parameters of the computing array in the chip.

[0173] The processing unit 504 is used to determine the energy consumption of memory cell modules with different topologies within the adjustment range based on process data, and to determine the topology parameters of the memory cell modules in the memory subsystem of the chip based on the energy consumption of the memory cell modules with different topologies. The topology parameters include the number, capacity and physical splicing topology of the memory cell modules.

[0174] The processing unit 504 is also used to predict the power consumption of computing arrays with different configurations within the adjustment range through a machine learning proxy model, and to determine the size parameters of the computing arrays in the chip based on the power consumption of the computing arrays with different configurations. The computing arrays with different configurations are computing arrays that differ in at least one of size parameters, computing accuracy, and toggle rate.

[0175] The chip architecture parameter determination apparatus 500 provided in this application determines the chip architecture parameters, namely the topology parameters of the storage unit modules in the chip's storage subsystem and the size parameters of the computing array in the chip, based not on a single fixed hardware description but on variable boundary constraint information. The topology parameters of the storage unit modules and the size parameters of the computing array are determined independently of each other. Specifically, based on the energy consumption of storage unit modules with different topologies determined from process data, the topology parameters of the storage unit modules are determined at multiple levels, including the number, capacity, and physical splicing topology of the storage unit modules. This achieves refined evaluation and feedback of the topology parameters of the storage unit modules and reduces chip energy consumption. Furthermore, topology parameters determined from process data can meet the constraints of the backend physical implementation, eliminating the need for repeated design and improving the accuracy and efficiency of topology parameter determination. Moreover, the size parameters of the computing array are determined based on the power consumption, predicted by a machine learning surrogate model, of computing arrays whose configurations differ in at least one of size parameters, computational accuracy, and toggle rate, which improves the accuracy and efficiency of size parameter determination while reducing chip power consumption. In this way, chip architecture parameters are optimized automatically at two levels: the topology parameters of the storage unit modules and the size parameters of the computing array. This improves the accuracy and efficiency of determining chip architecture parameters and facilitates accurately locating and eliminating energy bottlenecks in the data transport path, thereby reducing chip power consumption.

[0176] In this embodiment, the processing unit 504 is specifically used to: construct storage unit modules with different topologies within the adjustment range, where each storage unit module meets the capacity, bandwidth, and latency requirements constrained by the definition information; calculate the read/write power consumption, clock tree power consumption, and static leakage power consumption of each storage unit module based on the process data; and determine the sum of these three components as the power consumption of that storage unit module.
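The per-topology power sum can be sketched as follows. Every number is a hypothetical stand-in for process-library data: the per-access energy is assumed to grow linearly with macro capacity (longer bitlines), clock-tree power is assumed proportional to the number of macros, and leakage proportional to total capacity. The constraint check assumes all macros are accessed in parallel when computing bandwidth.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BankCandidate:
    """Hypothetical topology: `count` identical SRAM macros of `capacity_kb` each."""
    count: int
    capacity_kb: int
    width_bits: int        # port width of one macro
    latency_ns: float      # access latency of one macro


@dataclass(frozen=True)
class ProcessData:
    """Hypothetical per-macro figures that would come from the process library."""
    access_pj_base: float
    access_pj_per_kb: float
    clock_tree_mw_per_macro: float
    leakage_mw_per_kb: float


def meets_constraints(c, min_capacity_kb, min_bandwidth_gbps, max_latency_ns, freq_ghz):
    bandwidth_gbps = c.count * c.width_bits * freq_ghz   # all macros accessed in parallel
    return (c.count * c.capacity_kb >= min_capacity_kb
            and bandwidth_gbps >= min_bandwidth_gbps
            and c.latency_ns <= max_latency_ns)


def storage_power_mw(c, proc, accesses_per_ns):
    """Read/write + clock tree + static leakage power of one candidate, in mW."""
    energy_pj = proc.access_pj_base + proc.access_pj_per_kb * c.capacity_kb
    read_write = energy_pj * accesses_per_ns          # 1 pJ/ns == 1 mW
    clock_tree = proc.clock_tree_mw_per_macro * c.count
    leakage = proc.leakage_mw_per_kb * c.count * c.capacity_kb
    return read_write + clock_tree + leakage


proc = ProcessData(1.0, 0.05, 0.5, 0.01)
candidates = [BankCandidate(n, 1024 // n, 64, 1.0) for n in (4, 8, 16)]
for c in candidates:
    if meets_constraints(c, 1024, 200.0, 2.0, 1.0):
        print(c.count, round(storage_power_mw(c, proc, 2.0), 2))
# prints, one per line: 4 39.84 / 8 29.04 / 16 26.64
```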

[0177] In the embodiments provided in this application, memory cell modules with different topologies are constructed within the adjustment range, each meeting the capacity, bandwidth, and latency requirements constrained by the definition information. Based on process data, the read/write power consumption, clock tree power consumption, and static leakage power consumption of each memory cell module are calculated, and their sum is determined as the power consumption of that memory cell module. In this way, accurate and comprehensive power consumption analysis based on process data is achieved in the early stages of chip architecture design, which helps reduce chip power consumption and also reduces iteration costs in the later stages of the chip's physical implementation.

[0178] In this embodiment, the processing unit 504 is specifically used to: establish a multi-objective optimization model whose optimization objectives include at least the energy consumption index, area index, and latency index of the storage unit module; use the multi-objective optimization model to screen the storage unit modules with different topologies through a Pareto optimal screening mechanism, obtaining the target topology with the lowest energy consumption; and determine the number, capacity, and physical splicing topology of the storage unit modules in the target topology as the topology parameters of the storage unit modules.
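A Pareto screening step over the three objectives can be sketched with a pairwise dominance check (quadratic in the number of candidates, acceptable for the small candidate sets of a bank sweep). The candidate values are illustrative. Since at least one lowest-energy candidate always lies on the Pareto front, screening first mainly serves to expose the energy/area/latency trade-off and to break energy ties in favor of smaller area or latency.

```python
def dominates(a, b):
    """True if objective vector a is no worse than b everywhere and better somewhere (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_filter(candidates, objectives):
    """Keep the candidates whose objective vectors no other candidate dominates."""
    scored = [(c, objectives(c)) for c in candidates]
    return [c for c, s in scored if not any(dominates(t, s) for _, t in scored)]


def pick_target_topology(candidates, objectives):
    """Pareto-screen, then return the front and its lowest-energy member (objective 0)."""
    front = pareto_filter(candidates, objectives)
    return front, min(front, key=lambda c: objectives(c)[0])


def objectives(c):
    # Objective vector: energy (mW), area (mm^2), latency (ns); all minimized.
    return (c["energy"], c["area"], c["latency"])


candidates = [
    {"name": "A", "energy": 26.64, "area": 1.4, "latency": 1.0},
    {"name": "B", "energy": 26.64, "area": 1.2, "latency": 1.0},
    {"name": "C", "energy": 29.04, "area": 1.1, "latency": 0.9},
    {"name": "D", "energy": 39.84, "area": 1.2, "latency": 1.1},
]
front, target = pick_target_topology(candidates, objectives)
print([c["name"] for c in front], target["name"])  # ['B', 'C'] B
```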

[0179] In the embodiments provided in this application, a multi-objective optimization model is established whose optimization objectives include at least the energy consumption, area, and latency indicators of the storage unit module. Using this model, storage unit modules with different topologies are screened through a Pareto optimal selection mechanism to obtain the target topology with the lowest energy consumption, and the number, capacity, and physical splicing topology of the storage unit modules in the target topology are determined as the topology parameters of the storage unit modules. Thus, the Pareto optimal selection mechanism selects the optimal storage unit module topology while accounting for energy consumption, area, and latency together, so that the determined topology parameters meet the design requirements and the designed chip balances energy consumption, area cost, and latency performance.

[0180] In this embodiment, the processing unit 504 is specifically used to: construct computing arrays with different size parameters, computational precision, and toggle rates within the adjustment range; use the machine learning proxy model to output the power consumption of the computing arrays with different configurations based on the mapping relationship between the size parameters, computational precision, toggle rate, and power consumption of the computing array; determine the energy efficiency of each configuration from its power consumption and identify the target computing array with the highest energy efficiency; and determine the size of the target computing array as the size parameter of the computing array in the chip.
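The energy-efficiency search can be sketched as a sweep over array sizes that divides an assumed peak throughput by the surrogate's predicted power. Peak throughput here assumes one multiply-accumulate (two operations) per cell per cycle at full utilization, which ignores the utilization losses mentioned in [0161]; `toy_predict_mw` is a made-up stand-in for a trained proxy model, given a superlinear interconnect term so that an interior optimum exists.

```python
from itertools import product


def peak_gops(rows, cols, freq_ghz):
    # Assumption: each cell does one MAC (2 ops) per cycle at full utilization.
    return 2 * rows * cols * freq_ghz


def pick_most_efficient(rows_opts, cols_opts, precision_bits, toggle_rate, freq_ghz, predict_mw):
    """Return the (rows, cols) with the highest GOPS/W, and that efficiency."""
    best, best_eff = None, float("-inf")
    for rows, cols in product(rows_opts, cols_opts):
        power_w = predict_mw(rows, cols, precision_bits, toggle_rate) / 1000.0
        eff = peak_gops(rows, cols, freq_ghz) / power_w
        if eff > best_eff:                 # strict: the first of equal-efficiency sizes wins
            best, best_eff = (rows, cols), eff
    return best, best_eff


def toy_predict_mw(rows, cols, precision_bits, toggle_rate):
    """Hypothetical surrogate: fixed overhead + per-cell dynamic power + superlinear wiring."""
    n = rows * cols
    return 50.0 + 0.02 * n * toggle_rate * (precision_bits / 8) + 1e-4 * n ** 1.5


sizes = [16, 32, 64, 128]
best, eff = pick_most_efficient(sizes, sizes, 8, 0.25, 1.0, toy_predict_mw)
print(best, round(eff))
```

Any callable with the same `(rows, cols, precision_bits, toggle_rate)` signature, such as a trained surrogate, can replace the toy model.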

[0181] In the embodiments provided in this application, computing arrays with different size parameters, computational precision, and toggle rates are constructed within the adjustment range. Using the machine learning proxy model and the mapping relationship between the size parameters, computational precision, toggle rate, and power consumption of the computing array, the power consumption of each configuration is output. The energy efficiency of each configuration is then determined from its power consumption, the target computing array with the highest energy efficiency is identified, and its size is determined as the size parameter of the computing array in the chip. In this way, evaluating the energy efficiency of different configurations from the power consumption predicted by the machine learning proxy model helps reduce chip power consumption and improve chip energy efficiency. Furthermore, the machine learning proxy model can predict the power consumption of one configuration within milliseconds, enabling very fast design space exploration and improving the efficiency of determining the size parameters of the computing array.

[0182] In this embodiment, the processing unit 504 is further configured to: acquire a training dataset that includes the size parameters, computational accuracy, toggle rate, and simulated power consumption of computing arrays with different configurations; divide the training dataset into a training sample set and a validation sample set; iteratively train the machine learning proxy model according to the training sample set and the target loss function; and determine the prediction error of the trained machine learning proxy model according to the validation sample set, determining that the trained model passes validation if the prediction error is less than the error threshold.

[0183] In the embodiments provided in this application, before the power consumption of computing arrays with different configurations within the adjustment range is predicted using the machine learning proxy model, a training dataset is acquired. The training dataset includes the size parameters, computational accuracy, toggle rate, and simulated power consumption of computing arrays with different configurations. The training dataset is divided into a training sample set and a validation sample set. The machine learning proxy model is iteratively trained based on the training sample set and the target loss function. The prediction error of the trained machine learning proxy model is determined based on the validation sample set, and if the prediction error is less than an error threshold, the trained machine learning proxy model is deemed to have passed validation. In this way, training the machine learning proxy model on real simulation data of computing arrays with different configurations enables it to accurately learn the mapping relationship between the size parameters, computational accuracy, toggle rate, and power consumption of the computing array, thereby improving the accuracy with which the machine learning proxy model predicts the power consumption of computing arrays with different configurations, and the accuracy of the subsequently determined size parameters of the computing array.

[0184] The chip architecture parameter determination device 500 in this application embodiment can be an electronic device or a component in an electronic device, such as an integrated circuit or a chip. The electronic device can be a terminal or other devices besides a terminal. For example, the electronic device can be a mobile phone, tablet computer, laptop computer, handheld computer, in-vehicle electronic device, mobile internet device (MID), augmented reality (AR) / virtual reality (VR) device, robot, wearable device, ultra-mobile personal computer (UMPC), netbook or personal digital assistant (PDA), etc. It can also be a server, network attached storage (NAS), personal computer (PC), television (TV), ATM or self-service machine, etc. The embodiments of this application do not specifically limit it.

[0185] The chip architecture parameter determination device 500 in this application embodiment can be a device with an operating system. The operating system can be Android, iOS, or other possible operating systems; this application embodiment does not specifically limit it.

[0186] The chip architecture parameter determination device 500 provided in this embodiment can implement each process of the method embodiment shown in Figure 2; to avoid repetition, details are not described again here.

[0187] Optionally, as shown in Figure 6, this embodiment of the application also provides an electronic device 300, including the chip architecture 100 in the above embodiments.

[0188] The electronic device 300 provided in this application includes the chip architecture 100 in the above embodiments. Therefore, the electronic device 300 provided in this application has all the beneficial effects of the chip architecture 100 in the above embodiments, and will not be described again here to avoid repetition.

[0189] Optionally, as shown in Figure 7, this embodiment of the application also provides an electronic device 600, including a processor 602 and a memory 604. The memory 604 stores a program or instructions that can run on the processor 602. When executed by the processor 602, the program or instructions implement the steps of the above chip architecture parameter determination method embodiment and achieve the same technical effect; to avoid repetition, details are not described again here.

[0190] It should be noted that the electronic devices in the embodiments of this application include the aforementioned mobile electronic devices and non-mobile electronic devices.

[0191] Figure 8 is a schematic diagram of the hardware structure of an electronic device implementing an embodiment of this application.

[0192] Electronic device 700 includes, but is not limited to: radio frequency unit 701, network module 702, audio output unit 703, input unit 704, sensor 705, display unit 706, user input unit 707, interface unit 708, memory 709, and processor 710, etc.

[0193] Those skilled in the art will understand that the electronic device 700 may also include a power supply (such as a battery) for supplying power to the various components. The power supply may be logically connected to the processor 710 through a power management system, so that functions such as charging, discharging, and power consumption management are implemented through the power management system. The electronic device structure shown in Figure 8 does not constitute a limitation on the electronic device. The electronic device may include more or fewer components than shown, combine certain components, or use a different component arrangement, which will not be elaborated here.

[0194] The user input unit 707 is used to obtain boundary constraint information between the design target and the architecture search space. The boundary constraint information includes process data and definition information of the architecture search space. The definition information is used to constrain the adjustment range of chip architecture parameters. Chip architecture parameters include the topology parameters of the storage cell modules of the storage subsystem in the chip and the size parameters of the computing array in the chip.

[0195] The processor 710 is used to determine the power consumption of memory cell modules with different topologies within the adjustment range based on process data, and to determine the topology parameters of the memory cell modules in the memory subsystem of the chip based on the power consumption of the memory cell modules with different topologies. The topology parameters include the number, capacity and physical splicing topology of the memory cell modules.

[0196] The processor 710 is also used to predict, through a machine learning proxy model, the power consumption of computing arrays with different configurations within the adjustment range, and to determine the size parameters of the computing array in the chip based on that power consumption, where the computing arrays with different configurations differ in at least one of size parameters, computing accuracy, and toggle rate.

[0197] In this embodiment, the chip architecture parameters, namely the topology parameters of the memory cell modules in the chip's memory subsystem and the size parameters of the computing array in the chip, are no longer determined based on a single, fixed hardware description, but on variable boundary constraint information. The topology parameters of the memory cell modules and the size parameters of the computing array are determined independently of each other. Specifically, based on the energy consumption of memory cell modules with different topologies determined from process data, the topology parameters of the memory cell modules are determined at multiple levels, including the number, capacity, and physical splicing topology of the memory cell modules. This achieves refined evaluation and feedback of the topology parameters of the memory cell modules and reduces chip energy consumption. Furthermore, topology parameters determined from process data can meet the constraints of the backend physical implementation, eliminating the need for repeated design and improving the accuracy and efficiency of topology parameter determination. Moreover, the size parameters of the computing array are determined based on the power consumption, predicted by a machine learning surrogate model, of computing arrays whose configurations differ in at least one of size parameters, computational accuracy, and toggle rate, which improves the accuracy and efficiency of size parameter determination while reducing chip power consumption. In this way, chip architecture parameters are optimized automatically at two levels: the topology parameters of the storage unit modules and the size parameters of the computing array. This improves the accuracy and efficiency of determining chip architecture parameters and facilitates accurately locating and eliminating energy bottlenecks in the data transport path, thereby reducing chip power consumption.

[0198] Optionally, the processor 710 is specifically used to: construct storage cell modules with different topologies within the adjustment range, where each storage cell module meets the capacity, bandwidth, and latency requirements constrained by the definition information; calculate the read/write power consumption, clock tree power consumption, and static leakage power consumption of each storage cell module based on the process data; and determine the sum of these three components as the power consumption of that storage cell module.

[0199] In the embodiments provided in this application, memory cell modules with different topologies are constructed within the adjustment range, each meeting the capacity, bandwidth, and latency requirements constrained by the definition information. Based on process data, the read/write power consumption, clock tree power consumption, and static leakage power consumption of each memory cell module are calculated, and their sum is determined as the power consumption of that memory cell module. In this way, accurate and comprehensive power consumption analysis based on process data is achieved in the early stages of chip architecture design, which helps reduce chip power consumption and also reduces iteration costs in the later stages of the chip's physical implementation.

[0200] Optionally, the processor 710 is specifically used to: establish a multi-objective optimization model whose optimization objectives include at least the energy consumption index, area index, and latency index of the storage unit module; use the multi-objective optimization model to screen the storage unit modules with different topologies through a Pareto optimal screening mechanism, obtaining the target topology with the lowest energy consumption; and determine the number, capacity, and physical splicing topology of the storage unit modules in the target topology as the topology parameters of the storage unit modules.

[0201] In the embodiments provided in this application, a multi-objective optimization model is established whose optimization objectives include at least the energy consumption, area, and latency indices of the storage unit module. Using this model, the storage unit modules with different topologies are screened with a Pareto optimal screening mechanism to obtain the storage unit module of the target topology, which has the lowest energy consumption, and the number, capacity, and physical splicing topology of the storage unit modules in the target topology are determined as the topology parameters of the storage unit modules. Because the Pareto screening evaluates energy consumption, area, and latency together, the selected topology meets the design requirements on all three, allowing the designed chip to balance energy consumption, area cost, and latency performance.
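The Pareto screening in [0200] and [0201] can be sketched as follows, assuming each candidate topology has already been scored on the three indices and that lower is better for all of them. The `Candidate` type, the example units, and the tie-break order (area, then latency) are illustrative choices; the application does not specify how ties among equal-energy points are resolved.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Candidate(NamedTuple):
    name: str
    energy: float   # energy consumption index, lower is better (e.g. pJ per inference)
    area: float     # area index, lower is better (e.g. mm^2)
    latency: float  # latency index, lower is better (e.g. cycles)


def objectives(c: Candidate) -> Tuple[float, float, float]:
    return (c.energy, c.area, c.latency)


def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if it is no worse in every objective and strictly better in at least one."""
    oa, ob = objectives(a), objectives(b)
    return all(x <= y for x, y in zip(oa, ob)) and any(x < y for x, y in zip(oa, ob))


def pareto_front(cands: Sequence[Candidate]) -> List[Candidate]:
    """Keep only candidates that no other candidate dominates (O(n^2), fine for small sweeps)."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


def select_target(cands: Sequence[Candidate]) -> Candidate:
    """Lowest-energy point on the Pareto front; ties broken by area, then latency."""
    return min(pareto_front(cands), key=objectives)
```

Taking the lexicographic minimum over the front gives the same pick as taking it over the whole set, because that minimum can never be dominated. The front is still worth keeping, since it is the set of trade-offs a designer would revisit if the area or latency budget changes.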

[0202] Optionally, the processor 710 is specifically configured to: construct, within the adjustment range, computing arrays with different size parameters, computational precisions, and toggle rates; use a machine learning proxy model to output the power consumption of the computing arrays of the different configurations based on the mapping relationship between the size parameters, computational precision, and toggle rate of a computing array and its power consumption; determine the energy efficiency of the computing arrays of the different configurations based on their power consumption, and determine the target computing array with the highest energy efficiency; and determine the size of the target computing array as the size parameter of the computing array in the chip.

[0203] In the embodiments provided in this application, computing arrays with different size parameters, computational precisions, and toggle rates are constructed within the adjustment range. A machine learning proxy model, which has learned the mapping from the size parameters, computational precision, and toggle rate of a computing array to its power consumption, outputs the power consumption of each configuration. The energy efficiency of each configuration is then determined from its power consumption, the target computing array with the highest energy efficiency is identified, and its size is determined as the size parameter of the computing array in the chip. In this way, evaluating energy efficiency with power consumption predicted by the machine learning proxy model helps reduce chip power consumption and improve chip energy efficiency. Furthermore, because the machine learning proxy model can predict the power consumption of one configuration within milliseconds, the design space can be explored very quickly, which improves the efficiency of determining the size parameters of the computing array.
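The configuration sweep in [0202] and [0203] can be sketched as below. Several pieces are assumptions rather than details from the application: energy efficiency is measured as effective TOPS/W, throughput is derated by a hypothetical utilization model for a K x N workload tile folded onto the array, and `toy_proxy_model` is a made-up analytic stand-in for a trained proxy model. Any callable that maps a configuration to predicted power in mW can be passed instead.

```python
import math
from itertools import product
from typing import Callable, Iterable, NamedTuple, Optional, Tuple


class ArrayConfig(NamedTuple):
    rows: int
    cols: int
    bits: int      # computational precision, e.g. 8 for INT8
    toggle: float  # average toggle rate in [0, 1]


PowerModel = Callable[[ArrayConfig], float]  # returns predicted power in mW


def toy_proxy_model(cfg: ArrayConfig) -> float:
    """Hypothetical stand-in for a trained proxy model (not from the application):
    fixed overhead + per-PE leakage + precision- and toggle-scaled dynamic power, in mW."""
    per_pe_mw = 0.002 + 0.01 * (cfg.bits / 8) ** 2 * cfg.toggle
    return 20.0 + cfg.rows * cfg.cols * per_pe_mw


def utilization(cfg: ArrayConfig, k: int, n: int) -> float:
    """Fraction of PEs doing useful work when a K x N tile is folded onto the array."""
    util_r = k / (math.ceil(k / cfg.rows) * cfg.rows)
    util_c = n / (math.ceil(n / cfg.cols) * cfg.cols)
    return util_r * util_c


def tops_per_watt(cfg: ArrayConfig, power_mw: float, freq_hz: float, k: int, n: int) -> float:
    ops_per_s = 2 * cfg.rows * cfg.cols * utilization(cfg, k, n) * freq_hz  # 1 MAC = 2 ops
    return (ops_per_s / 1e12) / (power_mw / 1e3)


def pick_array(
    model: PowerModel,
    sizes: Iterable[Tuple[int, int]],
    precisions: Iterable[int],
    toggles: Iterable[float],
    freq_hz: float,
    k: int,
    n: int,
) -> Tuple[ArrayConfig, float]:
    """Sweep the adjustment range and return the most energy-efficient configuration."""
    best: Optional[Tuple[ArrayConfig, float]] = None
    for (rows, cols), bits, toggle in product(sizes, precisions, toggles):
        cfg = ArrayConfig(rows, cols, bits, toggle)
        eff = tops_per_watt(cfg, model(cfg), freq_hz, k, n)
        if best is None or eff > best[1]:
            best = (cfg, eff)
    if best is None:
        raise ValueError("empty adjustment range")
    return best
```

With this toy model the largest array tends to win because its fixed overhead is spread over more processing elements, even at reduced utilization. A real proxy model and workload mix could favor a smaller array, so the selected size depends entirely on the trained model.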

[0204] Optionally, the processor 710 is further configured to: acquire a training dataset, the training dataset including the size parameters, computational precision, toggle rate, and simulated power consumption of computing arrays with different configurations; divide the training dataset into a training sample set and a validation sample set; iteratively train the machine learning proxy model based on the training sample set and a target loss function; and determine the prediction error of the trained machine learning proxy model based on the validation sample set, and determine that the trained machine learning proxy model has passed validation if the prediction error is less than an error threshold.

[0205] In the embodiments provided in this application, before the machine learning proxy model is used to predict the power consumption of computing arrays with different configurations within the adjustment range, a training dataset is acquired that includes the size parameters, computational precision, toggle rate, and simulated power consumption of computing arrays with different configurations. The training dataset is divided into a training sample set and a validation sample set, and the machine learning proxy model is iteratively trained on the training sample set with the target loss function. The prediction error of the trained model is then measured on the validation sample set; if it is less than an error threshold, the trained model is deemed to have passed validation. Training on simulation data from computing arrays with different configurations enables the model to learn the mapping from the size parameters, computational precision, and toggle rate of a computing array to its power consumption. This improves the accuracy of its power predictions and, in turn, the accuracy of the size parameters of the computing array determined from them.
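The train/validate loop in [0204] and [0205] can be sketched in standard-library Python as follows. The application does not name a model family, loss function, or error metric. This sketch assumes a linear regression on hypothetical hand-picked features, trained by full-batch gradient descent on mean squared error (the "target loss function" here), with mean absolute percentage error on the validation set as the "prediction error." A tree ensemble or small neural network could replace `LinearProxy` using the same fit/predict/validate structure.

```python
import random
from typing import List, Sequence, Tuple

# One sample: (rows, cols, bits, toggle_rate, simulated_power_mw)
Sample = Tuple[int, int, int, float, float]


def features(rows: int, cols: int, bits: int, toggle: float) -> List[float]:
    """Hypothetical hand-picked features: bias, PE count, precision/toggle-weighted PE count."""
    pe = rows * cols
    return [1.0, float(pe), pe * (bits / 8) ** 2 * toggle]


def split(data: Sequence[Sample], val_fraction: float = 0.2, seed: int = 0):
    """Shuffle reproducibly, then split into (training sample set, validation sample set)."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


class LinearProxy:
    """Linear-regression proxy model trained by full-batch gradient descent on MSE."""

    def __init__(self) -> None:
        self.w = [0.0, 0.0, 0.0]
        self.x_scale = [1.0, 1.0, 1.0]  # per-feature max, improves conditioning
        self.y_scale = 1.0

    def _scaled(self, s) -> List[float]:
        return [f / k for f, k in zip(features(*s[:4]), self.x_scale)]

    def predict(self, rows: int, cols: int, bits: int, toggle: float) -> float:
        x = self._scaled((rows, cols, bits, toggle))
        return self.y_scale * sum(w * xi for w, xi in zip(self.w, x))

    def fit(self, train: Sequence[Sample], lr: float = 0.1, epochs: int = 2000) -> "LinearProxy":
        columns = list(zip(*(features(*s[:4]) for s in train)))
        self.x_scale = [max(abs(v) for v in c) or 1.0 for c in columns]
        self.y_scale = max(abs(s[4]) for s in train) or 1.0
        xs = [self._scaled(s) for s in train]
        ys = [s[4] / self.y_scale for s in train]
        n = len(xs)
        for _ in range(epochs):  # iterative training on the MSE loss
            grad = [0.0] * len(self.w)
            for x, y in zip(xs, ys):
                err = sum(w * xi for w, xi in zip(self.w, x)) - y
                for j, xj in enumerate(x):
                    grad[j] += 2.0 * err * xj / n
            self.w = [w - lr * g for w, g in zip(self.w, grad)]
        return self


def validate(model: LinearProxy, val: Sequence[Sample], threshold: float) -> Tuple[float, bool]:
    """Mean absolute percentage error on the validation set, and whether it is below threshold."""
    err = sum(abs(model.predict(*s[:4]) - s[4]) / abs(s[4]) for s in val) / len(val)
    return err, err < threshold
```

Features are scaled to [0, 1] before training so that a fixed learning rate of 0.1 remains stable. The threshold passed to `validate` is a design choice that the application leaves open.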

[0206] It should be understood that, in this embodiment, the input unit 704 may include a graphics processing unit (GPU) 7041 and a microphone 7042. The GPU 7041 processes image data of still images or videos obtained by an image capture device (such as a camera) in video capture mode or image capture mode. The display unit 706 may include a display panel 7061, which may be configured in the form of a liquid crystal display, an organic light-emitting diode, or the like. The user input unit 707 includes at least one of a touch panel 7071 and other input devices 7072. The touch panel 7071 is also called a touch screen. The touch panel 7071 may include a touch detection device and a touch controller. Other input devices 7072 may include, but are not limited to, a physical keyboard, function keys (such as volume control buttons, power buttons, etc.), a trackball, a mouse, and a joystick, which will not be described in detail here.

[0207] The memory 709 may be used to store software programs and various data. The memory 709 may mainly include a first storage area for storing programs or instructions and a second storage area for storing data, where the first storage area may store an operating system and the application programs or instructions required by at least one function (such as a sound playback function or an image playback function). Furthermore, the memory 709 may include volatile memory, non-volatile memory, or both. The non-volatile memory may be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory. The volatile memory may be random access memory (RAM), static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), SyncLink dynamic random access memory (SLDRAM), or direct Rambus random access memory (DRRAM). The memory 709 in the embodiments of this application includes, but is not limited to, these and any other suitable types of memory.

[0208] The processor 710 may include one or more processing units. Optionally, the processor 710 integrates an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, applications, and the like, and the modem processor (such as a baseband processor) mainly handles wireless communication signals. It can be understood that the modem processor may alternatively not be integrated into the processor 710.

[0209] This application also provides a readable storage medium storing a program or instructions. When the program or instructions are executed by a processor, they implement the various processes of the above-described method for determining chip architecture parameters and achieve the same technical effect. To avoid repetition, they will not be described again here.

[0210] The processor is the processor in the electronic device described in the above embodiments. The readable storage medium includes computer-readable storage media, such as computer read-only memory (ROM), random access memory (RAM), magnetic disk, or optical disk.

[0211] This application also provides a chip, including a processor and a communication interface, the communication interface being coupled to the processor. The processor is configured to run programs or instructions to implement the various processes of the above embodiment of the method for determining chip architecture parameters, and can achieve the same technical effect. To avoid repetition, details are not described again here.

[0212] It should be understood that the chip mentioned in the embodiments of this application may also be referred to as a system-level chip, a system chip, a chip system, a system-on-a-chip, or the like.

[0213] This application provides a computer program product, which is stored in a storage medium and executed by at least one processor to implement the various processes of the chip architecture parameter determination method embodiment described above, and can achieve the same technical effect. To avoid repetition, it will not be described again here.

[0214] It should be noted that, in this document, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes that element. Furthermore, it should be noted that the scope of the methods and apparatuses in the embodiments of this application is not limited to performing functions in the order shown or discussed, but may also include performing functions substantially simultaneously or in the reverse order, depending on the functions involved. For example, the described methods may be performed in a different order than described, and various steps may be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.

[0215] Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus necessary general-purpose hardware platforms. Of course, they can also be implemented by hardware, but in many cases the former is a better implementation method. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, can be embodied in the form of a computer software product. The computer software product is stored in a storage medium (such as ROM / RAM, magnetic disk, optical disk) and includes several instructions to cause a terminal (which may be a mobile phone, computer, server, or network device, etc.) to execute the methods of the various embodiments of this application.

[0216] The embodiments of this application have been described above with reference to the accompanying drawings. However, this application is not limited to the specific embodiments described above. The specific embodiments described above are merely illustrative and not restrictive. Those skilled in the art can make many other forms under the guidance of this application without departing from the spirit and scope of the claims, and all of these forms are within the protection scope of this application.

Claims

1. A chip architecture, characterized in that it comprises:
a system bus interface;
a data access controller connected to the system bus interface;
a storage subsystem connected to the data access controller, the storage subsystem comprising multiple storage unit modules, wherein the topology parameters of the multiple storage unit modules are determined based on the energy consumption of storage unit modules with different topologies, the topology parameters include at least one of the number, capacity, and physical splicing topology of the storage unit modules, and the energy consumption of the storage unit modules is determined based on process data;
a computing engine connected to the storage subsystem, the computing engine including a computing array, wherein the size parameter of the computing array is determined based on the power consumption of computing arrays in different configurations, the computing arrays in different configurations are computing arrays that differ in at least one of size parameter, computational precision, and toggle rate, and the power consumption of the computing arrays is predicted by a machine learning proxy model;
a global control unit connected to both the system bus interface and the storage subsystem; and
a vector processing unit connected to both the computing engine and the storage subsystem.

2. A method for determining chip architecture parameters, characterized in that it comprises:
obtaining boundary constraint information of a design target and an architecture search space, the boundary constraint information including process data and definition information of the architecture search space, wherein the definition information is used to constrain the adjustment range of the chip architecture parameters, and the chip architecture parameters include the topology parameters of the storage unit modules of the storage subsystem in the chip and the size parameters of the computing array in the chip;
determining, based on the process data, the energy consumption of storage unit modules with different topologies within the adjustment range, and determining, based on the energy consumption of the storage unit modules with different topologies, the topology parameters of the storage unit modules in the storage subsystem of the chip, the topology parameters including at least one of the number, capacity, and physical splicing topology of the storage unit modules; and
predicting, by a machine learning proxy model, the power consumption of computing arrays with different configurations within the adjustment range, and determining the size parameters of the computing array in the chip based on the power consumption of the computing arrays with different configurations, wherein the computing arrays with different configurations are computing arrays that differ in at least one of size parameter, computational precision, and toggle rate.

3. The method for determining chip architecture parameters according to claim 2, characterized in that determining, based on the process data, the energy consumption of the storage unit modules with different topologies within the adjustment range comprises:
constructing, within the adjustment range, storage unit modules with different topologies, each of which satisfies the capacity, bandwidth, and latency requirements constrained by the definition information;
calculating, based on the process data, the read/write power consumption, clock tree power consumption, and static leakage power consumption of the storage unit module of each topology; and
determining the sum of the read/write power consumption, clock tree power consumption, and static leakage power consumption of the storage unit module of each topology as the power consumption of the storage unit module of that topology.

4. The method for determining chip architecture parameters according to claim 2, characterized in that determining the topology parameters of the storage unit modules in the storage subsystem of the chip based on the energy consumption of the storage unit modules with different topologies comprises:
establishing a multi-objective optimization model, wherein the optimization objectives of the multi-objective optimization model include at least an energy consumption index, an area index, and a latency index of the storage unit module;
screening, using the multi-objective optimization model, the storage unit modules with different topologies based on a Pareto optimal screening mechanism to obtain the storage unit module of the target topology, which has the lowest energy consumption; and
determining the number, capacity, and physical splicing topology of the storage unit modules in the target topology as the topology parameters of the storage unit modules.

5. The method for determining chip architecture parameters according to claim 2, characterized in that predicting, by a machine learning proxy model, the power consumption of computing arrays with different configurations within the adjustment range, and determining the size parameters of the computing array in the chip based on the power consumption of the computing arrays with different configurations, comprises:
constructing, within the adjustment range, computing arrays with different size parameters, computational precisions, and toggle rates;
outputting, by the machine learning proxy model, the power consumption of the computing arrays with different configurations according to the mapping relationship between the size parameters, computational precision, and toggle rate of a computing array and its power consumption;
determining the energy efficiency of the computing arrays with different configurations based on their power consumption, and determining the target computing array with the highest energy efficiency; and
determining the size of the target computing array as the size parameter of the computing array in the chip.

6. The method for determining chip architecture parameters according to any one of claims 2 to 5, characterized in that, before the power consumption of the computing arrays with different configurations within the adjustment range is predicted by the machine learning proxy model, the method further comprises:
obtaining a training dataset, which includes the size parameters, computational precision, toggle rate, and simulated power consumption of computing arrays in different configurations;
dividing the training dataset into a training sample set and a validation sample set;
iteratively training the machine learning proxy model based on the training sample set and a target loss function; and
determining the prediction error of the trained machine learning proxy model based on the validation sample set, and determining that the trained machine learning proxy model has passed validation if the prediction error is less than an error threshold.

7. A device for determining chip architecture parameters, characterized in that it comprises:
an acquisition unit configured to acquire boundary constraint information of a design target and an architecture search space, the boundary constraint information including process data and definition information of the architecture search space, wherein the definition information is used to constrain the adjustment range of the chip architecture parameters, and the chip architecture parameters include the topology parameters of the storage unit modules of the storage subsystem in the chip and the size parameters of the computing array in the chip; and
a processing unit configured to determine, based on the process data, the energy consumption of storage unit modules with different topologies within the adjustment range, and to determine, based on the energy consumption of the storage unit modules with different topologies, the topology parameters of the storage unit modules in the storage subsystem of the chip, the topology parameters including the number, capacity, and physical splicing topology of the storage unit modules;
wherein the processing unit is further configured to predict, by a machine learning proxy model, the power consumption of computing arrays with different configurations within the adjustment range, and to determine the size parameters of the computing array in the chip based on the power consumption of the computing arrays with different configurations, the computing arrays with different configurations being computing arrays that differ in at least one of size parameter, computational precision, and toggle rate.

8. An electronic device, characterized in that it comprises the chip architecture according to claim 1.

9. An electronic device, characterized in that it comprises a processor and a memory, the memory storing a program or instructions executable on the processor, wherein the program or instructions, when executed by the processor, implement the steps of the method for determining chip architecture parameters according to any one of claims 2 to 6.

10. A readable storage medium, characterized in that the readable storage medium stores a program or instructions which, when executed by a processor, implement the steps of the method for determining chip architecture parameters according to any one of claims 2 to 6.
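Claim 1 defines the chip architecture mainly by which blocks connect to which. A minimal sketch of that interconnect is shown below: the connections are stored as an undirected graph, and a breadth-first search finds the shortest path between any two blocks. The snake_case block identifiers are hypothetical labels for the claimed components. Claim 1 does not specify link direction, bus width, or protocol, so none of these is modeled.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

# Undirected connections recited in claim 1 (direction and width are not specified there).
CONNECTIONS: Set[Tuple[str, str]] = {
    ("system_bus_interface", "data_access_controller"),
    ("data_access_controller", "storage_subsystem"),
    ("storage_subsystem", "computing_engine"),
    ("global_control_unit", "system_bus_interface"),
    ("global_control_unit", "storage_subsystem"),
    ("vector_processing_unit", "computing_engine"),
    ("vector_processing_unit", "storage_subsystem"),
}


def adjacency(edges: Set[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build a symmetric neighbor map from the edge list."""
    adj: Dict[str, Set[str]] = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def shortest_path(edges: Set[Tuple[str, str]], src: str, dst: str) -> Optional[List[str]]:
    """Breadth-first search; neighbors are visited in sorted order so results are deterministic."""
    adj = adjacency(edges)
    prev: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path: List[str] = []
            cur: Optional[str] = node
            while cur is not None:
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        for nxt in sorted(adj.get(node, ())):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None
```

For example, `shortest_path(CONNECTIONS, "system_bus_interface", "computing_engine")` returns the route through the data access controller and the storage subsystem. The other three-hop route, through the global control unit, has the same length and is not chosen only because of the sorted visiting order.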