Cache parameter determination method and apparatus, electronic device, and storage medium
By constructing an address dataset on a simulation model and using a clustering algorithm to determine cache parameters, the problem of relying on human experience and long-term simulation verification in cache design is solved, achieving efficient and accurate design of cache parameters, improving design efficiency and reducing costs.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SUZHOU YIZHU INTELLIGENT TECH CO LTD
- Filing Date
- 2026-04-07
- Publication Date
- 2026-06-19
AI Technical Summary
In existing technologies, the determination of cache parameters relies on human experience, which leads to a high degree of subjectivity in the design process, making it difficult to find the optimal solution in a massive design space. Furthermore, the simulation and verification cycle is long and inefficient, increasing R&D costs.
By obtaining the memory access request sequence generated by the target application on the simulation model, an address dataset is constructed, and clustering algorithms are used to process the address points to determine the total cache capacity and cache line capacity, reducing the reliance on manual intervention and simulation verification.
It improves the efficiency of cache parameter design and shortens the verification cycle, enables precise determination of cache size and cache line size, and reduces the resource consumption and cost of chip architecture design.
Description
Technical Field
[0001] This disclosure relates to the field of artificial intelligence technology, and in particular to a method, apparatus, electronic device and storage medium for determining cache parameters.
Background Technology
[0002] In modern computer architecture, cache serves as a crucial buffer component connecting the Central Processing Unit (CPU) and main memory. Its core function is to alleviate the significant gap between CPU processing speed and main memory access speed. By temporarily storing frequently accessed data and instructions from the CPU, it reduces the number of CPU accesses to main memory, thereby significantly improving the overall operating efficiency, data throughput, and response speed of the computer system. It is an indispensable core component in modern processors, embedded systems, servers, and other computing devices.
[0003] In cache architecture design, cache size and cache block size are two core parameters. These two parameters have complex relationships affecting hit rate, access latency, hardware cost, and spatial locality utilization. Improper settings can severely restrict system performance. Specifically, the choice of cache size directly relates to chip area, power consumption, and access latency, requiring a trade-off between physical cost and performance gains. The cache block size, on the other hand, subtly influences spatial locality utilization efficiency, cache miss rate, and memory bandwidth usage; both excessively large and small sizes result in varying performance losses. More importantly, these two parameters are strongly coupled, making optimal configuration highly dependent on the workload characteristics of the target application.
[0004] In related technologies, determining cache parameters typically relies on the architect's personal experience, supplemented by repeated trial and error adjustments using simulators to run test programs. This traditional approach has significant drawbacks: firstly, the design process is highly subjective, overly dependent on human experience, and struggles to find the optimal solution within a vast design space; secondly, simulation verification is time-consuming and inefficient, increasing development costs. Therefore, how to scientifically and efficiently set cache parameters remains a pressing technical challenge for the industry.
Summary of the Invention
[0005] This disclosure provides a method, apparatus, electronic device, and storage medium for determining cache parameters, which can effectively reduce the occupation of video memory resources by determining cache parameters, while significantly improving semantic preservation capabilities.
[0006] According to one aspect of this disclosure, a method for determining cache parameters is provided, comprising:
[0007] Obtain a sequence of memory access requests generated by multiple target applications running on a simulation model, wherein each memory access request includes a request sequence number and the memory address accessed;
[0008] An address dataset is constructed based on the memory access request sequence. The address dataset includes multiple address points, and each address point includes a request sequence number and a corresponding memory address.
[0009] Perform a first clustering process on multiple address points in the address dataset to obtain at least one first cluster.
[0010] The total cache capacity is determined based on the memory address distribution of each address point in the first cluster.
[0011] Obtain the preset number of cache sets and the preset number of cache ways;
[0012] Based on the number of cache sets and the number of cache ways, a second clustering process is performed on multiple address points in the address dataset to obtain multiple second clusters;
[0013] The target cache line capacity is determined based on the memory address distribution of each address point in each of the second clusters.
[0014] Optionally, the step of performing a first clustering process on multiple address points in the address dataset to obtain at least one first cluster includes:
[0015] Each address point in the address dataset is initialized into an independent cluster to obtain an initial cluster set;
[0016] Calculate the distance between any two clusters in the current cluster set, and merge the two clusters with the smallest distance into a new cluster to obtain the updated cluster set;
[0017] Repeat the above steps until the first preset termination condition is reached, and the cluster that reaches the first preset termination condition is determined as the first cluster.
[0018] Optionally, determining the cluster that reaches the first preset termination condition as the first cluster includes:
[0019] Obtain the preset number of cache levels;
[0020] When the first preset termination condition is met, the clusters in the current cluster set are truncated and divided according to the number of cache levels to obtain multiple target clusters, equal in number to the cache levels;
[0021] Each target cluster is determined as the first cluster of the corresponding level cache.
[0022] Optionally, the first preset termination condition includes any of the following:
[0023] All address points in the current cluster set have been merged into one cluster;
[0024] Alternatively, the number of clusters in the current cluster set is equal to the preset first target number.
[0025] Optionally, calculating the distance between any two clusters in the current cluster set includes:
[0026] Determine the centroid of each cluster in the current cluster set and the number of address points contained in each cluster;
[0027] Based on the centroids of the two clusters and the number of address points in each, calculate the intra-cluster variance increment of the new cluster formed after merging the two clusters;
[0028] The intra-cluster variance increment is used as the distance value between the two clusters.
[0029] Optionally, determining the total cache capacity based on the memory address distribution of each address point in the first cluster includes:
[0030] For each of the first clusters, the corresponding memory addresses are obtained from the address dataset according to the request sequence number of each address point in the first cluster, forming an address subset of the first cluster;
[0031] Determine the minimum and maximum memory addresses for each subset of addresses;
[0032] The cache capacity of each first cluster is determined based on the minimum memory address and the maximum memory address of each of the address subsets;
[0033] The total cache capacity is determined by summing the cache capacities of each first cluster.
[0034] Optionally, determining the total cache capacity based on the memory address distribution of each address point in the first cluster includes:
[0035] For each of the first clusters, the memory address range corresponding to the first cluster is determined based on the memory addresses of all address points contained in the first cluster;
[0036] Calculate the capacity of the corresponding cache level based on the memory address range corresponding to each of the first clusters;
[0037] The total cache capacity is determined by summing the calculated capacities of each cache level.
[0038] Optionally, the step of performing a second clustering process on multiple address points in the address dataset based on the number of cache sets and the number of cache ways to obtain multiple second clusters includes:
[0039] The product of the number of cache sets and the number of cache ways is determined as the second target quantity;
[0040] Randomly select the second target number of address points from the address dataset, and use each selected address point as an initial cluster, and determine the initial centroid of each initial cluster;
[0041] Calculate the distance between each remaining address point in the address dataset and each initial centroid, and assign each remaining address point to the initial cluster containing the nearest initial centroid, thus obtaining the second target number of first-level clusters;
[0042] Calculate the updated centroid of each first-level cluster, and calculate the distance between each remaining address point in the address dataset and each updated centroid. Reassign each remaining address point to the cluster containing the nearest updated centroid to obtain the second target number of second-level clusters.
[0043] Treat the second-level cluster as the first-level cluster, and repeat the above steps until the second preset termination condition is met.
[0044] The K clusters obtained when the second preset termination condition is met are determined as the second clustering cluster; K is an integer greater than or equal to 1.
[0045] Optionally, the second preset termination condition includes:
[0046] The centroids of each cluster no longer change, or the clusters to which all address points in the address dataset belong no longer change, or the preset maximum number of iterations is reached.
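The second clustering steps above follow the familiar k-means pattern. The following is a minimal sketch under stated assumptions, not the disclosed implementation: the function name `kmeans`, the squared Euclidean distance on (sequence number, address) pairs, the fixed random seed, and the choice to reassign every point (rather than only the "remaining" ones) in each round are all illustrative.

```python
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Cluster (sequence number, address) points into k groups."""
    rng = random.Random(seed)
    # Randomly pick k address points as the initial centroids.
    centroids = [(float(x), float(y)) for x, y in rng.sample(points, k)]
    assignment = None
    for _ in range(max_iter):  # termination: maximum number of iterations
        new_assignment = [
            min(range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                            + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        if new_assignment == assignment:  # termination: memberships stable
            break
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = (sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members))
    return [[p for p, a in zip(points, assignment) if a == c] for c in range(k)]

# Hypothetical preset values: 2 sets x 1 way gives K = 2 clusters.
set_num, way_num = 2, 1
points = [(0, 0x1000), (1, 0x1004), (2, 0x1008),
          (3, 0x9000), (4, 0x9004), (5, 0x9008)]
second_clusters = kmeans(points, set_num * way_num)
```

Because the raw address values are far larger than the sequence numbers, the address dimension dominates the distance in this sketch; whether the dimensions are scaled before clustering is not stated here.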
[0047] Optionally, determining the target cache line capacity based on the memory address distribution of each address point in each of the second clusters includes:
[0048] For each of the second clusters, the memory address range corresponding to the second cluster is determined based on the memory addresses of all address points contained in the second cluster.
[0049] The memory address range corresponding to each of the second clusters is determined as the cache line capacity represented by the second cluster;
[0050] Calculate the arithmetic mean of the cache line sizes of all second clusters, and use this as the target cache line size.
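As a hedged illustration of the averaging step above, the sketch below takes each second cluster's address range as the difference between its largest and smallest address and returns the arithmetic mean. The function name and the sample clusters are hypothetical, and any rounding of the result to a power of two is left out because it is not specified here.

```python
from statistics import mean

def target_cache_line_capacity(second_clusters):
    """Mean address range (max - min) over all non-empty second clusters."""
    ranges = []
    for cluster in second_clusters:
        if not cluster:
            continue
        addrs = [addr for _, addr in cluster]
        ranges.append(max(addrs) - min(addrs))
    return mean(ranges)

# Hypothetical second clusters with address ranges of 56, 64 and 72 bytes.
example_clusters = [
    [(0, 0x1000), (1, 0x1038)],
    [(2, 0x2000), (3, 0x2040)],
    [(4, 0x3000), (5, 0x3048)],
]
line_capacity = target_cache_line_capacity(example_clusters)
```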
[0051] According to one aspect of this disclosure, a cache parameter determination apparatus is provided, the apparatus comprising:
[0052] The first acquisition module is used to acquire a sequence of memory access requests generated by multiple target applications running on the simulation model, wherein each memory access request includes a request sequence number and the memory address accessed.
[0053] The construction module is used to construct an address dataset based on the memory access request sequence. The address dataset includes multiple address points, and each address point includes a request sequence number and a corresponding memory address.
[0054] The first clustering module is used to perform a first clustering process on multiple address points in the address dataset to obtain at least one first cluster.
[0055] The first determining module is used to determine the total cache capacity based on the memory address distribution of each address point in the first cluster.
[0056] The second acquisition module is used to acquire the preset number of cache sets and the preset number of cache ways;
[0057] The second clustering module is used to perform a second clustering process on multiple address points in the address dataset according to the number of cache sets and the number of cache ways, so as to obtain multiple second clusters;
[0058] The second determining module is used to determine the target cache line capacity based on the memory address distribution of each address point in each of the second clusters.
[0059] According to one aspect of this disclosure, an electronic device is provided, the electronic device including a memory, a processor, a program stored in the memory and executable on the processor, and a data bus for implementing connection communication between the processor and the memory, wherein the program is executed by the processor to implement the cache parameter determination method described above.
[0060] According to one aspect of this disclosure, a computer-readable storage medium is provided that stores one or more programs, which can be executed by one or more processors to implement the cache parameter determination method described above.
[0061] This embodiment of the disclosure obtains a sequence of memory access requests generated by multiple target applications running on a simulation model, wherein each memory access request includes a request sequence number and the memory address accessed; constructs an address dataset based on the memory access request sequence, the address dataset including multiple address points, each address point including a request sequence number and a corresponding memory address; performs a first clustering process on the multiple address points in the address dataset to obtain at least one first cluster; determines the total cache capacity based on the memory address distribution of each address point in the first cluster; obtains a preset number of cache sets and a preset number of cache ways; performs a second clustering process on the multiple address points in the address dataset based on the number of cache sets and the number of cache ways to obtain multiple second clusters; and determines the target cache line capacity based on the memory address distribution of each address point in each second cluster. Thus, by performing multiple clustering processes on the address points, the total cache size and cache line size can be accurately determined, greatly improving efficiency and shortening the verification cycle when designing a cache from scratch in a chip architecture.
[0062] Other features and advantages of this disclosure will be set forth in the following description and will be apparent in part from the description or may be learned by practicing the disclosure. The objectives and other advantages of this disclosure may be realized and obtained by means of the structures particularly pointed out in the description, claims and drawings.
Attached Figure Description
[0063] The accompanying drawings are provided to further understand the technical solutions of this disclosure and constitute a part of the specification. They are used together with the embodiments of this disclosure to explain the technical solutions of this disclosure and do not constitute a limitation on the technical solutions of this disclosure.
[0064] Figure 1 is a system architecture diagram of the cache parameter determination method applied in the embodiments of this disclosure;
[0065] Figure 2 is a main flowchart of a cache parameter determination method according to an embodiment of the present disclosure;
[0066] Figure 3 is a schematic diagram of step S203 of an embodiment of the present disclosure;
[0067] Figure 4 is an example diagram of a first clustering disclosed herein;
[0068] Figure 5 is a schematic diagram of step S205 of an embodiment of the present disclosure;
[0069] Figure 6 is a schematic diagram of the structure of a cache parameter determination device according to an embodiment of the present disclosure;
[0070] Figure 7 is a schematic diagram of the structure of an electronic device proposed in one embodiment of the present disclosure.
Detailed Implementation
[0071] To make the objectives, technical solutions, and advantages of this disclosure clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and are not intended to limit the scope of this disclosure.
[0072] Cache: In computer science, a cache is a high-speed data storage layer used to store temporary data so that future requests for that data can be processed more quickly. Caches can be divided into multiple levels; for example, CPU caches can be divided into three levels: L1, L2, and L3.
[0073] Cache Size: Also known as cache capacity, it is a cache parameter that represents the capacity of the temporary storage area. For example, the CPU cache is divided into three levels of cache. Once the cache capacity of each level is determined, the sum of the cache capacities of all levels is the total CPU cache capacity.
[0074] Number of cache sets (set_num): The number of sets into which the cache is divided. The set that an access maps to is usually determined by the index bits of the memory address, and a set is the range searched in a single cache lookup.
[0075] Number of cache ways (way_num): The number of cache lines that can be stored in each cache set, also known as the associativity; common values are 2-way, 4-way, 8-way, and 16-way.
[0076] Cache line size (also known as cache block size): A cache line is the smallest unit of data exchange between the CPU cache and memory, and it is also the basic unit in the cache storage structure. Physically, a cache line is represented by a slot of fixed size in the cache; logically, a cache line represents a contiguous block of data loaded from memory into the cache at once.
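For a set-associative cache, the three parameters defined above are tied together by a standard identity: data capacity equals the number of sets times the number of ways times the line size, so the cache holds set_num × way_num lines in total. The sketch below only illustrates that arithmetic; the function name and the sample configuration are hypothetical.

```python
def cache_capacity_bytes(set_num: int, way_num: int, line_size: int) -> int:
    """Data capacity of a set-associative cache, excluding tag and state bits."""
    return set_num * way_num * line_size

# Hypothetical configuration: 64 sets, 8 ways, 64-byte lines.
capacity = cache_capacity_bytes(set_num=64, way_num=8, line_size=64)
total_lines = 64 * 8  # number of cache lines the cache can hold
```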
[0077] In related technologies, the determination of cache parameters (including total cache size and cache line size) usually relies on the architect's personal experience, supplemented by repeated trial and error adjustments using simulators to run test programs. This traditional approach has significant drawbacks: First, the design process is highly subjective, overly reliant on human experience, and makes it difficult to find the optimal solution in a massive design space; second, the simulation verification cycle is long and inefficient, increasing development costs.
[0078] Based on this, this disclosure proposes a method, apparatus, electronic device, and storage medium for determining cache parameters. A simulator is used to generate the memory access requests of each application, and an address dataset is built from the sequence number of each request and the memory address it carries. Different clustering processes are then performed on the address points in the address dataset, using different clustering methods, to determine the total cache size and the cache line size. This eliminates the need for repeated manual verification and greatly improves the design efficiency of the cache.
[0079] System architecture description applied in the embodiments of this disclosure
[0080] Figure 1 is a system architecture diagram of the cache parameter determination method used in this embodiment of the disclosure, which includes: object terminal 110, Internet 120, gateway 130, and server 140.
[0081] Object terminal 110 is a device used by an object to input query information. It includes various forms such as desktop computers, laptops, PDAs (personal digital assistants), mobile phones, in-vehicle terminals, home theater terminals, and dedicated terminals. Furthermore, it can be a single device or a collection of multiple devices. For example, multiple devices can be connected via a local area network, sharing a single display device to work collaboratively, forming a single terminal. Object terminal 110 can also communicate with the Internet 120 via wired or wireless means to exchange data.
[0082] Gateway 130, also known as an internetwork connector or protocol converter, is a computer system or device that enables network interconnection at the transport layer and acts as a translator. It bridges the gap between two systems using different communication protocols, data formats, languages, or even completely different architectures. Gateway 130 also provides filtering and security functions. Messages sent from object terminal 110 to server 140 are forwarded to the corresponding server 140 via gateway 130. Messages sent from server 140 to object terminal 110 are also forwarded to the corresponding object terminal 110 via gateway 130.
[0083] Server 140 refers to a computer system that provides services to object terminal 110. Compared to object terminal 110, server 140 has higher requirements in terms of stability, security, and performance. Server 140 can be a single high-performance computer in a network platform, a cluster of multiple high-performance computers, a portion of a single high-performance computer (e.g., a virtual machine), or a combination of portions of multiple high-performance computers (e.g., virtual machines). Server 140 can also communicate with the Internet 120 via wired or wireless means to exchange data.
[0084] Overall Implementation of the Cache Parameter Determination Method of the Embodiments of this Disclosure
[0085] This disclosure provides a method for determining cache parameters, applied to a cache parameter determining device. With reference to Figure 2, the method for determining the cache parameters includes:
[0086] In step S201: a sequence of memory access requests generated by multiple target applications running on the simulation model is obtained, wherein each memory access request includes a request sequence number and the memory address accessed.
[0087] In step S203: an address dataset is constructed based on the memory access request sequence. The address dataset includes multiple address points, and each address point includes a request sequence number and a corresponding memory address.
[0088] In this embodiment, a fast model (i.e., a simulation model without a cache, or a simulation model with a pseudo-cache that provides no actual cache functionality) can be used to simulate and test a large number of applications targeted by the chip being designed, generating multiple memory access requests. Specifically, these are read/write requests directed at the cache. A request sequence number can be generated for each access request in chronological order. An address point can then be constructed from the request sequence number and memory address of each access request, and the address dataset can be built from the address points corresponding to the access requests in the sequence. Of course, other simulation models can also be used to construct the address dataset; this is not a limitation.
[0089] Understandably, the address dataset is a two-dimensional dataset. Each data point in this dataset can be represented as (index, memory address), where the index (idx) is a number starting from 0. For example, for 6 requests, their corresponding indexes could be 0, 1, 2, 3, 4, and 5. Since the index of each data point in the address dataset can represent a temporal attribute, and the memory address can represent a spatial attribute, when the locality of read and write requests is good, a large number of requests can be considered to be closely related, or similar, in both time and space dimensions.
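The construction of the two-dimensional address dataset can be sketched as follows. This is a minimal illustration, assuming the trace is already available as a chronologically ordered list of integer addresses; the function name `build_address_dataset` and the sample trace are hypothetical.

```python
from typing import List, Tuple

# An address point is (request sequence number, memory address).
AddressPoint = Tuple[int, int]

def build_address_dataset(addresses: List[int]) -> List[AddressPoint]:
    """Pair each accessed address with its 0-based position in the trace."""
    return list(enumerate(addresses))

# Hypothetical trace of six memory access requests in chronological order.
trace = [0x2000, 0x2004, 0x2008, 0x8000, 0x8040, 0x2010]
dataset = build_address_dataset(trace)
```

The first element of each point carries the temporal attribute and the second carries the spatial attribute, so requests with good locality land close together in this plane.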
[0090] In this embodiment, a memory address can be specifically represented as a binary or hexadecimal number. From a hardware perspective, memory is divided into many storage units, each typically holding one byte (8 bits) and having a number, which is the memory address; for example, the hexadecimal address 0x2000 can be written in binary as 0010 0000 0000 0000. The information contained in the memory address differs depending on the system type. For example, in a paging system, the memory address typically includes the page number and offset within the page; in a segmented system, it includes the segment number and offset within the segment. Therefore, the form of the memory address in this solution can be chosen according to system design requirements and is not limited here. However, in this disclosure, the memory addresses of all the above requests are of the same type to ensure the accuracy of subsequent clustering results.
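The address representations mentioned above can be checked with a few lines of arithmetic. In this sketch the 4 KiB page size is an assumption chosen for illustration; it is not taken from this disclosure.

```python
addr = 0x2000

# 16-bit binary form of the address, grouped into nibbles for readability.
bits = format(addr, "016b")
grouped = " ".join(bits[i:i + 4] for i in range(0, 16, 4))

# Paging view of the same address, assuming a 4 KiB page size.
PAGE_SIZE = 4096
page_number, page_offset = divmod(addr, PAGE_SIZE)
```

Here `grouped` is the binary string `0010 0000 0000 0000`, and under the assumed page size the address falls at offset 0 of page 2.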
[0091] In step S205: a first clustering process is performed on multiple address points in the address dataset to obtain at least one first cluster.
[0092] In one feasible implementation, referring to Figure 3, step S205 can be specifically described as follows:
[0093] In step S301: each address point in the address dataset is initialized into an independent cluster to obtain an initial cluster set.
[0094] In this embodiment, the address dataset includes N address points, where N is an integer greater than 1. This address dataset can be divided into N clusters, forming the initial clusters. Referring to Figure 4, the address dataset can, for example, include six address points, denoted as address point A, address point B, address point C, address point D, address point E, and address point F, which together form the initial clusters. Correspondingly, cluster A is (0, addr0), cluster B is (1, addr1), cluster C is (2, addr2), cluster D is (3, addr3), cluster E is (4, addr4), and cluster F is (5, addr5); 0, 1, 2, 3, 4, and 5 represent the request sequence numbers of the six access requests, and addr0, addr1, addr2, addr3, addr4, and addr5 represent the memory addresses of the six access requests.
[0095] In step S303: calculate the distance between any two clusters in the current cluster set, and merge the two clusters with the smallest distance into a new cluster to obtain the updated cluster set.
[0096] In one feasible implementation, step S303, calculating the distance between any two clusters in the current cluster set, may specifically include: determining the centroid of each cluster in the current cluster set and the number of address points contained in each cluster; calculating the intra-cluster variance increment of the new cluster formed after merging two clusters based on the centroids and the number of address points of the two clusters; and using the intra-cluster variance increment as the distance value between the two clusters. In this way, compact clusters of relatively uniform size can be generated.
[0097] In this embodiment, the intra-cluster variance increment between each cluster in the initial cluster can be determined and used as the distance value between each cluster. Optionally, the distance value between any two clusters is inversely proportional to their similarity; that is, the smaller the distance value between two clusters, the higher the similarity between the two clusters.
[0098] The intra-cluster variance increment $D^2$ between any two clusters can be determined based on the following formula (1):
[0099] $$D^2 = \frac{n_i\, n_j}{n_i + n_j}\left[(x_i - x_j)^2 + (y_i - y_j)^2\right]$$ Formula (1)
[0100] where the centroid of cluster i is represented by the coordinates $(x_i, y_i)$; the centroid of cluster j is represented by the coordinates $(x_j, y_j)$; i and j are both integers greater than or equal to 1; $n_i$ represents the number of address points contained in cluster i; and $n_j$ represents the number of address points contained in the other cluster j.
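A distance computed from the two centroids and the two point counts, as described above, matches the standard Ward-linkage merge cost. The sketch below assumes that standard form, in which the increment is the growth in the summed squared deviation from the centroid; the function name is hypothetical.

```python
def ward_increment(centroid_i, n_i, centroid_j, n_j):
    """Increase in within-cluster squared deviation caused by merging i and j.

    Assumes the standard Ward form: n_i * n_j / (n_i + n_j) * ||c_i - c_j||^2.
    """
    dx = centroid_i[0] - centroid_j[0]  # difference in sequence-number axis
    dy = centroid_i[1] - centroid_j[1]  # difference in address axis
    return (n_i * n_j) / (n_i + n_j) * (dx * dx + dy * dy)

# Two single-point clusters: (0, 100) and (1, 104).
d2 = ward_increment((0.0, 100.0), 1, (1.0, 104.0), 1)
```

For this pair the merged centroid is (0.5, 102), and summing the squared deviations of both points from it gives 0.25 + 4 + 0.25 + 4 = 8.5, which agrees with the value returned by the function.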
[0101] In this embodiment, the two clusters with the smallest distance are the pair of clusters whose merger yields the smallest intra-cluster variance increment. Continuing with the above example and referring to Figure 4, assuming that the distance between clusters A and B is the smallest, and the distance between clusters E and F is the smallest, then clusters A and B are merged into cluster AB, and clusters E and F are merged into cluster EF. Thus, the initial 6 clusters are merged into 4 clusters, namely clusters AB, C, D, and EF, which constitute the updated cluster set.
[0102] In step S305: Repeat the above steps until the first preset termination condition is reached, and determine the cluster set that reaches the first preset termination condition as the first cluster.
[0103] In this embodiment, for the merged cluster, its centroid (x, y) can specifically be the mean of the centroids of the two clusters, which can be determined based on the following formula (2):
[0104] Formula (2)
[0105] For a cluster containing only one address point, its centroid is that address point.
[0106] Next, the increase in intra-cluster variance of any two merged clusters can be determined based on the above formula (1).
[0107] Continuing with the above example and referring to Figure 4, assume that in the second round of merging, the candidate merged clusters to be calculated are cluster ABC, cluster ABD, cluster ABEF, cluster CD, cluster CEF, and cluster DEF. If the intra-cluster variance increment of cluster ABC, formed by merging cluster AB and cluster C, is the smallest, then cluster AB and cluster C are merged in this round, and the updated cluster set includes cluster ABC, cluster D, and cluster EF.
[0108] In one feasible implementation, the first preset termination condition includes any one of the following: all address points in the current cluster set have been merged into one cluster; or, the number of clusters in the current cluster set is equal to the preset first target number.
[0109] In this embodiment, termination is determined by whether the current round meets a first preset termination condition. Optionally, the first preset termination condition can be that all address points in the current cluster set have been merged into one cluster, that is, the merging of all initial clusters is complete, forming one large cluster containing all of them (e.g., cluster ABCDEF shown in Figure 4). If hierarchical caching is required, this large cluster must then be truncated. In a feasible implementation, determining in step S305 the cluster set that reaches the first preset termination condition as the first cluster may specifically include: obtaining a preset number of cache levels; when the first preset termination condition is reached, truncating and dividing the clusters in the current cluster set according to the number of cache levels to obtain multiple target clusters, equal in number to the cache levels; and determining each target cluster as the first cluster of the corresponding cache level. This allows for precise hierarchical partitioning of the cache.
[0110] For example, assuming the cache has 3 levels, the initial clusters can be divided into 3 types by truncating and dividing the clusters ABCDEF in the final updated cluster set, namely clusters ABC, D and EF.
[0111] In other embodiments, the first preset termination condition may also be that the number of clusters in the current cluster set is equal to a preset first target number. Optionally, the first target number may specifically be the number of cache levels, for example, 3. The number of first clusters may be the same as the number of target clusters. Each cluster in the updated cluster set may correspond to a sub-cluster, which in turn corresponds to the target cluster of one cache level. The above are just examples; the number of first clusters may vary depending on the required number of cache levels.
[0112] Subsequently, when the first preset termination condition is met, the cluster that meets the first preset termination condition can be determined as the first clustering cluster; otherwise, repeat the above step S303 until the first preset termination condition is met.
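Steps S301 to S305 can be sketched as a bottom-up merge loop that stops once the number of clusters equals the first target number. This is a simplified illustration rather than the disclosed implementation: it merges exactly one pair per iteration, recomputes each centroid as the mean of the cluster's member points, and assumes the standard Ward form for the merge cost. All names and the sample addresses are hypothetical.

```python
from itertools import combinations

def centroid(points):
    """Mean (sequence number, address) of the points in one cluster."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def merge_cost(a, b):
    """Ward-style variance increment for merging clusters a and b."""
    (xa, ya), (xb, yb) = centroid(a), centroid(b)
    na, nb = len(a), len(b)
    return na * nb / (na + nb) * ((xa - xb) ** 2 + (ya - yb) ** 2)

def agglomerate(points, target_clusters):
    """Merge the cheapest pair repeatedly until target_clusters remain."""
    clusters = [[p] for p in points]  # S301: one cluster per address point
    while len(clusters) > target_clusters:  # S305: termination check
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: merge_cost(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] + clusters[j]  # S303: merge the closest pair
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters

# Six hypothetical address points A..F and three cache levels.
points = [(0, 0x1000), (1, 0x1008), (2, 0x1010),
          (3, 0x8000), (4, 0x20000), (5, 0x20008)]
first_clusters = agglomerate(points, target_clusters=3)
```

With these sample addresses the loop ends with the groups {A, B, C}, {D}, and {E, F}, mirroring the worked example. The pairwise search is quadratic per merge, so a real trace would call for a more efficient structure.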
[0113] In step S207: the total cache capacity is determined based on the memory address distribution of each address point in the first cluster.
[0114] In one feasible implementation, the cache capacity of each level can be determined from the address range of each first cluster, and the total cache capacity follows from these. Specifically, step S207, determining the total cache capacity based on the memory address distribution of each address point in the first cluster, may include: for each first cluster, determining the memory address range corresponding to the first cluster based on the memory addresses of all address points contained in the first cluster; calculating the capacity of the corresponding cache level based on the memory address range corresponding to each first cluster; and determining the total cache capacity by summing the calculated capacities of each cache level. Since each first cluster corresponds to one cache level, determining the address range of a first cluster gives the capacity of that cache level, and the total cache capacity is the sum of the capacities of all cache levels. Optionally, the address range of each first cluster can be determined as the difference between the maximum and minimum memory addresses in the first cluster.
[0115] In another feasible implementation, the corresponding memory address can be obtained first based on the request sequence number of each address point in each first cluster, and then the cache capacity of each first cluster can be determined based on the maximum and minimum addresses in the address subsets corresponding to each first cluster. Specifically, determining the total cache capacity based on the memory address distribution of each address point in the first cluster in step S207 can include: for each first cluster, obtaining the corresponding memory address from the address dataset based on the request sequence number of each address point in the first cluster to form an address subset of the first cluster; determining the minimum and maximum memory addresses of each address subset; determining the cache capacity of each first cluster based on the minimum and maximum memory addresses of each address subset; and determining the sum of the cache capacities of all first clusters as the total cache capacity. This allows for accurate and reliable determination of the cache capacity of each level of cache and the total cache capacity.
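The capacity calculation in step S207 can be illustrated with a short sketch. It assumes each first cluster is a list of `(sequence_number, address)` tuples and that addresses are byte addresses, so the difference between the maximum and minimum address is read as a capacity in bytes; the function names are hypothetical.

```python
def cluster_capacity(cluster):
    """Address range of one cluster: maximum address minus minimum address."""
    addresses = [addr for _seq, addr in cluster]
    return max(addresses) - min(addresses)

def total_cache_capacity(first_clusters):
    """Return the per-level capacities and their sum."""
    per_level = [cluster_capacity(c) for c in first_clusters]
    return per_level, sum(per_level)
```

Note that under this definition a cluster containing a single address point has a range of zero.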
[0116] In step S209: Obtain the preset number of cache groups and cache paths.
[0117] In step S211: Based on the number of cache groups and the number of cache paths, a second clustering process is performed on multiple address points in the address dataset to obtain multiple second clusters.
[0118] In a feasible implementation, step S211, performing a second clustering process on multiple address points in the address dataset based on the number of cache groups and the number of cache paths to obtain multiple second clusters, may specifically include: determining the product of the number of cache groups and the number of cache paths as a second target number; randomly selecting the second target number of address points from the address dataset, using each selected address point as an initial cluster, and determining the initial centroid of each initial cluster; calculating the distance between each remaining address point in the address dataset and each initial centroid, and assigning each remaining address point to the initial cluster containing the nearest initial centroid, thus obtaining the second target number of first-level clusters; calculating the updated centroid of each first-level cluster, calculating the distance between each remaining address point in the address dataset and each updated centroid, and reassigning each remaining address point to the cluster containing the nearest updated centroid, thus obtaining the second target number of second-level clusters; using the second-level clusters as first-level clusters, and repeating the above steps until a second preset termination condition is reached; and determining the K clusters obtained when the second preset termination condition is reached as the second clusters, where K is an integer greater than or equal to 1. The second clusters obtained in this way are the basis for determining the cache line capacity.
[0119] In this embodiment, since total cache capacity = number of cache groups × number of cache paths × cache line capacity, the product of the number of cache groups (sets) and the number of cache paths (ways) can be denoted K. The address dataset is then clustered into K clusters, the address range of each of the K clusters is calculated, and the ranges are averaged to obtain the cache line capacity.
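The relationship in this paragraph can be written compactly as below. The symbols are introduced here for illustration and are not taken from the source: $S$ is the number of cache groups, $W$ the number of cache paths, $B$ the cache line capacity, and $a_k^{\max}$, $a_k^{\min}$ the largest and smallest memory addresses in the $k$-th second cluster.

```latex
\begin{aligned}
C_{\text{total}} &= S \times W \times B, \qquad K = S \times W \\
B &\approx \frac{1}{K} \sum_{k=1}^{K} \left( a_k^{\max} - a_k^{\min} \right)
\end{aligned}
```

As a purely hypothetical illustration, 64 cache groups and 4 cache paths would give $K = 256$ clusters.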
[0120] In this embodiment, the second target quantity can be represented as K. K address points can be randomly selected from the address dataset to obtain K initial clusters.
[0121] In this embodiment, the remaining address points are the address points in the address dataset other than the second target number of selected address points. The Euclidean distance between each remaining address point and each initial centroid can be calculated to obtain the distance between that address point and each initial centroid. Optionally, the Euclidean distance between an address point $p$ and a centroid $c$ can be determined based on the following formula, where, in notation chosen here for readability, $n$ denotes the request sequence number coordinate and $a$ the memory address coordinate:
[0122] $d(p, c) = \sqrt{(n_p - n_c)^2 + (a_p - a_c)^2}$ Formula (1)
[0123] In this embodiment, each remaining address point is assigned to, and merged into, the initial cluster whose centroid is at the smallest Euclidean distance, until all remaining address points have been assigned. Each merge of a remaining address point into an initial cluster amounts to updating that initial cluster. Once all remaining address points in the address dataset have been merged, some or all of the initial clusters have been updated, yielding K first-level clusters. Depending on the calculated Euclidean distances, a first-level cluster may contain only its initially selected address point, or that address point together with other address points.
[0124] In this embodiment, when a first-level cluster contains only its initially selected address point, its updated centroid is equal to the initial centroid.
[0125] Since the centroids of the K clusters may change partially or completely, for each remaining address point, the Euclidean distance between the remaining address point and each updated centroid is calculated, thereby obtaining the distance value between the remaining address point and each updated centroid.
[0126] In this embodiment, each remaining address point is reassigned to, and merged into, the first-level cluster whose updated centroid is at the smallest Euclidean distance, until all remaining address points have been reassigned. Each such merge amounts to updating the first-level cluster. Once all remaining address points in the address dataset have been merged, some or all of the first-level clusters have been updated, and K second-level clusters are formed.
[0127] In this embodiment, whether to terminate is determined by checking whether the current round meets a second preset termination condition. Optionally, the second preset termination condition can be that the centroid of each cluster no longer changes. In other embodiments, the second preset termination condition can also be that the clusters to which all address points in the address dataset belong no longer change. Alternatively, it can be that a preset maximum number of iterations has been reached.
[0128] Subsequently, when the second preset termination condition is met, the second-level cluster is determined as the second clustering cluster; otherwise, the above steps S505-S507 and the step of treating the second-level cluster as the first-level cluster are repeated until the second preset termination condition is met.
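The second clustering process described above follows the familiar K-means pattern, which can be sketched as below. This is a minimal sketch under stated assumptions, not the disclosed implementation: address points are `(sequence_number, address)` tuples, all points (including the initially selected ones) are reassigned in every round, an emptied cluster keeps its previous centroid, and the function name, `max_iters` and `seed` parameters are hypothetical.

```python
import math
import random

def second_clustering(points, k, max_iters=100, seed=0):
    rng = random.Random(seed)
    # Randomly select K address points as the initial centroids.
    centroids = [tuple(map(float, p)) for p in rng.sample(points, k)]
    assignment = []
    for _ in range(max_iters):  # termination: maximum number of iterations
        # Assign every point to the cluster with the nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # termination: cluster memberships no longer change
        assignment = new_assignment
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    clusters = [[p for p, a in zip(points, assignment) if a == c] for c in range(k)]
    return clusters, centroids
```

For a cache with the preset numbers of groups and paths, `k` would be their product, for example `second_clustering(points, groups * paths)`.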
[0129] In step S213: Determine the target cache line capacity based on the memory address distribution of each address point in each of the second clusters.
[0130] In one feasible implementation, step S213, determining the target cache line capacity based on the memory address distribution of each address point in each of the second clusters, specifically includes: for each second cluster, determining the memory address range corresponding to the second cluster based on the memory addresses of all address points contained in the second cluster; determining the memory address range corresponding to each second cluster as the cache line capacity represented by the second cluster; and calculating the arithmetic mean of the cache line capacities of all second clusters as the target cache line capacity. Thus, the corresponding cache line capacity can be determined by determining the address range of each second cluster, and the target cache line capacity can be obtained by averaging the cache line capacities of each second cluster, achieving efficient and accurate determination of the cache line capacity. Optionally, the address range of each second cluster can be determined based on the difference between the maximum and minimum memory addresses in the second cluster.
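Step S213 can be illustrated with the following sketch, which takes the address range of each second cluster as the cache line capacity it represents and returns the arithmetic mean. It assumes each cluster is a non-empty list of `(sequence_number, address)` tuples with byte addresses; the function name is hypothetical.

```python
from statistics import mean

def target_cache_line_capacity(second_clusters):
    """Average the address ranges (max minus min) of all second clusters."""
    ranges = []
    for cluster in second_clusters:
        addresses = [addr for _seq, addr in cluster]
        ranges.append(max(addresses) - min(addresses))
    return mean(ranges)
```

The mean is returned as computed; rounding it to a size the hardware can implement is outside what the text above describes.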
[0131] Description of apparatus and devices according to embodiments of this disclosure
[0132] Referring to Figure 6, this disclosure also provides a cache parameter determination device 600, comprising:
[0133] The first acquisition module 610 is used to acquire a sequence of memory access requests generated by multiple target applications running on the simulation model, wherein each memory access request includes a request sequence number and the memory address accessed.
[0134] Construction module 620 is used to construct an address dataset based on the memory access request sequence. The address dataset includes multiple address points, and each address point includes a request sequence number and a corresponding memory address.
[0135] The first clustering module 630 is used to perform a first clustering process on multiple address points in the address dataset to obtain at least one first cluster.
[0136] The first determining module 640 is used to determine the total cache capacity based on the memory address distribution of each address point in the first cluster.
[0137] The second acquisition module 650 is used to acquire the preset number of cache groups and cache paths;
[0138] The second clustering module 660 is used to perform a second clustering process on multiple address points in the address dataset according to the number of cache groups and the number of cache paths, so as to obtain multiple second clusters;
[0139] The second determining module 670 is used to determine the target cache line capacity based on the memory address distribution of each address point in each of the second clusters.
[0140] In one feasible implementation, the first clustering module is used to initialize each address point in the address dataset as an independent cluster to obtain an initial cluster set;
[0141] Calculate the distance between any two clusters in the current cluster set, and merge the two clusters with the smallest distance into a new cluster to obtain the updated cluster set;
[0142] Repeat the above steps until the first preset termination condition is reached, and the cluster that reaches the first preset termination condition is determined as the first cluster.
[0143] In one feasible implementation, the first clustering module is used to obtain a preset number of cache levels;
[0144] When the first preset termination condition is met, the clusters in the current cluster set are truncated and divided according to the number of cache levels to obtain multiple target clusters with the same number of cache levels;
[0145] Each target cluster is determined as the first cluster of the corresponding level cache.
[0146] In one feasible implementation, the first preset termination condition includes any of the following:
[0147] All address points in the current cluster set have been merged into one cluster;
[0148] Alternatively, the number of clusters in the current cluster set is equal to the preset first target number.
[0149] In one feasible implementation, the first clustering module is used to determine the centroid of each cluster in the current cluster set and the number of address points contained in each cluster;
[0150] Based on the centroids of the two clusters and the numbers of address points they contain, calculate the intra-cluster variance increment of the new cluster formed after merging the two clusters;
[0151] The intra-cluster variance increment is used as the distance value between the two clusters.
[0152] In one feasible implementation, the first determining module is used to obtain the corresponding memory address from the address dataset for each first cluster according to the request sequence number of each address point in the first cluster, thereby forming an address subset of the first cluster;
[0153] Determine the minimum and maximum memory addresses for each subset of addresses;
[0154] The cache capacity of each first cluster is determined based on the minimum memory address and the maximum memory address of each of the address subsets;
[0155] The total cache capacity is determined by summing the cache capacities of each first cluster.
[0156] In one feasible implementation, the first determining module is configured to determine, for each first cluster, the memory address range corresponding to the first cluster based on the memory addresses of all address points contained in the first cluster;
[0157] Calculate the capacity of the corresponding cache level based on the memory address range corresponding to each of the first clusters;
[0158] The total cache capacity is determined by summing the calculated capacities of each cache level.
[0159] In one feasible implementation, the second clustering module is used to determine the product of the number of cache groups and the number of cache paths as the second target number;
[0160] Randomly select the second target number of address points from the address dataset, and use each selected address point as an initial cluster, and determine the initial centroid of each initial cluster;
[0161] Calculate the distance between each remaining address point in the address dataset and each initial centroid, and assign each remaining address point to the initial cluster containing the nearest initial centroid, thus obtaining the second target number of first-level clusters;
[0162] Calculate the updated centroid of each first-level cluster, and calculate the distance between each remaining address point in the address dataset and each updated centroid. Reassign each remaining address point to the cluster containing the nearest updated centroid to obtain the second target number of second-level clusters.
[0163] Treat the second-level cluster as the first-level cluster, and repeat the above steps until the second preset termination condition is met.
[0164] The K clusters obtained when the second preset termination condition is met are determined as the second clustering cluster; K is an integer greater than or equal to 1.
[0165] In one feasible implementation, the second preset termination condition includes:
[0166] The centroids of each cluster no longer change, or the clusters to which all address points in the address dataset belong no longer change, or the preset maximum number of iterations is reached.
[0167] In one feasible implementation, the second determining module is used to determine the memory address range corresponding to the second cluster for each second cluster based on the memory addresses of all address points contained in the second cluster;
[0168] The memory address range corresponding to each of the second clusters is determined as the cache line capacity represented by the second cluster;
[0169] Calculate the arithmetic mean of the cache line sizes of all second clusters, and use this as the target cache line size.
[0170] This disclosure also provides an electronic device including a memory, a processor, a program stored in the memory and executable on the processor, and a data bus for implementing communication between the processor and the memory, wherein the program is executed by the processor to implement the above-described cache parameter determination method.
[0171] The hardware structure of the electronic device is described in detail below with reference to Figure 7. The electronic device includes: a processor 710, a memory 720, an input / output interface 730, a communication interface 740, and a bus 750.
[0172] The processor 710 can be implemented using a general-purpose central processing unit (CPU), microprocessor, application-specific integrated circuit (ASIC), or one or more integrated circuits, and is used to execute relevant programs to implement the technical solutions provided in the embodiments of this disclosure.
[0173] The memory 720 can be implemented as a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 720 can store the operating system and other applications. When the technical solutions provided in the embodiments of this disclosure are implemented through software or firmware, the relevant program code is stored in the memory 720 and is called by the processor 710 to execute the cache parameter determination method of the embodiments of this disclosure.
[0174] The input / output interface 730 is used to implement information input and output;
[0175] The communication interface 740 is used to enable communication and interaction between this device and other devices. Communication can be achieved via wired means (e.g., USB, Ethernet cable) or wireless means (e.g., mobile network, Wi-Fi, Bluetooth).
[0176] Bus 750 transmits information between various components of the device (e.g., processor 710, memory 720, input / output interface 730, and communication interface 740);
[0177] The processor 710, memory 720, input / output interface 730 and communication interface 740 are connected to each other within the device via bus 750.
[0178] This application also provides a computer-readable storage medium storing one or more programs that can be executed by one or more processors to implement the above-described cache parameter determination method, which will not be elaborated here.
[0179] The terms “first,” “second,” “third,” “fourth,” etc. (if present) in this disclosure and the foregoing drawings are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of this disclosure described herein can be implemented, for example, in orders other than those illustrated or described herein. Furthermore, the terms “comprising” and “including,” and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or apparatus that includes a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatuses.
[0180] It should be understood that in this disclosure, "at least one item" means one or more, and "more than one" means two or more. "And / or" is used to describe the relationship between related objects, indicating that three relationships can exist. For example, "A and / or B" can represent three cases: only A exists, only B exists, and both A and B exist simultaneously, where A and B can be singular or plural. The character " / " generally indicates that the preceding and following related objects are in an "or" relationship. "At least one of the following" or similar expressions refer to any combination of these items, including any combination of single or plural items. For example, at least one of a, b, or c can represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c can be single or multiple.
[0181] It should be understood that in the description of the embodiments of this disclosure, "multiple" means two or more, "greater than", "less than", "exceeding" etc. are understood to exclude the number itself, and "above", "below", "within" etc. are understood to include the number itself.
[0182] In the several embodiments provided in this disclosure, it should be understood that the disclosed systems, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be through some interfaces, indirect coupling or communication connection between apparatuses or units, and may be electrical, mechanical, or other forms.
[0183] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.
[0184] Furthermore, the functional units in the various embodiments of this disclosure can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.
[0185] It should also be understood that the various implementation methods provided in this disclosure can be combined arbitrarily to achieve different technical effects.
[0186] The above is a detailed description of the embodiments of this disclosure. However, this disclosure is not limited to the above embodiments. Those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of this disclosure. All such equivalent modifications or substitutions are included within the scope defined by the claims of this disclosure.
Claims
1. A method for determining cache parameters, characterized in that the method includes: obtaining a sequence of memory access requests generated by multiple target applications running on a simulation model, wherein each memory access request includes a request sequence number and the memory address accessed; constructing an address dataset based on the memory access request sequence, wherein the address dataset includes multiple address points, and each address point includes a request sequence number and a corresponding memory address; performing a first clustering process on multiple address points in the address dataset to obtain at least one first cluster; determining the total cache capacity based on the memory address distribution of each address point in the first cluster; obtaining the preset number of cache groups and the preset number of cache paths; performing, based on the number of cache groups and the number of cache paths, a second clustering process on multiple address points in the address dataset to obtain multiple second clusters; and determining the target cache line capacity based on the memory address distribution of each address point in each of the second clusters.
2. The method for determining cache parameters according to claim 1, characterized in that, The step of performing a first clustering process on multiple address points in the address dataset to obtain at least one first cluster includes: Each address point in the address dataset is initialized into an independent cluster to obtain an initial cluster set; Calculate the distance between any two clusters in the current cluster set, and merge the two clusters with the smallest distance into a new cluster to obtain the updated cluster set; Repeat the above steps until the first preset termination condition is reached, and the cluster that reaches the first preset termination condition is determined as the first cluster.
3. The method for determining cache parameters according to claim 2, characterized in that the step of determining the cluster set that meets the first preset termination condition as the first cluster includes: obtaining a preset number of cache levels; when the first preset termination condition is met, truncating and dividing the clusters in the current cluster set according to the number of cache levels to obtain target clusters equal in number to the cache levels; and determining each target cluster as the first cluster of the corresponding cache level.
4. The determination method according to claim 2, characterized in that, The first preset termination condition includes any one of the following: All address points in the current cluster set have been merged into one cluster; Alternatively, the number of clusters in the current cluster set is equal to the preset first target number.
5. The method for determining cache parameters according to claim 2, characterized in that the calculation of the distance between any two clusters in the current cluster set includes: determining the centroid of each cluster in the current cluster set and the number of address points contained in each cluster; calculating, based on the centroids of the two clusters and the numbers of address points they contain, the intra-cluster variance increment of the new cluster formed by merging the two clusters; and using the intra-cluster variance increment as the distance value between the two clusters.
6. The method for determining cache parameters according to claim 1, characterized in that, The step of determining the total cache capacity based on the memory address distribution of each address point in the first cluster includes: For each of the first clusters, the corresponding memory addresses are obtained from the address dataset according to the request sequence number of each address point in the first cluster, forming an address subset of the first cluster; Determine the minimum and maximum memory addresses for each subset of addresses; The cache capacity of each first cluster is determined based on the minimum memory address and the maximum memory address of each of the address subsets; The total cache capacity is determined by summing the cache capacities of each first cluster.
7. The method for determining cache parameters according to claim 3, characterized in that, The step of determining the total cache capacity based on the memory address distribution of each address point in the first cluster includes: For each of the first clusters, the memory address range corresponding to the first cluster is determined based on the memory addresses of all address points contained in the first cluster; Calculate the capacity of the corresponding cache level based on the memory address range corresponding to each of the first clusters; The total cache capacity is determined by summing the calculated capacities of each cache level.
8. The method for determining cache parameters according to claim 1, characterized in that the step of performing, based on the number of cache groups and the number of cache paths, a second clustering process on multiple address points in the address dataset to obtain multiple second clusters includes: determining the product of the number of cache groups and the number of cache paths as the second target number; randomly selecting the second target number of address points from the address dataset, using each selected address point as an initial cluster, and determining the initial centroid of each initial cluster; calculating the distance between each remaining address point in the address dataset and each initial centroid, and assigning each remaining address point to the initial cluster containing the nearest initial centroid, thus obtaining the second target number of first-level clusters; calculating the updated centroid of each first-level cluster, calculating the distance between each remaining address point in the address dataset and each updated centroid, and reassigning each remaining address point to the cluster containing the nearest updated centroid, thus obtaining the second target number of second-level clusters; treating the second-level clusters as first-level clusters, and repeating the above steps until the second preset termination condition is met; and determining the K clusters obtained when the second preset termination condition is met as the second clusters, where K is an integer greater than or equal to 1.
9. The method for determining cache parameters according to claim 8, characterized in that, The second preset termination condition includes: The centroids of each cluster no longer change, or the clusters to which all address points in the address dataset belong no longer change, or the preset maximum number of iterations is reached.
10. The method for determining cache parameters according to claim 1, characterized in that, The step of determining the target cache line capacity based on the memory address distribution of each address point in each of the second clusters includes: For each of the second clusters, the memory address range corresponding to the second cluster is determined based on the memory addresses of all address points contained in the second cluster. The memory address range corresponding to each of the second clusters is determined as the cache line capacity represented by the second cluster; Calculate the arithmetic mean of the cache line sizes of all second clusters, and use this as the target cache line size.
11. A cache parameter determination device, characterized in that the device includes: a first acquisition module, used to acquire a sequence of memory access requests generated by multiple target applications running on the simulation model, wherein each memory access request includes a request sequence number and the memory address accessed; a construction module, used to construct an address dataset based on the memory access request sequence, wherein the address dataset includes multiple address points, and each address point includes a request sequence number and a corresponding memory address; a first clustering module, used to perform a first clustering process on multiple address points in the address dataset to obtain at least one first cluster; a first determining module, used to determine the total cache capacity based on the memory address distribution of each address point in the first cluster; a second acquisition module, used to acquire the preset number of cache groups and the preset number of cache paths; a second clustering module, used to perform a second clustering process on multiple address points in the address dataset according to the number of cache groups and the number of cache paths, so as to obtain multiple second clusters; and a second determining module, used to determine the target cache line capacity based on the memory address distribution of each address point in each of the second clusters.
12. An electronic device, characterized in that, The electronic device includes a memory, a processor, a program stored in the memory and executable on the processor, and a data bus for establishing communication between the processor and the memory. The program is executed by the processor to implement the cache parameter determination method as described in any one of claims 1 to 9.
13. A computer-readable storage medium, characterized in that, The computer-readable storage medium stores one or more programs, which can be executed by one or more processors to implement the cache parameter determination method as described in any one of claims 1 to 9.