Memory address access frequency

By pairing hardware binning in a buffer circuit with programmable, instruction-driven updates in a separate processing circuit, the trade-off between storage capacity and information volume in memory access frequency analysis is eased, enabling efficient and flexible analysis.

CN122270751A · Pending · Publication Date: 2026-06-23 · ARM LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
ARM LTD
Filing Date
2024-11-06
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

Existing approaches to memory access frequency analysis face a trade-off between the storage and bandwidth that must be provided and the amount of information available, and a further trade-off between flexibility and efficiency in collecting and analyzing the data.

Method used

A tracking circuit generates a trace of memory accesses. A buffer circuit bins the traced addresses, reducing the amount of data while preserving access frequency information. A separate processing circuit executes an instruction stream to merge the buffer circuit's bins into a larger set of frequency bins that it maintains, so that little dedicated hardware storage is needed.

Benefits of technology

This enables efficient tracking and analysis of memory access frequency while reducing memory bandwidth requirements, thus improving the flexibility and efficiency of data processing.



Abstract

An apparatus is provided in which a tracking circuit generates a trace indicative of a series of memory addresses of memory accesses to a memory. A buffer circuit performs binning on the memory addresses to produce a buffer circuit maintained frequency bin indicative of a frequency of access of the memory addresses. A processing circuit, separate from the buffer circuit, executes an instruction stream to update a processing circuit maintained frequency bin to indicate the frequency of access of the memory addresses based on the buffer circuit maintained frequency bin.

Description

[0001] This technology relates to data processing, and more specifically to the field of memory access.

[0002] It can be desirable to understand which areas of memory are accessed within a given time period. This could be useful, for example, in deciding where to place data. Frequently accessed data could be placed in memory areas with lower latency and/or higher bandwidth. Likewise, data that is accessed together could be 'load-balanced' so that it can be accessed simultaneously over short periods. One way to do this is to collect and store trace data about data accesses. However, such data is large and requires significant storage. This can be overcome by storing less data (e.g., only recent data), but doing so inevitably makes less information available. There is therefore a trade-off between the amount of storage that must be provided, together with the bandwidth used to capture the data, and the amount of information available for making intelligent decisions about data placement. Another trade-off is between flexibility and efficiency: it is desirable to collect and analyze the data flexibly while also doing so efficiently.

[0003] Viewed from a first example configuration, there is provided an apparatus comprising: a tracking circuit configured to generate a trace indicative of a series of memory addresses of memory accesses to a memory; a buffer circuit configured to perform binning on the memory addresses to produce buffer circuit maintained frequency bins indicative of a frequency of access of the memory addresses; and a processing circuit, separate from the buffer circuit, configured to execute an instruction stream to update processing circuit maintained frequency bins, based on the buffer circuit maintained frequency bins, to indicate the frequency of access of the memory addresses.

[0004] Viewed from a second example configuration, there is provided a method comprising: generating a trace indicative of a series of memory addresses of memory accesses to a memory; performing binning on the memory addresses to produce buffer circuit maintained frequency bins indicative of a frequency of access of the memory addresses; and executing an instruction stream to update processing circuit maintained frequency bins, based on the buffer circuit maintained frequency bins, to indicate the frequency of access of the memory addresses.

[0005] Viewed from a third example configuration, there is provided: a tracking circuit configured to generate a trace indicative of a series of memory addresses of memory accesses to a memory; a buffer circuit configured to perform binning on the memory addresses to produce buffer circuit maintained frequency bins indicative of a frequency of access of the memory addresses; and a processing circuit, separate from the buffer circuit, configured to execute an instruction stream to update processing circuit maintained frequency bins, based on the buffer circuit maintained frequency bins, to indicate the frequency of access of the memory addresses.

[0006] The present invention will be further described by way of example only, referring to the embodiments illustrated in the accompanying drawings, wherein:

[0007] Figure 1 shows an example of a data processing apparatus;

[0008] Figure 2 shows an example of how the buffer circuit maintained frequency bins in the storage device can be updated in response to a notification of a memory access being received at the buffer circuit (e.g., as part of a trace sample);

[0009] Figure 3 shows an example of the update process;

[0010] Figure 4 shows an example in which a rewritable storage device contains different sets of instructions to be executed by the processing circuit;

[0011] Figures 5A and 5B show flowcharts illustrating methods of data processing according to some examples; and

[0012] Figure 6 shows an example of implementing this technology on a chip.

[0013] Before discussing the embodiments with reference to the accompanying drawings, the following description of embodiments and their associated advantages is provided.

[0014] According to one example configuration, there is provided an apparatus comprising: a tracking circuit configured to generate a trace indicative of a series of memory addresses of memory accesses to a memory; a buffer circuit configured to perform binning on the memory addresses to produce buffer circuit maintained frequency bins indicative of a frequency of access of the memory addresses; and a processing circuit, separate from the buffer circuit, configured to execute an instruction stream to update processing circuit maintained frequency bins, based on the buffer circuit maintained frequency bins, to indicate the frequency of access of the memory addresses.

[0015] In the example above, the trace contains indications of memory accesses to a memory (e.g., a memory hierarchy) and specifically identifies the addresses being accessed. This trace is used to update a set of 'bins' stored by the buffer circuit. Specifically, these bins indicate the number of times an address (or address range) has been accessed within a defined time period. At certain points in time, an instruction stream is executed by the processing circuit. These instructions update another set of bins maintained by the processing circuit, and this update is based on the contents of the bins in the buffer circuit. Either or both sets of bins can be implemented using an approximate counting algorithm (see, for example, https://en.wikipedia.org/wiki/Lossy_Count_Algorithm). By performing binning in hardware, the amount of data that needs to be sent using memory bandwidth can be reduced. Specifically, much of the data regarding the exact order of memory accesses is discarded, as it is irrelevant to determining the frequency with which memory locations are hit. Furthermore, binning enables a degree of aggregation by tracking, for example, how many times a specific address (or address range) is accessed, rather than listing each individual access (which would result in duplicate data). The need for excessive hardware storage is avoided by allowing long-lived data to be stored in the bins maintained by the processing circuit, which can be held, for example, in main memory or even in backing storage as needed. Thus, the combination of processing circuit and buffer circuit allows large amounts of memory address access frequency data to be stored without consuming excessive bandwidth.

[0016] In some examples, the tracking circuit is configured to generate the trace for a subset of all memory accesses made to the memory by at least one access element. The trace does not need to contain a complete list of memory accesses. Instead, statistical methods can be used, such as a trace containing details of every Nth memory access, or a trace in which each memory access is included at random. In either case, since the trace contains a statistical sample of all memory accesses, it can be assumed that analysis of the sample will reflect the type and number of memory accesses that actually occurred. Specifically, where only 1/N of the memory accesses form part of the trace (through cyclical or random selection as discussed above), each recorded access count can statistically be expected to be approximately 1/N of its true value. In relative terms, the access frequency distribution should remain roughly the same.
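The two sampling schemes mentioned above can be sketched in software as follows. This is only an illustration: the function names, the use of Python lists as a stand-in for the access stream, and the scaling helper are assumptions made here and are not taken from the described apparatus.

```python
import random


def sample_every_nth(addresses, n):
    """Cyclical sampling: keep the n-th, 2n-th, 3n-th, ... access."""
    return [addr for i, addr in enumerate(addresses, start=1) if i % n == 0]


def sample_randomly(addresses, probability, rng=None):
    """Random sampling: each access joins the trace with the given probability."""
    rng = rng or random.Random()
    return [addr for addr in addresses if rng.random() < probability]


def estimate_true_count(sampled_count, n):
    """Scale a sampled count back up, assuming 1 in n accesses was sampled."""
    return sampled_count * n


accesses = list(range(1, 11))             # ten hypothetical accesses
trace = sample_every_nth(accesses, 5)     # keeps the 5th and 10th access
```

Because both schemes thin the stream uniformly, the relative ordering of bin frequencies is expected to be preserved even though each absolute count shrinks by roughly the sampling factor.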

[0017] In some examples, the size of at least one of the buffer circuit maintained frequency bins and the processing circuit maintained frequency bins is larger than one. While some degree of merging can be achieved at the buffer circuit even when each bin has a size of '1' (e.g., because multiple memory accesses to the same single address are compressed into a single entry), a greater degree of compression is achieved when the bin is larger than one. That is, each bin refers to an address range, and the frequency associated with that bin increases whenever a memory access occurs at any memory address within that range.

[0018] In some examples, the size of at least one of the buffer circuitry-maintained frequency bin and the processing circuitry-maintained frequency bin is equal to the size of a memory page. In these examples, the range of each bin corresponds to a memory page. Therefore, any access to an address within that page results in the corresponding entry being incremented. The size of a memory page will depend on the memory architecture and is typically defined as a range of virtual addresses, for which a single entry is provided in the page table that provides the translation from virtual to physical addresses.

[0019] In some examples, in response to receiving a notification of a memory access to a memory address among the memory addresses, the buffer circuit is configured to: if a buffer circuit maintained frequency bin corresponding to the memory address exists, increment the frequency of that bin; and otherwise, subject to a further condition, add a new buffer circuit maintained frequency bin corresponding to the memory address. Thus, the binning process increases the frequency of the entry corresponding to the incoming memory access. If a memory access is directed to a memory location that is not yet reflected in any of the bins, a new bin is created into which the memory access can be placed, provided the further condition is met. This condition could be, for example, that sufficient storage is available in the buffer circuit (possibly after discarding another entry according to an eviction criterion).

[0020] In some examples, the processing circuit is configured to update the processing circuit maintained frequency bins to indicate the access frequency of the memory addresses by merging the processing circuit maintained frequency bins with the buffer circuit maintained frequency bins. Merging can take several different forms. In some examples, merging is achieved by adding the counts of corresponding bins together. It is assumed that non-existent bins have a value of 0. In cases where the bins are aligned at the same point and have the same length (e.g., each bin is 4k long starting from address 0), this is a simple addition process. In other examples where the bin sizes differ, it may be necessary to calculate how much overlap exists between the bins and interpolate accordingly. For example, a buffer circuit bin covering 6k to 10k could provide 50% of its total to the processing circuit bin 4k to 8k, and the other 50% of its total to the processing circuit bin 8k to 10k. Alternatively, the totals of a 6k to 7k buffer circuit bin and a 7k to 8k buffer circuit bin could both be added to the 4k to 8k processing circuit bin.
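The overlap-based interpolation described above can be sketched as follows. The representation of bins as a dictionary keyed by (start, end) ranges, the function name, and the 8k to 12k extent chosen for the second processing circuit bin are assumptions made for illustration; the sketch also assumes accesses are spread evenly within each buffer circuit bin.

```python
def merge_by_overlap(processing_bins, buffer_bins):
    """Add each buffer bin's count to the processing bins it overlaps.

    Both arguments map (start, end) half-open address ranges to counts.
    A buffer bin contributes to each processing bin in proportion to the
    fraction of the buffer bin's range that lies inside that processing bin.
    Parts of a buffer bin covered by no processing bin are ignored here.
    """
    merged = dict(processing_bins)
    for (b_start, b_end), b_count in buffer_bins.items():
        b_len = b_end - b_start
        for (p_start, p_end) in merged:
            overlap = min(b_end, p_end) - max(b_start, p_start)
            if overlap > 0:
                merged[(p_start, p_end)] += b_count * overlap / b_len
    return merged


K = 1024
processing = {(4 * K, 8 * K): 10, (8 * K, 12 * K): 20}   # hypothetical counts
buffered = {(6 * K, 10 * K): 8}                           # straddles both bins
result = merge_by_overlap(processing, buffered)           # 4 goes to each bin
```

When the two sets of bins are aligned and equally sized, every overlap fraction is 1 and the routine reduces to the simple addition case.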

[0021] In some examples, the apparatus includes: a buffer bin storage circuit configured to store the buffer circuit maintained frequency bins; and a processing circuit bin storage circuit configured to store the processing circuit maintained frequency bins, wherein the capacity of the processing circuit bin storage circuit is greater than the capacity of the buffer bin storage circuit. Because the storage capacity for the processing circuit maintained frequency bins (in terms of the number of bins and/or the frequency that can be recorded for each bin) is greater than the storage capacity for the buffer circuit maintained frequency bins, more information can be stored. Therefore, the total amount of tracking data that is stored can be kept high. Furthermore, because the handling of the processing circuit maintained frequency bins is programmable, various schemes can be used to further reduce the required space. The data can also be stored in a way that makes computation easier. One possible computation is to find the bin with the highest or lowest access frequency.

[0022] In some examples, the buffer circuit maintained frequency bins relate to the frequency of memory accesses that occurred within a predetermined previous window. For example, the window could be a sliding window, where each entry is held in the buffer circuit for a specific duration. In such cases, where the sliding window is aligned with the merging period (e.g., so that no access is merged twice), it may not be necessary to periodically reset the contents of the buffer circuit.

[0023] In some examples, the buffer circuit maintained frequency bins are reset in response to the processing circuit executing the instruction stream to update the processing circuit maintained frequency bins. In these examples, when merging occurs, the buffer circuit maintained frequency bins are completely reset (e.g., reset to 0). The buffer circuit therefore 'starts again' and begins counting from zero. This can be useful when considering the principle of spatial locality. Specifically, it might be expected that memory accesses occurring close together in time are likely to be spatially close, that is, they affect memory addresses that are close together. Resetting the buffer bins at some frequency means that bins accessed long ago generally stop being tracked. This frees up the buffer circuit to track other accesses that occur more frequently.

[0024] In some examples, the apparatus includes an instruction storage circuit configured to store the instruction stream, wherein the instruction storage circuit is rewritable. Therefore, in these examples, the instructions used to update the processing circuit maintained frequency bins can be changed as needed, for example so that the update is performed in a different manner.

[0025] In some examples, the instruction stream is changeable at runtime. In some examples, the instructions stored in the instruction storage circuit change while the processing circuit is in use. In some examples, the processing circuit itself can execute instructions that modify the instructions stored in the instruction storage circuit for performing the update, for example at runtime.

[0026] In some examples, the processing circuit maintained frequency bins are divided into multiple sets of processing circuit maintained frequency bins; and for each set, an instruction stream is selected from multiple instruction streams and applied to the processing circuit maintained frequency bins in that set. In these examples, different forms of update can be performed for different bins. Specifically, updates can be performed in different ways for different applications. For example, for an application whose memory accesses are highly spatially localized, data accessed a few minutes ago may not be relevant, and the update may therefore consider only the most recent time window reported by the buffer circuit. For another application (which may be executing concurrently), memory accesses may be more dispersed, so it may be useful to consider memory accesses over long periods of time, and especially the memory addresses accessed most frequently over such periods. This may result in an update that adds the value held by the buffer circuit to the value held by the processing circuit.

[0027] Particular embodiments will now be described with reference to the accompanying drawings.

[0028] Figure 1 illustrates a data processing apparatus 100 according to some examples. An access element 110 (which may be one of several access elements) issues memory access requests to a memory. Notifications and details regarding these accesses are transmitted to a tracking circuit 120, which generates a trace based on a sample of the notifications. There are various ways to generate the trace. For example, every Nth memory access from a given requester may be included in the trace. Alternatively, each time the tracking circuit 120 receives information about a memory access, there may be a random chance that the access is included in the trace generated by the tracking circuit. In some examples, the memory accesses are filtered according to specific characteristics, such that sampling is performed based on these characteristics. Such characteristics may include memory accesses within an address range, demand or prefetch memory accesses, memory accesses that miss at a cache level, or memory accesses that cause processor execution to stall for more than a threshold number of cycles, or any combination of these characteristics. Of course, in some embodiments, the generated trace will not be a sample, but will contain a record of every memory access that occurs.

[0029] The generated trace is provided to buffer circuit 130 and used to fill multiple bins maintained by buffer circuit 130 in storage device 160 (which may form part of buffer circuit 130 itself). In this example, the size of the bins in buffer circuit 130 corresponds to the page size of the memory being accessed, which is 4k in this example. As shown in this example, the buffer circuit has recorded 11 accesses to addresses of 0 or higher and less than 4k, and 42 accesses to addresses of 4k or higher and less than 8k. Because buffer circuit 130 provides dedicated hardware storage, it can operate quickly. The buffer circuit can even form part of access element 110, thereby limiting its impact on bus bandwidth.

[0030] The processing circuit 140 also maintains a separate set of bins in a different storage device 170 (such as DRAM). In this example, the bins maintained by the processing circuit 140 are more extensive than those maintained by the buffer circuit 130. That is, a greater number of bins can be stored, and a higher frequency can be represented for each bin. In fact, the storage device 170 used to store the bins of the processing circuit 140 may be able to store frequency data for every page in the memory system.

[0031] Periodically, the processing circuit 140 performs an update of the bins in its storage device 170. The update to be performed is governed by instructions stored in a rewritable storage device 150, and the instructions themselves can therefore be changed, for example during runtime. The update can take various forms, but is based on the bins maintained by the buffer circuit 130.

[0032] As a result of the above, by utilizing the large storage device 170 available to the processing circuit, all sampled memory accesses can be tracked, not just a subset thereof. On the other hand, since a first level of merging occurs in the storage device 160 used by the buffer circuit 130, the amount of bandwidth used to store all of the access data is reduced compared to the case where the same samples are stored directly by the processing circuit 140.

[0033] It should be noted that in this example, multiple components have been shown separately for readability. In practice, however, several components can be one and the same. For example, processing circuit 140 and the access element 110 (or at least one of the access elements) can be the same, or can be components of the same element. Furthermore, the memory being accessed and DRAM storage device 170 can be the same, and either of them can be the same as the rewritable storage device 150 used to store instructions. Other possibilities also exist.

[0034] Figure 2 shows an example of how the buffer circuit maintained frequency bins in storage device 160 can be updated in response to a notification of a memory access received at buffer circuit 130. In this example, the bin size is 4k, which is also the page size. A first memory read request for address 3845 is received. This causes the frequency count of bin '>=0 and <4k' in storage device 160 to increment by 1. A memory write request is then received, which writes the value stored in register r5 to memory address 16718. In this case, there is no bin covering address 16718. If storage is available in storage device 160, a new bin ('>=16k and <20k') is created and set to an initial value (e.g., 1). Then, another memory write request occurs, which writes the value in register r5 to address 20718. In this case, there is no remaining storage in which to store another new bin. Therefore, the memory access notification/trace entry is discarded. Other conditions may affect whether a particular memory access is retained when no existing bin covers the address of the memory access.

[0035] The result of the modification is a set of bins covering addresses in the ranges >= 0 and < 4k, >= 4k and < 8k, and >= 16k and < 20k. Compared to the initial state of the bins (i.e., the initially existing frequency counts), the count for '>= 0 and < 4k' has been incremented by one, and the count for '>= 4k and < 8k' remains unchanged because no addresses in that range have been accessed. Of course, other update techniques can also be used. In some examples, for instance, a lack of storage for adding new bins might trigger an update of the processing circuit maintained frequency bins (e.g., as shown in Figure 3).
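The sequence of Figure 2 can be replayed with a small software model. This is a sketch only: the capacity of three bins and the initial counts of 11 and 42 (borrowed from the Figure 1 discussion) are assumptions, as are the function and variable names.

```python
BIN_SIZE = 4 * 1024   # 4k bins, matching the page size in the example
CAPACITY = 3          # assumed number of bins the buffer storage can hold


def record_access(bins, address, capacity=CAPACITY, bin_size=BIN_SIZE):
    """Update the bins for one traced access; return False if it is dropped."""
    start = (address // bin_size) * bin_size   # lower bound of the covering bin
    if start in bins:
        bins[start] += 1                       # existing bin: increment
        return True
    if len(bins) < capacity:
        bins[start] = 1                        # room left: create a new bin
        return True
    return False                               # no storage left: discard


bins = {0: 11, 4096: 42}                       # assumed initial state
outcomes = [record_access(bins, addr) for addr in (3845, 16718, 20718)]
# bins is now {0: 12, 4096: 42, 16384: 1}; the access to 20718 was dropped
```

Bins are keyed by their lower bound, so key 16384 stands for the range '>=16k and <20k'.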

[0036] Figure 3 shows an example of the update process. Here, the processing circuit maintained frequency bins in the storage device 170 are updated with reference to the buffer circuit maintained frequency bins in the storage device 160. The update involves accumulation, which is achieved by adding the bins together. For example, the bin '>=0 and <4k' in the processing circuit maintained frequency bins is initially 8773. The same bin in the buffer circuit maintained frequency bins is 11. Adding these two values together gives the bin a value of 8784 in the updated processing circuit maintained frequency bins. Similarly, for the bin '>=4k and <8k', the frequency is updated by adding 42 to 3 to give an updated value of 45. It is assumed that a non-existent bin has a value of zero. Therefore, the bin '>=8k and <12k', which has a value of 99 in the processing circuit maintained frequency bins, remains at 99 because there is no corresponding bin in the buffer circuit maintained frequency bins.
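The accumulation in Figure 3 amounts to a key-wise addition in which a missing bin counts as zero. A minimal sketch, assuming bins are held as dictionaries keyed by the lower bound of each 4k range (a representation chosen here for illustration):

```python
def merge_additive(processing_bins, buffer_bins):
    """Return processing bins with the buffer counts added in.

    A bin that is absent from either side is treated as having a count of 0.
    """
    merged = dict(processing_bins)
    for start, count in buffer_bins.items():
        merged[start] = merged.get(start, 0) + count
    return merged


K = 1024
processing = {0: 8773, 4 * K: 3, 8 * K: 99}   # values from the Figure 3 example
buffered = {0: 11, 4 * K: 42}
updated = merge_additive(processing, buffered)
# updated is {0: 8784, 4096: 45, 8192: 99}
```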

[0037] Other forms of accumulation are possible. Specifically, Figure 4 illustrates an example in which the rewritable storage device 150 contains different sets of instructions to be executed by the processing circuit 140. Each set of instructions produces a different update mechanism, and each set of instructions in the rewritable storage device can be replaced (e.g., at runtime). In this way, the update process can be customized for different situations without modifying the hardware. Here, each set of instructions is applied to a different set of bins, with each set of bins corresponding to a different application.

[0038] For example, the first instruction set (Instruction Set 1) provides an addition mechanism as previously illustrated. Here, there are two bins subject to this policy ('>=0 and <4k' and '>=4k and <8k'). These bins have values of 8773 and 3, respectively. At update time, the corresponding bins in the buffer circuit hold 11 and 42. Therefore, the merging performed by the addition process makes the bins maintained by the processing circuit become 8784 (8773+11) and 45 (3+42), respectively.

[0039] The second instruction set (Instruction Set 2) provides a discarding mechanism. That is, each time an update occurs, the new set of bins is merged with the old set of bins, and the lowest-frequency bin among those subject to these instructions is discarded. This policy applies to only two bins ('>=8k and <12k' and '>=12k and <16k'). The first bin starts with a value of 99. At update time, the corresponding bin maintained by the buffer circuit has a value of 3, so the merged value is 102. The second bin has a value of 6 and has no corresponding bin maintained by the buffer circuit, so the merged value is 6. The smallest bin is therefore the '>=12k and <16k' bin, and this bin is discarded.

[0040] The third instruction set (Instruction Set 3) provides a sorting mechanism. That is, each time an update occurs, bins are merged according to their range and then sorted in descending order of frequency. The only bin ('>=24k and <28k') to which this policy initially applies starts with a value of 99. During an update, the corresponding bin maintained by the buffer circuit has a value of 7. Therefore, the value of that bin in the frequency bins maintained by the processing circuit is replaced with the value '106'. The bins maintained by the buffer circuit also include '>=28k and <32k' with frequency 3 and '>=32k and <36k' with frequency 55. Therefore, the bins are sorted in the order '>=24k and <28k', '>=32k and <36k', and '>=28k and <32k'.
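The three update policies of Figure 4 can be sketched as interchangeable functions, which loosely mirrors the idea of replaceable instruction sets. The function names and the dictionary representation (keyed by the lower bound of each 4k range) are assumptions; each call below is given only the buffer bins relevant to that policy's set.

```python
def add_counts(old_bins, buffer_bins):
    """Instruction Set 1 style: add counts, treating missing bins as 0."""
    merged = dict(old_bins)
    for start, count in buffer_bins.items():
        merged[start] = merged.get(start, 0) + count
    return merged


def discard_lowest(old_bins, buffer_bins):
    """Instruction Set 2 style: merge, then drop the lowest-frequency bin."""
    merged = add_counts(old_bins, buffer_bins)
    if merged:
        del merged[min(merged, key=merged.get)]
    return merged


def sort_descending(old_bins, buffer_bins):
    """Instruction Set 3 style: merge, then order bins by descending count."""
    merged = add_counts(old_bins, buffer_bins)
    return dict(sorted(merged.items(), key=lambda item: item[1], reverse=True))


K = 1024
set1 = add_counts({0: 8773, 4 * K: 3}, {0: 11, 4 * K: 42})
set2 = discard_lowest({8 * K: 99, 12 * K: 6}, {8 * K: 3})
set3 = sort_descending({24 * K: 99}, {24 * K: 7, 28 * K: 3, 32 * K: 55})
# list(set3) is [24576, 32768, 28672], i.e. 24k, 32k, 28k
```

Swapping which function is applied to a given set of bins changes the update behaviour without touching the buffer-side model, which is the flexibility the rewritable instruction storage is meant to provide.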

[0041] Another example of an instruction set is a balancing instruction set, used to adapt the bin size so as to balance accuracy against storage size. For example, bins can be joined by combining their ranges and adding their counts. Conversely, a bin can be split by assuming that its count is evenly distributed across the resulting parts. The next iteration can then use the newly joined/split bins.
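Joining and splitting can be sketched as below. The (start, end, count) tuple layout and the function names are assumptions for illustration, and the split assumes the bin width is divisible by the number of parts.

```python
def join_bins(first, second):
    """Join two adjacent (start, end, count) bins into one wider bin."""
    (a_start, a_end, a_count), (b_start, b_end, b_count) = first, second
    if a_end != b_start:
        raise ValueError("bins must be adjacent")
    return (a_start, b_end, a_count + b_count)


def split_bin(bin_, parts=2):
    """Split a bin into equal parts, spreading its count evenly across them."""
    start, end, count = bin_
    width = (end - start) // parts
    return [
        (start + i * width, start + (i + 1) * width, count / parts)
        for i in range(parts)
    ]


K = 1024
joined = join_bins((0, 4 * K, 11), (4 * K, 8 * K, 42))   # (0, 8192, 53)
halves = split_bin(joined)                                # two bins of 26.5 each
```

Note that joining then splitting does not recover the original counts of 11 and 42: the even-distribution assumption trades accuracy for storage.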

[0042] Figure 5A illustrates a first flowchart 500, showing a first method for handling the update process. At step 505, a trace entry is considered, for example, at buffer circuit 130. At step 510, the trace entry is added to buffer circuit 130 as appropriate. That is, if a relevant bin already exists, the bin is incremented. If not, a new bin is created that covers the trace entry. It is then determined whether the buffer is full. If not, the process returns to step 505. Otherwise, at step 520, an update is performed on the frequency bins maintained by the processing circuit, and a merge thus occurs between these bins and the bins stored by the buffer circuit. Then, at step 525, buffer circuit 130 is reset (e.g., its contents are deleted). Note that multiple trace entries can be received simultaneously from tracking circuit 120. In this case, the same flowchart is followed, with one trace entry considered at a time.

[0043] In this example, the buffer circuit is cleared, as part of the merging process, each time it becomes full.
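The flow of Figure 5A can be modelled in a few lines. This is a software sketch under assumptions: "full" is taken to mean that the number of bins has reached a fixed capacity, the merge is a simple addition, and all names are hypothetical.

```python
def run_flush_when_full(trace, capacity, bin_size=4096):
    """Model of the Figure 5A flow: bin each entry, merge and reset when full."""
    buffer_bins = {}
    processing_bins = {}
    for address in trace:                                    # step 505
        start = (address // bin_size) * bin_size
        buffer_bins[start] = buffer_bins.get(start, 0) + 1   # step 510
        if len(buffer_bins) >= capacity:                     # buffer full?
            for key, count in buffer_bins.items():           # step 520: merge
                processing_bins[key] = processing_bins.get(key, 0) + count
            buffer_bins.clear()                              # step 525: reset
    return buffer_bins, processing_bins


remaining, merged = run_flush_when_full([0, 100, 5000, 9000, 9001], capacity=3)
# merged is {0: 2, 4096: 1, 8192: 1}; remaining is {8192: 1}
```

The last access arrives after the reset, so it sits in the buffer until the next merge.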

[0044] Figure 5B illustrates a second flowchart 530, which shows an alternative implementation. In this implementation, the buffer acts as a first-in, first-out (FIFO) buffer. Here, a trace entry is again considered at step 535, and at step 540 the entry is added to the buffer circuit 130 as appropriate (see Figure 2). In other words, the existing bin count is incremented or a new bin is created. At step 545, it is determined whether a new entry to the buffer (i.e., a new bin) will cause a buffer overflow. If not, the process simply returns to step 535 to handle the next trace entry. If so, at step 550, the oldest bin is popped from the buffer to allow the addition of the new entry (new bin). Then, at step 555, the oldest bin is merged with the bins maintained by the processing circuit. The process then returns to step 535.
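A software model of the FIFO variant is sketched below. It assumes that "oldest" means the bin created earliest, checks for overflow just before inserting a new bin, and merges by addition; these choices and all names are assumptions made for illustration.

```python
from collections import OrderedDict


def run_fifo(trace, capacity, bin_size=4096):
    """Model of the Figure 5B flow: evict and merge the oldest bin on overflow."""
    buffer_bins = OrderedDict()     # insertion order tracks bin age
    processing_bins = {}
    for address in trace:                                      # step 535
        start = (address // bin_size) * bin_size
        if start in buffer_bins:
            buffer_bins[start] += 1                            # step 540
            continue
        if len(buffer_bins) >= capacity:                       # step 545
            oldest, count = buffer_bins.popitem(last=False)    # step 550
            processing_bins[oldest] = (
                processing_bins.get(oldest, 0) + count         # step 555
            )
        buffer_bins[start] = 1                                 # new bin
    return buffer_bins, processing_bins


buffered, merged = run_fifo([0, 100, 5000, 9000, 0], capacity=2)
# merged is {0: 2, 4096: 1}; buffered holds bins 8192 and 0, in that order
```

Incrementing an existing bin does not change its position in the queue, so a frequently hit bin is still evicted once enough newer bins arrive.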

[0045] In this second example, the update process is faster because only a single bin needs to be merged at a time. Furthermore, since the bins remain in the buffer circuit for a longer period before being removed, more buffering is performed before these bins are merged. That is, in the example of Figure 5A, a widely distributed series of accesses could lead to the creation of many bins, which are quickly removed and then merged. In contrast, in the example of Figure 5B, the bins are kept in place until they must be removed to make room for another bin. Therefore, the counter values can reach higher values.

[0046] Of course, other variations are also possible. For example, a set of the oldest entries could be popped and merged at steps 550 and 555, instead of individual entries being popped and merged as in the process of Figure 5B. In other examples, the process might be similar to that of Figure 5A, but instead of the buffer being merged and reset when it is full, this can happen after a certain period of time. This helps address the problem that frequent accesses to the same addresses cause the buffer to take a long time to fill, and hence a long time to be merged with the bins maintained by the processing circuit. Of course, these techniques can also be combined.

[0047] Another way to achieve this is to use a list of memory accesses instead of bins. Each memory access can be stored along with its associated time, and any access that falls outside the sliding window can be removed. When merging occurs at step 555, a count is made of the entries that fall into each bin maintained by processing circuit 140, and the bins maintained by processing circuit 140 are updated accordingly. In effect, in this example, bins can be duplicated, and each bin stores exactly one entry.
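The list-based alternative can be sketched with a timestamped queue. The class name, the use of abstract integer timestamps that never decrease, and the 4k bin size used when counting are assumptions made for this illustration.

```python
from collections import Counter, deque


class AccessWindow:
    """Keeps individual (time, address) accesses inside a sliding window."""

    def __init__(self, window, bin_size=4096):
        self.window = window
        self.bin_size = bin_size
        self.accesses = deque()          # oldest access at the left

    def record(self, time, address):
        """Store an access, then drop any access older than the window."""
        self.accesses.append((time, address))
        while self.accesses and self.accesses[0][0] <= time - self.window:
            self.accesses.popleft()

    def counts_per_bin(self):
        """Count the stored accesses that fall into each bin at merge time."""
        return Counter(
            (address // self.bin_size) * self.bin_size
            for _, address in self.accesses
        )


recent = AccessWindow(window=10)
recent.record(0, 100)
recent.record(5, 200)
recent.record(12, 5000)                  # the access at time 0 now expires
per_bin = recent.counts_per_bin()        # one access each in bins 0 and 4096
```

This keeps exact timing information at the cost of one stored entry per access, which is the storage pressure that binning in the buffer circuit is meant to relieve.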

[0048] The concepts described herein may be embodied in computer-readable code used to manufacture devices embodying the described concepts. For example, the computer-readable code may be used in one or more stages of the semiconductor design and manufacturing process, including the electronic design automation (EDA) stage, to manufacture integrated circuits including devices embodying these concepts. The aforementioned computer-readable code may additionally or alternatively enable the definition, modeling, simulation, verification, and/or testing of devices embodying the concepts described herein.

[0049] For example, computer-readable code for manufacturing a device embodying the concepts described herein may be embodied in code that defines a hardware description language (HDL) representation of these concepts. For instance, the code may define a register-transfer level (RTL) abstraction of one or more logic circuits for defining a device embodying these concepts. The code may define an HDL representation of the one or more logic circuits embodying the device in Verilog, SystemVerilog, Chisel, or VHDL (Very High Speed Integrated Circuit Hardware Description Language), as well as in intermediate representations such as FIRRTL. Computer-readable code may provide definitions embodying the concepts using system-level modeling languages such as SystemC and SystemVerilog, or other behavioral representations of the concepts, which can be interpreted by a computer to enable simulation, functional and/or formal verification, and testing of the concepts.

[0050] Additionally or alternatively, computer-readable code may define a low-level description of an integrated circuit component embodying the concepts described herein, such as one or more netlists or integrated circuit layout definitions, including representations such as GDSII. One or more netlists or other computer-readable representations of the integrated circuit component may be generated by applying one or more logic synthesis processes to the RTL representation to generate a definition for manufacturing a device embodying the invention. Alternatively or additionally, one or more logic synthesis processes may generate a bitstream from the computer-readable code to be loaded into a field-programmable gate array (FPGA) to configure the FPGA to embody the described concepts. The FPGA may be deployed for the purpose of verifying and testing the concepts prior to manufacturing integrated circuits, or the FPGA may be deployed directly in a product.

[0051] Computer-readable code may include a mixture of code representations for manufacturing apparatus, such as one or more of RTL representations, netlist representations, or other computer-readable definitions used in the semiconductor design and manufacturing process for manufacturing apparatus embodying the present invention. Alternatively or additionally, the concept may be defined in a combination of computer-readable definitions used in the semiconductor design and manufacturing process for manufacturing apparatus and computer-readable code defining instructions that will be executed by the defined apparatus once manufactured.

[0052] Such computer-readable code can be contained in any known transitory computer-readable medium (such as wired or wireless transmission of code over a network) or non-transitory computer-readable medium such as semiconductor memory, magnetic disks, or optical discs. Integrated circuits made using the computer-readable code may include components such as one or more of the following: a central processing unit, a graphics processing unit, a neural processing unit, a digital signal processor, or other components that embody the concept independently or collectively.

[0053] The concepts described herein can be specifically embodied in systems including at least one packaged chip. The previously described devices are implemented in this at least one packaged chip (either in a specific chip of the system or distributed across more than one packaged chip). At least one packaged chip is assembled on a board with at least one system component. Chip-containing products may include systems assembled on additional boards with at least one other product component. The system or chip-containing product may be assembled into a housing or onto a structural support (such as a frame or blade).

[0054] As shown in Figure 6, one or more packaged chips 400 are manufactured by a semiconductor chip manufacturer, wherein the devices described above are implemented on a single chip or distributed across two or more chips. In some examples, the chip product 400 manufactured by the semiconductor chip manufacturer may be provided as a semiconductor package, which includes a protective housing (e.g., made of metal, plastic, glass, or ceramic) housing the semiconductor device implementing the aforementioned devices, and connectors such as pads, solder balls, or pins for connecting the semiconductor device to an external environment. Where more than one chip 400 is provided, these chips may be provided as individual integrated circuits (provided as separate packages), or may be packaged by a semiconductor provider into a multi-chip semiconductor package (e.g., using interposers, or by using three-dimensional integration to provide a multi-layer chip product comprising two or more vertically stacked integrated circuit layers).

[0055] In some examples, a collection of chiplets (i.e., small modular chips with specific functionalities) may itself be referred to as a chip. Chiplets may be individually packaged in semiconductor packages and/or packaged together with other chiplets in multi-chiplet semiconductor packages (e.g., using interposers, or by using three-dimensional integration to provide multi-layer chiplet products comprising two or more vertically stacked integrated circuit layers).

[0056] One or more packaged chips 400 are assembled on a board 402 together with at least one system component 404 to provide system 406. For example, the board may include a printed circuit board. The board substrate may be made of any of a variety of materials, such as plastic, glass, ceramic, or flexible substrate materials such as paper, plastic, or textile materials. At least one system component 404 includes one or more external components that are not part of the one or more packaged chips 400. For example, at least one system component 404 may include any one or more of the following: another packaged chip (e.g., supplied by a different manufacturer or manufactured at a different process node), an interface module, a resistor, a capacitor, an inductor, a transformer, a diode, a transistor, and/or a sensor.

[0057] A chip-containing product 416 is manufactured, comprising a system 406 (including a board 402, one or more chips 400, and at least one system component 404) and one or more product components 412. Product components 412 include one or more additional components that are not part of system 406. As an example, in a non-exhaustive list, one or more product components 412 may include user input/output devices such as keyboards, touchscreens, microphones, speakers, displays, haptic devices, etc.; wireless communication transmitters/receivers; sensors; actuators for actuating mechanical motion; thermal control devices; additional packaged chips; interface modules; resistors; capacitors; inductors; transformers; diodes; and/or transistors. System 406 and one or more product components 412 may be assembled on an additional board 414.

[0058] Board 402 or additional board 414 may be disposed on or within a device housing or other structural support (e.g., a frame or blade) to provide a product that can be handled by a user and/or is intended for operational use by a person or company.

[0059] System 406 or chip-containing product 416 can be at least one of the following: end-user product, machine, medical device, computing or telecommunications infrastructure product, or automated control system. For example, as a non-exhaustive list, a chip-containing product can be any of the following: telecommunications equipment, mobile phone, tablet computer, laptop computer, computer, server (e.g., rack server or blade server), infrastructure equipment, networking equipment, vehicle or other automotive product, industrial machine, consumer device, smart card, credit card, smart glasses, avionics equipment, robotic equipment, camera, television, smart TV, DVD player, set-top box, wearable device, home appliance, smart meter, medical device, heating/lighting control equipment, sensor, and/or control system for controlling public infrastructure equipment (such as smart highways or traffic lights).

[0060] In this application, the phrase "configured to..." is used to mean that an element of the device has a configuration capable of performing the defined operation. In this context, "configuration" means an arrangement or manner of interconnection of hardware or software. For example, the device may have dedicated hardware that provides the defined operation, or a processor or other processing device may be programmed to perform the function. "Configured to" does not mean that the element of the device needs to be changed in any way to provide the defined operation.

[0061] While exemplary embodiments of the invention have been described in detail herein with reference to the accompanying drawings, it should be understood that the invention is not limited to those precise embodiments, and various changes, additions, and modifications can be made therein by those skilled in the art without departing from the scope and spirit of the invention as defined by the appended claims. For example, features of the dependent claims can be combined with features of the independent claims in various ways without departing from the scope of the invention.

Claims

1. An apparatus, the apparatus comprising: a tracing circuit configured to generate a trace indicating a series of memory addresses of memory accesses to a memory; a buffer circuit configured to perform binning on the memory addresses to generate a buffer circuit maintained frequency bin, the buffer circuit maintained frequency bin indicating the access frequency of the memory addresses; and a processing circuit, separate from the buffer circuit, configured to execute an instruction stream to update a processing circuit maintained frequency bin to indicate the access frequency of the memory addresses based on the buffer circuit maintained frequency bin.

2. The apparatus according to claim 1, wherein the tracing circuit is configured to generate the trace for a subset of all memory accesses to the memory performed by at least one access element.

3. The apparatus according to any of the preceding claims, wherein the size of at least one of the buffer circuit maintained frequency bin and the processing circuit maintained frequency bin is greater than one.

4. The apparatus according to any of the preceding claims, wherein the size of at least one of the buffer circuit maintained frequency bin and the processing circuit maintained frequency bin is equal to the page size of the memory.

5. The apparatus according to any of the preceding claims, wherein, in response to receiving a notification regarding a memory access to one of the memory addresses, the buffer circuit is configured to: when a buffer circuit maintained frequency bin corresponding to that one of the memory addresses exists, increment the frequency of the buffer circuit maintained frequency bin corresponding to that one of the memory addresses; and, based on a further condition, add a new entry to the buffer circuit maintained frequency bins, the new entry initially corresponding to that one of the memory addresses.

6. The apparatus according to any of the preceding claims, wherein the processing circuit is configured to update the processing circuit maintained frequency bin, to indicate the access frequency of the memory addresses, by merging the processing circuit maintained frequency bin and the buffer circuit maintained frequency bin.

7. The apparatus according to any of the preceding claims, the apparatus comprising: a buffer bin storage circuit configured to store the buffer circuit maintained frequency bin; and a processing circuit bin storage circuit configured to store the processing circuit maintained frequency bin, wherein the capacity of the processing circuit bin storage circuit is greater than the capacity of the buffer bin storage circuit.

8. The apparatus according to any of the preceding claims, wherein the buffer circuit maintained frequency bin relates to the frequency of memory accesses that occurred within a predetermined preceding window.

9. The apparatus according to any of the preceding claims, wherein, in response to the processing circuit executing the instruction stream to update the processing circuit maintained frequency bin, the buffer circuit maintained frequency bin is reset.

10. The apparatus according to any of the preceding claims, wherein the processing circuit is configured to execute the instruction stream to update the processing circuit maintained frequency bin at predetermined time intervals.

11. The apparatus according to any of the preceding claims, wherein the apparatus comprises: an instruction storage circuit configured to store the instruction stream, wherein the instruction storage circuit is rewritable.

12. The apparatus according to any of the preceding claims, wherein the instruction stream is changeable at runtime.

13. The apparatus according to any of the preceding claims, wherein the processing circuit maintained frequency bins are divided into sets of processing circuit maintained frequency bins; and, for each processing circuit maintained frequency bin in one of the sets, the instruction stream is selected from a plurality of instruction streams and is applied to the processing circuit maintained frequency bins in that set.

14. A method, the method comprising: generating a trace indicating a series of memory addresses of memory accesses to a memory; performing binning on the memory addresses to generate a buffer circuit maintained frequency bin, the buffer circuit maintained frequency bin indicating the access frequency of the memory addresses; and executing an instruction stream to update a processing circuit maintained frequency bin to indicate the access frequency of the memory addresses based on the buffer circuit maintained frequency bin.

15. A non-transitory computer-readable medium for storing computer-readable code for manufacturing an apparatus, the apparatus comprising: a tracing circuit configured to generate a trace indicating a series of memory addresses of memory accesses to a memory; a buffer circuit configured to perform binning on the memory addresses to generate a buffer circuit maintained frequency bin, the buffer circuit maintained frequency bin indicating the access frequency of the memory addresses; and a processing circuit, separate from the buffer circuit, configured to execute an instruction stream to update a processing circuit maintained frequency bin to indicate the access frequency of the memory addresses based on the buffer circuit maintained frequency bin.

16. A system comprising: the apparatus according to any of the preceding claims, implemented in at least one packaged chip; at least one system component; and a board, wherein the at least one packaged chip and the at least one system component are assembled on the board.

17. A chip-containing product, the chip-containing product comprising the system of claim 16, the system being assembled on an additional board together with at least one other product component.