Processor, information processing device, and method for controlling a processor
The processor dynamically adjusts prefetch distance based on stream frequency to enhance processing performance by reducing cache misses and maintaining efficient cache utilization.
Patent Information
- Authority / Receiving Office: JP · JP
- Patent Type: Applications
- Current Assignee / Owner: FUJITSU LTD
- Filing Date: 2024-12-02
- Publication Date: 2026-06-12
Abstract
Description
【Technical Field】 【0001】 The present invention relates to a processor, an information processing apparatus, and a control method for a processor. 【Background Art】 【0002】 A processor such as a CPU (Central Processing Unit) is equipped with a cache that stores a part of the data stored in the main memory device, in order to hide access latency and compensate for insufficient throughput. As a method for improving the cache hit rate and hiding access latency, prefetching, which reads data predicted to be used in the near future into the cache in advance, is known. One of the prefetch methods is hardware prefetch (see, for example, Patent Documents 1 and 2). 【Prior Art Documents】 【Patent Documents】 【0003】 【Patent Document 1】 Japanese Patent Application Laid-Open No. 2005-242527 【Patent Document 2】 Japanese Patent Application Laid-Open No. 2017-045153 【Summary of the Invention】 【Problems to be Solved by the Invention】 【0004】 For example, when a processor having a hardware prefetch function detects a stream access, that is, a plurality of memory accesses with consecutive addresses, it sequentially issues prefetch requests in the direction in which the addresses continue. Hereinafter, memory access processing by stream access is also referred to as a stream, and the difference between the address included in the memory access request and the prefetch destination address is also referred to as a stride. The stride is set to an integer multiple of the minimum stride, and this multiple is referred to as the prefetch distance. 【0005】 To suppress a decrease in cache utilization efficiency, it is preferable that prefetched data be stored in the cache immediately before it is read from the cache by a memory access request. 
However, if the prefetch distance is too short, the target data may be stored in the cache only after the memory access request that reads it has been issued, which may result in a cache miss and a decrease in processing performance. Conversely, if the prefetch distance is too long, the target data is stored in the cache too early, which may cause other necessary data to be evicted from the cache, again resulting in a cache miss and a decrease in processing performance. 【0006】 Furthermore, when a processor executes multiple programs in parallel and each program generates a stream, the appropriate prefetch distance changes depending on the number of streams. For example, when there are many streams, the frequency of memory access requests for each stream decreases. 【0007】 If memory access requests are infrequent and the prefetch distance is long, the target data is stored in the cache by prefetching well before the memory access request for that data is issued. This can lead to other necessary data held in the cache being evicted before it is used, potentially causing performance degradation. Furthermore, if the prefetched data has itself been evicted from the cache by the time the memory access request for it occurs, the benefit of prefetching is lost. Therefore, when there are many streams, it is preferable to shorten the prefetch distance. 【0008】 In contrast, when the number of streams is small, the frequency of memory access requests for each stream increases. If the frequency of memory access requests is high and the prefetch distance is short, the memory access request for the target data may occur before the target data has been stored in the cache by prefetching, in which case the prefetch has no effect. Therefore, when there are few streams, it is preferable to increase the prefetch distance. 
【0009】 If the prefetch distance is fixed regardless of the number of streams, it may be too short or too long depending on the characteristics of the programs the processor is running, so that prefetching does not sufficiently improve processor performance. However, no method has been proposed for changing the prefetch distance according to the number of streams. 【0010】 In one aspect, the present invention aims to improve the processing performance of a processor by dynamically changing the prefetch distance according to the number of streams. 【Means for Solving the Problem】 【0011】 In one aspect, the processor includes: a cache that holds data read from memory in response to a memory access request; a prefetch queue having a plurality of entries, each of which is assigned to a stream, which is a plurality of memory access requests with consecutive addresses, and is used to control the prefetching of data from the memory to the cache for that stream; a stride setting unit that adjusts the stride, which is the amount of change from the access address included in the memory access request to the prefetch destination address, according to the number of valid entries assigned to streams, reducing the stride as the number of valid entries increases; and a prefetch management unit that, for each stream, after the number of memory access requests with consecutive addresses reaches a preset first threshold, issues a prefetch request to the memory using the stride adjusted by the stride setting unit for each memory access request with consecutive addresses. 【Effects of the Invention】 【0012】 By dynamically changing the prefetch distance according to the number of streams, the processing performance of the processor can be improved. 【Brief Description of the Drawings】 【0013】 [Figure 1] A block diagram showing an example of a processor in one embodiment. [Figure 2] A diagram showing an example of the structure of the prefetch queue in Figure 1. 
[Figure 3] A block diagram showing an example of the configuration of the stride setting unit in Figure 1. [Figure 4] An explanatory diagram showing an example of the prefetch operation by the prefetch control unit in Figure 1. [Figure 5] A diagram showing an example of how the state of an entry in the prefetch queue changes when the operation in Figure 4 is performed. [Figure 6] An explanatory diagram showing an example of how the correction value generation unit in Figure 3 generates correction values. [Figure 7] An explanatory diagram showing an example of how the prefetch distance generation unit in Figure 3 generates the prefetch distance. [Figure 8] A flowchart showing an example of the operation of the prefetch queue management unit in Figure 1. [Figure 9] A flowchart showing an example of the operation in step S200 of Figure 8. [Figure 10] A flowchart showing an example of the operation in step S210 of Figure 9. 【Modes for Carrying Out the Invention】 【0014】 Embodiments will be described below with reference to the drawings. In the following, a signal line through which a signal is transmitted is denoted by the same symbol as the signal name. 【0015】 Figure 1 shows an example of a processor in one embodiment. The processor 100 shown in Figure 1 includes an instruction issuing unit 10, an L1 (Level 1) cache control unit 20, a prefetch control unit 30, and an L1 cache 80. The prefetch control unit 30 includes a prefetch queue management unit 40, a prefetch queue 50, a stride setting unit 60, and a prefetch request issuing unit 70. For example, the processor 100 is mounted on an information processing apparatus 300 together with a memory 200 such as a main storage device. Note that the memory 200 is not limited to the main storage device, and may be an L2 (Level 2) cache disposed between the L1 cache 80 and the main storage device. 
【0016】 Hereinafter, an example in which the prefetch control unit 30 controls prefetching of data from the memory 200 to the L1 cache 80 (data cache) based on an address included in a memory access request REQ such as a load instruction will be described. However, the prefetch control unit 30 may control prefetching of instructions from the memory 200 to the L1 cache 80 (instruction cache) based on an instruction fetch address generated based on the program counter. In this case, the instructions held in the memory 200 and the L1 cache 80 are treated as data. 【0017】 When the instruction fetched from the memory 200 is a memory access request REQ, the instruction issuing unit 10 generates a request address R-ADRS of the memory access request REQ by an operand address generator (not shown), and outputs the generated request address R-ADRS. The request address R-ADRS is output to the L1 cache control unit 20, the prefetch queue management unit 40, and the prefetch request issuing unit 70. The request address R-ADRS is an example of an access address. Note that when the instruction fetched from the memory 200 is an arithmetic instruction, the instruction issuing unit 10 may issue the arithmetic instruction to an arithmetic unit (not shown). 【0018】 The L1 cache control unit 20 determines whether the operand data handled by the memory access request REQ output from the instruction issuing unit 10 is stored in the L1 cache 80. If the operand data is stored in the L1 cache 80, the L1 cache control unit 20 outputs a cache hit signal L1-HIT. If the operand data is not stored in the L1 cache 80, the L1 cache control unit 20 outputs a cache miss signal L1-MIS and issues a data request DREQ (i.e., a memory access request) to the memory 200. 【0019】 In the prefetch control unit 30, the prefetch queue 50 has an entry ENT used to manage the prefetching of data from memory 200 to the L1 cache 80 for each stream access, which is a memory access of multiple consecutive blocks. 
The prefetch queue 50 also has a stride holding unit that holds a stride STRD used in common by the multiple entries ENT. 【0020】 Hereinafter, memory access processing using stream access will be referred to as a stream. By providing multiple entries ENT and a stride holding unit in the prefetch queue 50, the prefetch control unit 30 can control the prefetching of data from memory 200 for each of multiple streams. An example of the prefetch queue 50 is shown in Figure 2. 【0021】 The prefetch queue management unit 40 updates the information held in the corresponding entry ENT based on the cache miss signal L1-MIS. If the information held in the corresponding entry ENT satisfies the conditions for issuing a prefetch request PFREQ, the prefetch queue management unit 40 outputs an activation instruction PFST for the prefetch request to the prefetch request issuing unit 70. An example of the operation of the prefetch queue management unit 40 is shown in Figure 8. 【0022】 The stride setting unit 60 dynamically adjusts the prefetch distance and stride STRD based on the information held in the entry ENT corresponding to the stream. The stride STRD is the address difference from the request address R-ADRS included in the memory access request REQ to the prefetch destination address. The prefetch distance is an integer (i.e., a multiplier) that expresses the stride STRD set by the stride setting unit 60 as a multiple of the minimum stride, with the minimum stride counted as one unit. 【0023】 For example, if the stride STRD (address difference) is 300 and the minimum stride is 100, the prefetch distance is 3. After determining the prefetch distance, the stride setting unit 60 converts the determined prefetch distance into a stride STRD and stores it in the stride holding unit of the prefetch queue 50. 
The storage of stride STRD into the stride holding unit may be performed by the prefetch queue management unit 40. An example of the configuration of the stride setting unit 60 is shown in Figure 3, and examples of the operation of the stride setting unit 60 are shown in Figures 9 and 10. In the following description, the various prefetch distances indicated by the symbol DIST may be simply referred to as distance. 【0024】 The prefetch request issuing unit 70 issues a prefetch request PFREQ to the memory 200 based on the prefetch activation instruction PFST from the prefetch queue management unit 40. The prefetch queue management unit 40 and the prefetch request issuing unit 70 are examples of prefetch management units. 【0025】 The L1 cache 80 has multiple cache lines CL that hold a portion of the data held in memory 200. When the L1 cache control unit 20 determines that a cache hit L1-HIT has occurred for a memory access request REQ, the L1 cache 80 transfers the data to be read from the hit cache line CL to a general-purpose register or the like (not shown). 【0026】 When the L1 cache control unit 20 determines that a cache miss (L1-MIS) has occurred, the L1 cache 80 stores the data for one cache line, including the data to be read from memory 200, into one of the cache lines CL. In Figure 1, normal data read from memory 200 without prefetching is shown with the code DT, and data prefetched from memory 200 is shown with the code PDT. 【0027】 Figure 2 shows an example of the structure of the prefetch queue 50 in Figure 1. Each entry ENT in the prefetch queue 50 has an area that holds a valid flag VLD, a predicted address P-ADRS, and a counter value R-CNT, and is assignable per stream. Figure 2 shows an example where one entry ENT is assigned for stream A and another entry ENT is assigned for stream B. The area that holds the counter value R-CNT is an example of a match count storage section. 
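The prefetch queue layout and the distance/stride conversion described above can be pictured with a minimal Python sketch. It is illustrative only: the class and function names, the queue depth of 4, and the use of 100 as the minimum stride (borrowed from the 300/100 example) are assumptions, not details given in the source.

```python
from dataclasses import dataclass, field
from typing import List

MIN_STRIDE = 100   # assumed minimum stride (one cache line), as in the 300/100 example
NUM_ENTRIES = 4    # hypothetical queue depth; the source does not state it


@dataclass
class Entry:
    vld: bool = False  # valid flag VLD
    p_adrs: int = 0    # predicted address P-ADRS
    r_cnt: int = 0     # counter value R-CNT (match count)


@dataclass
class PrefetchQueue:
    # One entry per stream; all entries start out empty (VLD = 0).
    entries: List[Entry] = field(
        default_factory=lambda: [Entry() for _ in range(NUM_ENTRIES)]
    )
    # Stride STRD held once and shared by every entry.
    strd: int = MIN_STRIDE


def distance_to_stride(dist: int) -> int:
    """Convert a prefetch distance (multiplier) into a stride (address difference)."""
    return dist * MIN_STRIDE


def stride_to_distance(strd: int) -> int:
    """Convert a stride (address difference) into a prefetch distance (multiplier)."""
    return strd // MIN_STRIDE
```

With these assumptions, `stride_to_distance(300)` gives 3, matching the example in paragraph 【0023】.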
【0028】 The valid flag VLD is set to, for example, "1" when an entry ENT is enabled for use by a stream, and reset to, for example, "0" when the entry ENT is disabled. Below, disabling an entry ENT is also referred to as deleting or deallocating the entry ENT. An entry ENT whose valid flag VLD is reset is treated as an empty entry. 【0029】 The valid flag VLD is reset when no prefetch queue hit (PFQhit), which indicates that memory access requests REQ belonging to the stream using the entry ENT are continuing, occurs for a certain period of time. Additionally, when an entry ENT is needed for a new stream while all entries ENT are valid, the valid flag VLD of an entry ENT with a small counter value R-CNT is reset to create an empty entry. 【0030】 If a cache miss occurs in the L1 cache 80, one of the empty entries is newly registered as the entry ENT for the stream corresponding to the memory access request REQ that missed the cache. The valid flag VLD of the newly registered entry ENT is set to "1". 【0031】 The predicted address P-ADRS area stores, as a predicted address value, the request address R-ADRS expected to be included in the memory access request REQ issued next from the instruction issuing unit 10 in the same stream. The predicted address P-ADRS area is an example of a predicted value storage area. When the prefetch queue management unit 40 determines that the request address R-ADRS of a memory access request REQ belongs to the stream managed by an entry ENT, it stores the request address R-ADRS expected to be issued next as the predicted address P-ADRS. 【0032】 The prefetch queue management unit 40 determines that the prefetch queue 50 has hit if the request address R-ADRS included in the memory access request REQ matches the predicted address P-ADRS. Hereinafter, a hit in the prefetch queue 50 will be referred to as a prefetch queue hit PFQhit, or simply as PFQhit. 
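As a rough illustration of the entry handling in paragraphs 【0028】 to 【0032】, the following Python sketch models PFQhit detection, registration of a new entry on a cache miss, and eviction of the entry with the smallest counter value when the queue is full. The function names and the cache line size of 100 are assumptions for illustration. Timeout-based deletion is omitted, and the counter is incremented on each hit, as the source describes for R-CNT.

```python
from dataclasses import dataclass
from typing import List, Optional

LINE = 100  # assumed cache line size (address units)


@dataclass
class Entry:
    vld: bool = False  # valid flag VLD
    p_adrs: int = 0    # predicted address P-ADRS
    r_cnt: int = 0     # counter value R-CNT


def find_hit(queue: List[Entry], r_adrs: int) -> Optional[Entry]:
    """Return the valid entry whose predicted address matches the request, if any."""
    for ent in queue:
        if ent.vld and ent.p_adrs == r_adrs:
            return ent
    return None


def allocate(queue: List[Entry], r_adrs: int) -> Entry:
    """Register a new stream, evicting the smallest-R-CNT entry if none is empty."""
    empty = [e for e in queue if not e.vld]
    ent = empty[0] if empty else min(queue, key=lambda e: e.r_cnt)
    ent.vld = True
    ent.p_adrs = r_adrs + LINE  # next address expected in this stream
    ent.r_cnt = 0
    return ent


def on_request(queue: List[Entry], r_adrs: int, l1_miss: bool) -> bool:
    """Process one memory access request; return True on a PFQhit."""
    ent = find_hit(queue, r_adrs)
    if ent is not None:
        ent.r_cnt += 1
        ent.p_adrs = r_adrs + LINE
        return True
    if l1_miss:
        allocate(queue, r_adrs)
    return False
```

In this sketch a request that neither hits the queue nor misses the L1 cache leaves the queue unchanged, which follows the statement that new entries are registered on a cache miss.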
【0033】 The counter value R-CNT is incremented by the prefetch queue management unit 40 when a PFQhit is detected. The counter value R-CNT indicates how many times a PFQhit has occurred. A larger value for the counter value R-CNT indicates that the predicted address P-ADRS has matched repeatedly, and that the prediction is more reliable. 【0034】 The common stride STRD for the stream indicates the change in address from the request address R-ADRS of the memory access request REQ to the prefetch destination address in memory 200. For example, each time a PFQhit is determined, the stride STRD is increased by the prefetch queue management unit 40 by the address difference between the beginning and end addresses of one cache line. However, after the counter value R-CNT reaches the sampling threshold STH (described later), the stride STRD is not increased even if a PFQhit is determined, and is maintained at its current value. Also, when a new entry ENT is registered, the stride STRD is set to its initial value (minimum stride), which is the address difference between the beginning and end addresses of one cache line. 【0035】 Figure 3 shows an example of the configuration of the stride setting unit 60 in Figure 1. The stride setting unit 60 includes a setting register 61, a distance generation unit 62, a selection unit 63, and a next stride control unit 64. The distance generation unit 62 includes an entry count sampling unit 621, a correction value generation unit 622, and a prefetch distance generation unit 623. The entry count sampling unit 621 includes an event counter EV-CNT. The next stride control unit 64 includes a stride conversion unit 641, a distance conversion unit 642, a distance comparison unit 643, and a next stride determination unit 644. 【0036】 The setting register 61 has a region for holding the sampling threshold STH, distance mode DMD, a 6-bit adjustment value ADJ, and a fixed distance F-DIST, and its value can be rewritten from outside the processor 100. 
The sampling threshold STH indicates the value of the event counter EV-CNT that triggers the generation of the prefetch distance DIST, and is used by the entry count sampling unit 621. 【0037】 The distance mode DMD is used by the selection unit 63 to select either the distance DIST or the fixed distance F-DIST. The adjustment value ADJ is used to adjust the correction value CV when the correction value generation unit 622 of the distance generation unit 62 generates the correction value CV. The fixed distance F-DIST is used when the prefetch distance generation unit 623 of the distance generation unit 62 generates the distance DIST, and is the maximum value of the distance DIST. 【0038】 The entry count sampling unit 621 receives the number of valid entries VEN0, which indicates the number of valid entries ENT in the prefetch queue 50, and the event signal EV, which indicates the occurrence of an event that changes the number of valid entries ENT in the prefetch queue 50. Hereafter, the number of valid entries ENT will also be referred to simply as the number of valid entries. 【0039】 The event counter EV-CNT of the entry count sampling unit 621 counts up each time an event signal EV is received. When the counter value of the event counter EV-CNT reaches the sampling threshold STH, the entry count sampling unit 621 stores the number of valid entries VEN0 at that time and resets the event counter EV-CNT to 0. 【0040】 For example, events that change the number of valid entries include the registration and deletion of an entry ENT, and the event counter EV-CNT indicates the total number of these events. An entry ENT is deleted by the prefetch queue management unit 40 when no PFQhit has occurred for a certain period of time. Alternatively, an entry ENT is deleted by the prefetch queue management unit 40 when an entry ENT for a new stream is registered while all entries ENT in the prefetch queue 50 are valid. 
When an entry ENT for a new stream is registered while all entries ENT in the prefetch queue 50 are valid, one of the entries ENT with the smallest counter value R-CNT may be deleted. 【0041】 The entry count sampling unit 621 has, for example, two storage units (not shown) that store the number of valid entries. The two storage units alternately store the number of valid entries when the counter value of the event counter EV-CNT reaches the sampling threshold STH. The entry count sampling unit 621 calculates the average of the current and previous numbers of valid entries stored in the two storage units and outputs the calculated average as the number of valid entries VEN to the correction value generation unit 622. 【0042】 Furthermore, the number of samples over which the entry count sampling unit 621 calculates the average is not limited to two, and may be three or more. Also, the entry count sampling unit 621 may output the number of valid entries at that time directly as the number of valid entries VEN to the correction value generation unit 622 each time the counter value of the event counter EV-CNT reaches the sampling threshold STH. In this case, the entry count sampling unit 621 does not need to have a storage unit. 【0043】 The correction value generation unit 622 determines the correction value CV used to generate the distance DIST based on the number of valid entries VEN received from the entry count sampling unit 621 at each reset cycle of the event counter EV-CNT and the adjustment value ADJ held in the setting register 61. An example of the adjustment value ADJ and an example of how to determine the correction value CV are shown in Figure 6. 【0044】 The prefetch distance generation unit 623 calculates the distance DIST as an integer value based on the correction value CV generated by the correction value generation unit 622 and the fixed distance F-DIST held in the setting register 61. An example of how the distance DIST is calculated is shown in Figure 7. 
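The sampling behaviour of the entry count sampling unit 621 (count events, sample the number of valid entries when the counter reaches STH, average the two most recent samples) can be sketched in Python as follows. The class name, the initial sample values of 0, and the use of integer (floor) averaging are assumptions; the source does not specify how the average is rounded or what the storage units hold before the first sample.

```python
from typing import Optional


class EntryCountSampler:
    """Illustrative model of the entry count sampling unit 621."""

    def __init__(self, sth: int) -> None:
        self.sth = sth          # sampling threshold STH
        self.ev_cnt = 0         # event counter EV-CNT
        self.samples = [0, 0]   # two storage units (initial value 0 is an assumption)
        self.next_slot = 0      # which storage unit receives the next sample

    def on_event(self, ven0: int) -> Optional[int]:
        """Handle one event signal EV.

        ven0 is the current number of valid entries (VEN0). Returns the
        averaged VEN when EV-CNT reaches STH, otherwise None.
        """
        self.ev_cnt += 1
        if self.ev_cnt < self.sth:
            return None
        self.samples[self.next_slot] = ven0  # the two units are written alternately
        self.next_slot ^= 1
        self.ev_cnt = 0                      # reset EV-CNT after sampling
        return sum(self.samples) // 2        # floor average (assumed rounding)
```

Averaging over two samples smooths short bursts of entry registration and deletion, at the cost of reacting one sampling period later to a real change in the number of streams.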
【0045】 The selection unit 63 selects either the distance DIST from the prefetch distance generation unit 623 or the fixed distance F-DIST held in the setting register 61, according to the distance mode DMD held in the setting register 61. The selection unit 63 outputs the selected distance DIST or fixed distance F-DIST as the selected distance S-DIST to the next stride control unit 64. 【0046】 The stride conversion unit 641 of the next stride control unit 64 converts the selected distance S-DIST (integer value) received from the selection unit 63 into a selection stride S-STRD (amount of change in address). The distance conversion unit 642 converts the stride STRD held in the prefetch queue 50 into a comparison distance C-DIST (integer value) and outputs it to the distance comparison unit 643. The distance comparison unit 643 compares the distance C-DIST with the selected distance S-DIST and outputs the comparison result RSLT to the next stride determination unit 644. 【0047】 When the comparison result RSLT is C-DIST ≧ S-DIST, that is, when the stride STRD has reached the selection stride S-STRD, the next stride determination unit 644 outputs the selection stride S-STRD as the next stride N-STRD. The next stride N-STRD is stored as the stride STRD in the stride holding unit of the prefetch queue 50. 【0048】 When the comparison result is C-DIST < S-DIST, that is, when the stride STRD has not reached the selection stride S-STRD, the next stride determination unit 644 updates the stride STRD and outputs it as the next stride N-STRD. The update of the stride STRD is performed by adding the minimum stride, which is the address difference from the head address to the tail address of one cache line, to the current stride STRD. 【0049】 Figure 4 shows an example of prefetch operation by the prefetch control unit 30 in Figure 1. That is, Figure 4 shows an example of how the processor 100 controls prefetch operation. 
Since the prefetch queue 50 has multiple entries ENT, it can process multiple stream accesses, each of which is a series of memory accesses by memory access requests REQ with consecutive addresses, in parallel. Figure 4 shows the prefetch operation of one of the multiple streams. Although not shown in the figure, the selected distance S-DIST output from the selection unit 63 in Figure 3 is 3, and the selection stride S-STRD output by the stride conversion unit 641 in Figure 3 is 300. Therefore, the maximum value of the stride STRD is 300. 【0050】 In the example shown in Figure 4, one memory access request REQ reads data equal to the cache line size of the L1 cache 80, and the consecutive memory access requests REQ are determined to be cache misses (L1-MIS). The request addresses R-ADRS, shown numerically in parentheses for the consecutive memory access requests REQ, indicate non-overlapping memory blocks of the cache line size. If a memory access request REQ results in a cache miss, the L1 cache control unit 20 in Figure 1 issues a data request DREQ (not shown in Figure 4) to the memory 200 in response to each memory access request REQ. 【0051】 For simplicity, assume that the cache line size is 100 and that the request address R-ADRS of the first memory access request REQ is 1000. Assume also that the request address R-ADRS of each subsequent memory access request REQ increases by 100. 【0052】 The prefetch control unit 30 in Figure 1 monitors the request addresses R-ADRS included in the memory access requests REQ. The prefetch control unit 30 detects a stream access from the access trend of memory access requests REQ(1000)-REQ(1300). 【0053】 When the memory access request REQ(1400) is issued after the prefetch control unit 30 has detected the stream access, the prefetch control unit 30 sets the stride STRD to 100 and issues a prefetch request PFREQ to address ADRS=1500, which is one cache line size ahead. The prefetch request PFREQ is shown as a U-shaped solid arrow. 
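The stride used in this walkthrough grows toward the selection stride S-STRD according to the rule of the next stride control unit 64 in paragraphs 【0046】 to 【0048】. A minimal Python sketch of that rule, using the example values (minimum stride 100, selected distance 3), is given below. The function name and the constant are assumptions for illustration.

```python
MIN_STRIDE = 100  # assumed minimum stride: one cache line, as in the Figure 4 example


def next_stride(strd: int, s_dist: int) -> int:
    """Return the next stride N-STRD from the current stride and selected distance.

    strd   : current stride STRD held in the prefetch queue
    s_dist : selected distance S-DIST from the selection unit
    """
    s_strd = s_dist * MIN_STRIDE  # stride conversion unit 641: S-DIST -> S-STRD
    c_dist = strd // MIN_STRIDE   # distance conversion unit 642: STRD -> C-DIST
    if c_dist >= s_dist:          # distance comparison unit 643: C-DIST >= S-DIST
        return s_strd             # stride has reached the target: output S-STRD
    return strd + MIN_STRIDE      # otherwise grow by one minimum stride
```

Starting from a stride of 100 with S-DIST = 3, repeated application yields 200, then 300, and then stays at 300. Because the first branch outputs S-STRD itself, a stride that is already larger than a newly reduced S-STRD drops to S-STRD in one step in this sketch.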
【0054】 Furthermore, the prefetch control unit 30 issues a prefetch request PFREQ to address ADRS=1600, one further cache line ahead, as indicated by a U-shaped dashed arrow, to prevent prefetch misses (cache lines skipped by prefetching) while the stride STRD is increased step by step. As a result, data for the two cache lines at addresses ADRS=1500 and 1600 is prefetched from memory 200 (PF5(1), PF5(2)). Note that a stride STRD=100 corresponds to a prefetch distance DIST=1. 【0055】 Next, when a memory access request REQ(1500) is issued, the prefetch control unit 30 increases the stride STRD by 100 to 200 and issues a prefetch request PFREQ to address ADRS=1700, which is two cache line sizes ahead. The prefetch control unit 30 also issues a prefetch request PFREQ to address ADRS=1800, which is one further cache line ahead. As a result, data for the two cache lines at addresses ADRS=1700 and 1800 is prefetched from memory 200 (PF6(1), PF6(2)). The stride STRD=200 corresponds to a prefetch distance DIST=2. 【0056】 Next, when a memory access request REQ(1600) is issued, the prefetch control unit 30 further increases the stride STRD by 100 to its maximum of 300 and issues a prefetch request PFREQ to address ADRS=1900, which is three cache line sizes ahead. As a result, data for the one cache line at address ADRS=1900 is prefetched from memory 200 (PF7). The stride STRD=300 corresponds to the prefetch distance DIST=3. 【0057】 In the example shown in Figure 4, the maximum value of the prefetch distance DIST is set to 3. Thereafter, as long as the stream access continues, the prefetch control unit 30 repeatedly issues a prefetch request PFREQ to the address ADRS that is three cache line sizes ahead, with a stride STRD of 300. 【0058】 By issuing two prefetch requests PFREQ whose addresses differ by 100 until the stride STRD reaches its maximum value (=300), it is possible to prevent prefetch misses during stream access. 
This prevents cache misses caused by prefetch misses and prevents a decrease in the processing performance of processor 100. 【0059】 The example shown in Figure 4 illustrates how to control prefetching with a maximum prefetch distance of 3 (maximum stride STRD = 300). The ideal prefetch distance is that immediately after the data PDT is read from memory 200 to the L1 cache 80 by prefetching, the memory access request REQ is processed and a cache hit occurs. For this reason, it is preferable that, for example, a prefetch request PFREQ issued based on the memory access request REQ (1600) stores the data in the L1 cache 80 immediately before the memory access request REQ (1900). 【0060】 However, if the prefetch distance is too short, a memory access request (REQ) for the data may be issued before the data is stored in the L1 cache 80 through prefetching, potentially resulting in a cache miss. In this case, the benefits of prefetching may not be realized, and the performance of the processor 100 may be reduced. 【0061】 Conversely, if the prefetch distance is too long and data is stored in the L1 cache 80 too quickly, necessary data may be evicted from the L1 cache 80, potentially resulting in a cache miss. In this case, the performance of the processor 100 may degrade. However, in this embodiment, as explained in Figures 9 and 10, the prefetch distance (i.e., stride STRD) is appropriately set according to the number of valid entries used in the stream. This reduces the frequency of cache misses and prevents a degradation in the processing performance of the processor 100. 【0062】 Figure 5 shows an example of the change in the state of entry ENT in the prefetch queue 50 when the operation shown in Figure 4 is performed. In other words, Figure 5 shows an example of how the processor 100 controls the prefetch operation. 
Assume that no other stream access has been performed before the operation shown in Figure 5 starts, and that the stride holding unit does not yet hold a stride STRD. 【0063】 First, if memory access request REQ(1000) results in a cache miss, the prefetch queue management unit 40 searches for an empty entry with valid flag VLD=0. The prefetch queue management unit 40 sets the valid flag VLD of the empty entry to 1, making the entry ENT valid, and registers it as a new entry in the prefetch queue 50. 【0064】 The prefetch queue management unit 40 sets the predicted address P-ADRS of the entry ENT to the address (1100) that is one cache line size ahead of the request address of the memory access request REQ(1000). In addition, when a new entry ENT is registered, the prefetch queue management unit 40 resets the counter value R-CNT to 0 and sets the stride STRD to 100, which is the cache line size. 【0065】 Next, when a memory access request REQ(1100) is issued, the prefetch queue management unit 40 compares the address ADRS=1100 included in the memory access request REQ with the predicted address P-ADRS. Since the address ADRS matches the predicted address P-ADRS, the prefetch queue management unit 40 detects a PFQhit and adds 100 to the predicted address P-ADRS, setting it to 1200. Also, because it detected a PFQhit, the prefetch queue management unit 40 increments the counter value R-CNT by 1. 【0066】 Next, memory access requests REQ(1200) and REQ(1300) are issued in sequence. The prefetch queue management unit 40 operates in the same way as when memory access request REQ(1100) was issued, sequentially setting the predicted address P-ADRS to 1300 and 1400, and sequentially incrementing the counter value R-CNT to 2 and 3. 【0067】 Next, a memory access request REQ(1400) is issued. The prefetch queue management unit 40 sets the predicted address P-ADRS to 1500 and increments the counter value R-CNT to 4. 
Here, since the threshold for the counter value R-CNT is set to 4, the counter value R-CNT reaches the threshold. When the counter value R-CNT reaches the threshold, that is, when the number of matches between the request address R-ADRS and the predicted address P-ADRS reaches the threshold, the prefetch queue management unit 40 starts issuing the activation instruction PFST using the stride STRD. The threshold for the counter value R-CNT that triggers the issuance of the activation instruction PFST is an example of the first threshold. 【0068】 By starting to issue the activation instruction PFST using the stride STRD only once the counter value R-CNT has reached the threshold, it is possible to prevent prefetching from starting for accesses that are not stream accesses. As a result, data not used by the processor 100 is kept out of the L1 cache 80, which suppresses a decrease in the utilization efficiency of the L1 cache 80. 【0069】 The prefetch queue management unit 40 adds the stride STRD=100 to the request address R-ADRS=1400 included in the memory access request REQ and issues an activation instruction PFST for prefetch request PFREQ(1500) to the prefetch request issuing unit 70. The prefetch queue management unit 40 increases the stride STRD by 100 to 200 because the stride STRD has not reached the maximum value of 300 indicated by the selection stride S-STRD. Also, because the stride STRD has not reached its maximum value, the prefetch queue management unit 40 issues an activation instruction PFST for prefetch request PFREQ(1600) to the prefetch request issuing unit 70 in order to prefetch one further cache line ahead. 【0070】 Next, a memory access request REQ(1500) is issued. The prefetch queue management unit 40 sets the predicted address P-ADRS to 1600 and increments the counter value R-CNT to 5. 
Since the counter value R-CNT exceeds the threshold of 4, the prefetch queue management unit 40 adds the stride STRD=200 to the request address R-ADRS=1500 included in the memory access request REQ. 【0071】 Then, the prefetch queue management unit 40 issues an activation instruction PFST for prefetch request PFREQ(1700) to the prefetch request issuing unit 70. Also, because the stride STRD has not reached its maximum value of 300, the prefetch queue management unit 40 issues an activation instruction PFST for prefetch request PFREQ(1800) to the prefetch request issuing unit 70 in order to prefetch one more cache line ahead. 【0072】 The prefetch queue management unit 40 increases the stride STRD by 100 to 300 because it has not reached its maximum value of 300. As a result, the stride STRD reaches its maximum value of 300, and in subsequent operations the stride STRD is held at 300 without increasing further. 【0073】 Next, memory access request REQ(1600) is issued. The prefetch queue management unit 40 sets the predicted address P-ADRS to 1700 and increments the counter value R-CNT to 6. The prefetch queue management unit 40 adds the stride STRD=300 to the request address R-ADRS=1600 included in the memory access request REQ and issues an activation instruction PFST for prefetch request PFREQ(1900) to the prefetch request issuing unit 70. 【0074】 Since the stride STRD is at its maximum value of 300, no prefetch is performed for an additional cache line. Thereafter, each time a memory access request REQ belonging to the stream access is issued, the prefetch queue management unit 40 issues an activation instruction PFST for a prefetch request PFREQ to the prefetch request issuing unit 70. The address specified by this activation instruction PFST is the request address R-ADRS included in the memory access request REQ plus the stride STRD=300. 
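The Figure 5 walk-through can be replayed with a short simulation. This is a minimal sketch rather than the circuit itself: the names `Entry` and `simulate` are hypothetical, the constants are the example values from the text (cache line size 100, first threshold 4, maximum stride 300), and the first request is assumed to miss in the cache so that a new entry is registered.

```python
from dataclasses import dataclass

LINE = 100        # cache line size used in the Figure 5 example
THRESHOLD = 4     # first threshold on the counter value R-CNT
MAX_STRIDE = 300  # maximum stride given by the selected stride S-STRD


@dataclass
class Entry:
    predicted: int      # predicted address P-ADRS
    count: int = 0      # counter value R-CNT
    stride: int = LINE  # stride STRD, starts at one cache line


def simulate(requests):
    """Return {request address: [prefetch addresses]} for a single stream."""
    entry = None
    issued = {}
    for addr in requests:
        if entry is None:
            # Assumed cache miss on the first access: register a new entry
            # whose predicted address is one cache line ahead.
            entry = Entry(predicted=addr + LINE)
            continue
        if addr != entry.predicted:
            continue  # no PFQ hit; other handling is omitted in this sketch
        entry.predicted += LINE
        entry.count += 1
        if entry.count < THRESHOLD:
            continue  # stream not yet confirmed, so no prefetch is started
        targets = [addr + entry.stride]
        if entry.stride < MAX_STRIDE:
            # Ramp-up phase: prefetch one extra line, then widen the stride.
            targets.append(addr + entry.stride + LINE)
            entry.stride += LINE
        issued[addr] = targets
    return issued


if __name__ == "__main__":
    for addr, targets in simulate(range(1000, 1800, 100)).items():
        print(f"REQ({addr}) -> " + ", ".join(f"PFREQ({t})" for t in targets))
```

Run on the request sequence 1000 to 1700, the sketch issues two prefetch requests for REQ(1400) and REQ(1500) while the stride ramps from 100 to 300, and one request per access afterwards, matching the addresses given in the description.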
【0075】 Figure 6 shows an example of how the correction value CV is generated by the correction value generation unit 622 in Figure 3. The correction value CV is generated based on the number of valid entries VEN and the value of each bit of the 6-bit adjustment value ADJ[5:0]. The possible values of the number of valid entries VEN are divided into 6 groups of predetermined size, and each group is associated with one bit position of the adjustment value ADJ[5:0]. Then, for each group, one of two candidate correction values is selected as the correction value CV according to the value of the bit of the adjustment value ADJ corresponding to that group. 【0076】 The correction value CV increases as the number of valid entries VEN increases, and the amount of increase in the correction value CV is set to be smaller than the amount of increase in the number of valid entries VEN. This limits the growth of the correction value CV as the number of valid entries VEN increases, and prevents the correction value CV from becoming too large in regions where the number of valid entries VEN is large. As a result, an appropriate prefetch distance DIST can be generated using an appropriate correction value CV. 【0077】 Furthermore, the adjustment value ADJ allows the correction value CV to be fine-tuned, so that the prefetch distance generation unit 623 can generate an appropriate prefetch distance DIST using the fine-tuned correction value CV. 【0078】 Figure 7 shows an example of how the prefetch distance DIST is generated by the prefetch distance generation unit 623 in Figure 3. The prefetch distance generation unit 623 generates the prefetch distance DIST based on the fixed distance F-DIST, which is a fixed prefetch distance set in the setting register 61, and the correction value CV generated by the correction value generation unit 622. 
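The two stages can be illustrated as follows. The text does not reproduce the actual contents of Figures 6 and 7, so everything numeric here is an assumption: the group boundaries and candidate values in `GROUPS` are placeholders chosen only so that CV grows more slowly than VEN, and the function names are hypothetical. The distance is derived by dividing F-DIST by CV and rounding up, which is one of the options the description gives for the prefetch distance generation unit 623.

```python
# Placeholder table, NOT the values of Figure 6:
# (largest VEN in the group, CV when the ADJ bit is 0, CV when the ADJ bit is 1)
GROUPS = [
    (2, 1, 1),
    (4, 1, 2),
    (6, 2, 3),
    (8, 3, 4),
    (12, 4, 5),
    (16, 5, 6),
]


def correction_value(ven: int, adj: int) -> int:
    """Pick CV from the group containing VEN, using that group's ADJ bit."""
    for bit, (upper, cv_bit0, cv_bit1) in enumerate(GROUPS):
        if ven <= upper:
            return cv_bit1 if (adj >> bit) & 1 else cv_bit0
    # VEN beyond the last boundary: fall back to the last group (assumption).
    _, cv_bit0, cv_bit1 = GROUPS[-1]
    return cv_bit1 if (adj >> (len(GROUPS) - 1)) & 1 else cv_bit0


def prefetch_distance(f_dist: int, cv: int) -> int:
    """DIST = F-DIST / CV, rounded up to the next integer."""
    return -(-f_dist // cv)


if __name__ == "__main__":
    f_dist = 8  # hypothetical fixed distance F-DIST
    for ven in (1, 5, 10, 16):
        cv = correction_value(ven, adj=0b000000)
        print(ven, cv, prefetch_distance(f_dist, cv))
```

With this placeholder table, more valid entries give a larger CV and therefore a shorter DIST, which is the direction of adjustment the embodiment describes.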
【0079】 For example, the offset from the request address R-ADRS included in the memory access request REQ issued by the instruction issuing unit 10 to the address included in the prefetch request PFREQ is at most the cache line size CL multiplied by the prefetch distance DIST. For example, if the cache line size CL is 64 bytes and the prefetch distance DIST is 3, the address included in the prefetch request PFREQ is 64 × 3 bytes beyond the request address R-ADRS included in the memory access request REQ. 【0080】 For example, Figure 7 may be implemented as a conversion table that lists the prefetch distance DIST for each of the multiple correction values CV. In this case, the prefetch distance generation unit 623 may use the conversion table to determine the prefetch distance DIST, which makes the determination simple. Alternatively, the prefetch distance generation unit 623 may determine the prefetch distance DIST by dividing the fixed distance F-DIST by the correction value CV and rounding the quotient up to the next integer. 【0081】 The prefetch distance DIST generated by the prefetch distance generation unit 623 becomes larger as the number of valid entries VEN and the correction value CV become smaller, and becomes smaller as they become larger. The next stride control unit 64 then sets the selected stride S-STRD, which is the maximum value of the stride STRD, based on the prefetch distance DIST generated by the prefetch distance generation unit 623. 【0082】 This allows the prefetch distance DIST to be shortened to avoid evicting necessary data from the L1 cache 80 when the utilization of valid entries ENT is high. 
On the other hand, when the utilization of valid entries ENT is low, the prefetch distance DIST can be lengthened to issue a prefetch request PFREQ at an appropriate distance. 【0083】 Figure 8 shows an example of the operation of the prefetch queue management unit 40 in Figure 1. Specifically, Figure 8 shows an example of how the processor 100 controls the prefetch operation. First, in step S101, the prefetch queue management unit 40 receives the request address R-ADRS when a memory access request REQ is issued. Next, in step S102, the prefetch queue management unit 40 determines whether the access to the L1 cache 80 is a cache miss or a cache hit based on the cache miss signal L1-MIS received from the L1 cache control unit 20. 【0084】 Although not shown in the operation flow of Figure 8, the cache miss determination itself is performed by the L1 cache control unit 20 in Figure 1. When the L1 cache control unit 20 determines that a cache miss has occurred, it issues a data request DREQ (i.e., a memory access request) to the memory 200. 【0085】 In the event of a cache miss, in step S103, the prefetch queue management unit 40 determines whether a PFQhit has occurred. A PFQhit is determined when an entry ENT exists for the same stream as the request address R-ADRS that caused the cache miss, and the request address R-ADRS matches the predicted address P-ADRS. If a PFQhit occurs, the prefetch queue management unit 40 performs step S200; otherwise, it performs step S108. 【0086】 In the event of a cache hit, in step S104, the prefetch queue management unit 40 determines whether a PFQhit has occurred. If a PFQhit has occurred, the prefetch queue management unit 40 performs step S200; otherwise, it terminates the operation shown in Figure 8. 【0087】 In step S200, the prefetch queue management unit 40 instructs the stride setting unit 60 to generate the stride STRD, and then performs step S105. 
The stride STRD in step S200 is generated by the stride setting unit 60. An example of the operation of step S200 is shown in Figures 9 and 10. 【0088】 In step S105, the prefetch queue management unit 40 updates the prefetch queue 50 as described for Figure 5. Next, in step S106, the prefetch queue management unit 40 determines whether the conditions for issuing a prefetch request PFREQ are met based on the information held in the updated prefetch queue 50. If the conditions for issuing a prefetch request PFREQ are met, the prefetch queue management unit 40 performs step S107; otherwise, it terminates the operation shown in Figure 8. 【0089】 In step S107, the prefetch queue management unit 40 outputs an activation instruction PFST to the prefetch request issuing unit 70 in order to issue a prefetch request PFREQ. The maximum value of the address included in the prefetch request PFREQ that the prefetch request issuing unit 70 issues to the memory 200 is obtained by adding the stride STRD to the request address R-ADRS received in step S101. 【0090】 In step S108, the prefetch queue management unit 40 determines whether there is an available entry in the prefetch queue 50. If there is an available entry, the prefetch queue management unit 40 performs step S109. If there is no available entry, the prefetch queue management unit 40 terminates the operation shown in Figure 8. In step S109, the prefetch queue management unit 40 registers a new entry ENT and terminates the operation shown in Figure 8. 【0091】 Figure 9 shows an example of the operation of step S200 in Figure 8. First, in step S210, the stride setting unit 60 generates the prefetch distance DIST using the distance generation unit 62 in Figure 3. An example of the operation of step S210 is shown in Figure 10. 
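The branching of the Figure 8 flow (steps S101 to S109) can be summarized as a small decision function. This is only a sketch of the control flow described above: the function name and its boolean parameters are hypothetical stand-ins for the cache miss signal L1-MIS, the PFQhit determination, the availability of a free entry, and the issue condition checked in step S106.

```python
def prefetch_control_steps(cache_miss: bool, pfq_hit: bool,
                           free_entry: bool, issue_ok: bool) -> list[str]:
    """Return the Figure 8 steps visited for one memory access request."""
    steps = ["S101", "S102"]
    # The PFQhit check is S103 after a cache miss and S104 after a cache hit.
    steps.append("S103" if cache_miss else "S104")
    if pfq_hit:
        # Generate the stride, update the queue, then test the issue condition.
        steps += ["S200", "S105", "S106"]
        if issue_ok:
            steps.append("S107")
    elif cache_miss:
        # A miss without a PFQhit may start tracking a new stream.
        steps.append("S108")
        if free_entry:
            steps.append("S109")
    return steps


if __name__ == "__main__":
    print(prefetch_control_steps(True, False, True, False))
    print(prefetch_control_steps(False, True, False, True))
```

A cache hit without a PFQhit ends the flow immediately after S104, so a new entry is only ever registered on a cache miss.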
【0092】 Next, in step S220, the selection unit 63 in Figure 3 performs step S230 if the distance mode DMD indicates selection of the prefetch distance DIST, and performs step S240 if the distance mode DMD indicates selection of the fixed distance F-DIST. In step S230, the selection unit 63 selects the prefetch distance DIST generated by the distance generation unit 62, outputs it to the stride conversion unit 641 as the selected distance S-DIST, and proceeds to step S250. The selected distance S-DIST is an integer that indicates how many blocks ahead the prefetch request PFREQ will be issued, with one cache line CL counted as one block. 【0093】 In step S240, the selection unit 63 selects the fixed distance F-DIST set in the setting register 61, outputs it as the selected distance S-DIST to the stride conversion unit 641, and proceeds to step S250. By outputting the fixed distance F-DIST as the selected distance S-DIST to the stride conversion unit 641, a constant selected stride S-STRD, which is the maximum value of the stride STRD, can be set regardless of the number of streams. 【0094】 For example, if the processor 100 executes many small programs in parallel while switching between them, and the number of streams is likely to change, the number of valid entries will change more frequently. In this case, the prefetch distance DIST will also be regenerated more frequently, and it may become difficult to set an appropriate stride STRD that follows the change in the number of streams. In such cases, setting the selected stride S-STRD based on the fixed distance F-DIST can increase the likelihood of setting an appropriate stride STRD compared with regenerating the prefetch distance DIST at high frequency. 
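The selection in steps S220 to S240, together with the conversion and next-stride steps S250 to S280 of the same Figure 9 flow, might look like the following. It is a sketch under stated assumptions: the distance mode DMD is modeled as a boolean because its encoding is not given, the conversion in S250 is assumed to be a multiplication by the cache line size (consistent with the 64 × 3 byte example in the description), and all function names are hypothetical.

```python
CACHE_LINE = 64  # bytes; the value used in the 64 x 3 byte example


def select_distance(use_generated: bool, dist: int, f_dist: int) -> int:
    """S220-S240: choose the generated DIST or the fixed F-DIST as S-DIST."""
    return dist if use_generated else f_dist


def to_selected_stride(s_dist: int, line: int = CACHE_LINE) -> int:
    """S250: convert S-DIST (in cache lines) to S-STRD (assumed S-DIST * line)."""
    return s_dist * line


def next_stride(current: int, selected: int, line: int = CACHE_LINE) -> int:
    """S260-S280: grow by one cache line below the maximum, else use S-STRD."""
    if current < selected:
        return current + line   # S270
    return selected             # S280


if __name__ == "__main__":
    s_strd = to_selected_stride(select_distance(True, dist=3, f_dist=8))
    stride = CACHE_LINE
    for _ in range(4):
        print(stride)
        stride = next_stride(stride, s_strd)
```

Because S280 outputs the selected stride itself, a stride that is already above a newly shortened maximum drops to that maximum on the next update.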
【0095】 In step S250, the stride conversion unit 641 generates the selected stride S-STRD, which represents the maximum address difference to the prefetch destination, using the integer value indicated by the selected distance S-DIST received from the selection unit 63. The stride conversion unit 641 outputs the generated selected stride S-STRD to the next stride determination unit 644. 【0096】 Next, in step S260, the next stride determination unit 644 compares the current stride STRD held in the prefetch queue 50 with the selected stride S-STRD generated by the stride conversion unit 641. If the current stride STRD is less than the selected stride S-STRD, the next stride determination unit 644 performs step S270. If the current stride STRD is greater than or equal to the selected stride S-STRD, the next stride determination unit 644 performs step S280. 【0097】 In step S270, the next stride determination unit 644 adds the address amount of one cache line to the current stride STRD and outputs the result to the prefetch queue 50 as the next stride N-STRD, ending the operation shown in Figure 9. In step S280, the next stride determination unit 644 outputs the selected stride S-STRD to the prefetch queue 50 as the next stride N-STRD, ending the operation shown in Figure 9. 【0098】 Figure 10 shows an example of the operation of step S210 in Figure 9. The operation shown in Figure 10 is performed by the distance generation unit 62 in Figure 3. First, in step S211, the entry count sampling unit 621 determines whether the event counter EV-CNT has reached the sampling threshold STH. The sampling threshold STH is an example of a second threshold. If the event counter EV-CNT has reached the sampling threshold STH, the entry count sampling unit 621 performs step S212; if it has not, the entry count sampling unit 621 performs step S218. 【0099】 In step S212, the entry count sampling unit 621 stores the current number of valid entries in the prefetch queue 50. 
Next, in step S213, the entry count sampling unit 621 resets the event counter EV-CNT to "0". Next, in step S214, the entry count sampling unit 621 obtains the average of the previously stored number of valid entries and the current number of valid entries. 【0100】 The number of valid entries may differ from the number of streams, each of which is a series of memory accesses with consecutive addresses. This is because, for example, there is a time lag between the start of a series of memory accesses with consecutive addresses and the registration of a new entry ENT in step S109 of Figure 8. Therefore, by using the average of the current and previous samples of the number of valid entries, the discrepancy with the actual number of streams can be reduced, and the accuracy of generating the prefetch distance DIST can be improved. 【0101】 Furthermore, if the discrepancy between the number of valid entries and the number of streams can be ignored, the entry count sampling unit 621 may use the current number of valid entries as is, without obtaining the average value in step S214. In this case, the storage unit for storing the number of valid entries can be eliminated, and the process of calculating the prefetch distance DIST can be simplified. 【0102】 Thus, by using the number of valid entries, the entry count sampling unit 621 can indirectly determine the number of streams with a simple method and can generate an appropriate prefetch distance DIST according to the number of streams. In contrast, if the number of valid entries is not used, the number of streams must be estimated by analyzing the request addresses R-ADRS included in all memory access requests, which increases the circuit size of the processor 100. 【0103】 In step S215, the entry count sampling unit 621 updates the number of valid entries VEN to be passed to the correction value generation unit 622. 
Next, in step S216, the correction value generation unit 622 generates a correction value CV using the number of valid entries VEN updated by the entry count sampling unit 621 and the adjustment value ADJ[5:0], as shown in Figure 6. Next, in step S217, the prefetch distance generation unit 623 calculates the prefetch distance DIST using the correction value CV generated by the correction value generation unit 622 and the fixed distance F-DIST, as shown in Figure 7, and completes the operation shown in Figure 10. 【0104】 By outputting the number of valid entries VEN to the correction value generation unit 622 using the sampling threshold STH set in the setting register 61, the generation frequency of the prefetch distance DIST can be changed from outside the processor 100. This allows the prefetch distance DIST to be generated more appropriately according to the characteristics of the program executed by the processor 100, and the stride STRD, which is the address interval of the prefetch request PFREQ, to be set more appropriately. 【0105】 On the other hand, in step S218, the entry count sampling unit 621 determines whether or not an event has occurred that changes the number of valid entries. If an event has occurred that changes the number of valid entries, the entry count sampling unit 621 performs step S219. If no event has occurred that changes the number of valid entries, the operation shown in Figure 10 is terminated. In step S219, the entry count sampling unit 621 increments the event counter EV-CNT by 1 and terminates the operation shown in Figure 10. 【0106】 In this embodiment, when the utilization rate of valid entries ENT is high and the frequency of issuing memory access requests REQ for each stream is low, the prefetch distance DIST can be shortened to make it more difficult to evict necessary data from the L1 cache 80. 
When the utilization rate of valid entries ENT is low and the frequency of issuing memory access requests REQ for each stream is high, the prefetch distance DIST can be lengthened to issue prefetch requests PFREQ at an appropriate distance. In other words, by dynamically changing the prefetch distance according to the number of valid entries ENT, the processing performance of the processor 100 can be improved. 【0107】 By using the number of valid entries VEN, the number of streams can be indirectly determined using a simple method, and an appropriate prefetch distance DIST can be generated according to the number of streams. 【0108】 By outputting the number of valid entries VEN to the correction value generation unit 622 using the sampling threshold STH set in the setting register 61, the generation frequency of the prefetch distance DIST can be changed from outside the processor 100. This allows the prefetch distance DIST to be generated more appropriately according to the characteristics of the program executed by the processor 100, and the stride STRD, which is the address interval of the prefetch request PFREQ, to be set more appropriately. 【0109】 By using the average of the current and previous samples of the number of valid entries as the number of valid entries VEN, the discrepancy with the actual number of streams can be reduced, thereby improving the accuracy of generating the prefetch distance DIST. 【0110】 The correction value CV increases as the number of valid entries VEN increases, and the amount of increase in the correction value CV is set to be smaller than the amount of increase in the number of valid entries VEN. This limits the growth of the correction value CV as the number of valid entries VEN increases, and prevents the correction value CV from becoming too large in regions where the number of valid entries VEN is large. 
As a result, an appropriate prefetch distance DIST can be generated using an appropriate correction value CV. 【0111】 The prefetch distance DIST can be easily determined by using a conversion table that lists the prefetch distance DIST for each of the multiple correction values CV. 【0112】 By selecting the fixed distance F-DIST using the selection unit 63 and outputting it as the selected distance S-DIST to the stride conversion unit 641, for example, a constant selected stride S-STRD, which is the maximum value of the stride STRD, can be set regardless of the number of streams. 【0113】 Because issuance of the activation instruction PFST using the stride STRD starts only after the counter value R-CNT reaches the threshold, prefetching can be prevented from starting for accesses that are not stream accesses. As a result, data not used by the processor 100 is kept out of the L1 cache 80, which suppresses a decrease in the utilization efficiency of the L1 cache 80. 【0114】 By issuing two prefetch requests PFREQ whose addresses differ by the cache line size until the stride STRD reaches its maximum value, gaps in the prefetched address range can be prevented during stream access. This prevents cache misses caused by such gaps and suppresses a decrease in the processing performance of the processor 100. 【0115】 The features and advantages of the embodiments will be apparent from the detailed description above. The claims are intended to cover such features and advantages of the embodiments to the extent that they do not depart from the spirit and scope of the claims. Furthermore, a person with ordinary skill in the art should be able to readily conceive of various improvements and modifications. Therefore, there is no intention to limit the scope of the inventive embodiments to those described above, and appropriate improvements and equivalents that fall within the scope disclosed in the embodiments may also be relied upon. 
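As a recap of the sampling flow of Figure 10 (steps S211 to S219) that underlies the effects listed above, the entry count sampling can be sketched as below. The class and method names are hypothetical, the rounding of the average is not specified in the text and is assumed here to be integer floor division, and the first sample is assumed to be used as is because no previous sample exists yet.

```python
class EntryCountSampler:
    """Sketch of the entry count sampling unit 621 (Figure 10)."""

    def __init__(self, sampling_threshold: int):
        self.sth = sampling_threshold  # sampling threshold STH (second threshold)
        self.ev_cnt = 0                # event counter EV-CNT
        self.previous = None           # previously stored number of valid entries
        self.ven = 0                   # VEN passed to the correction value unit

    def step(self, valid_entries: int, count_changed: bool) -> bool:
        """Run one pass of Figure 10; return True when VEN was updated."""
        if self.ev_cnt >= self.sth:                          # S211
            current = valid_entries                          # S212
            self.ev_cnt = 0                                  # S213
            if self.previous is None:
                average = current                            # assumption: first sample
            else:
                average = (self.previous + current) // 2     # S214 (floor assumed)
            self.previous = current
            self.ven = average                               # S215
            return True
        if count_changed:                                    # S218
            self.ev_cnt += 1                                 # S219
        return False


if __name__ == "__main__":
    sampler = EntryCountSampler(sampling_threshold=2)
    for entries, changed in [(3, True), (4, True), (4, False),
                             (6, True), (8, True), (8, False)]:
        if sampler.step(entries, changed):
            print("VEN updated to", sampler.ven)
```

A larger STH makes VEN, and therefore DIST, change less often, which is the knob the setting register 61 exposes.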
[Explanation of Symbols] 【0116】
10 Instruction issuing unit
20 L1 cache control unit
30 Prefetch control unit
40 Prefetch queue management unit
50 Prefetch queue
60 Stride setting unit
61 Setting register
62 Distance generation unit
63 Selection unit
64 Next stride control unit
70 Prefetch request issuing unit
100 Processor
200 Memory
300 Information processing apparatus
621 Entry count sampling unit
622 Correction value generation unit
623 Prefetch distance generation unit
641 Stride conversion unit
642 Distance conversion unit
643 Distance comparison unit
644 Next stride determination unit
ADJ Adjustment value
CL Cache line
CV Correction value
DIST Prefetch distance
DMD Distance mode
DREQ Data request
DT Data
ENT Entry
EV Event signal
L1-HIT Cache hit
L1-MIS Cache miss
PDT Data
PFQhit Prefetch queue hit
PFREQ Prefetch request
PFST Activation instruction
R-CNT Counter value
REQ Memory access request
RSLT Comparison result
STH Sampling threshold
STRD Stride
VEN Number of valid entries
VLD Valid flag
Claims
[Claim 1] A processor comprising: a cache that holds data read from a memory in response to memory access requests; a prefetch queue having a plurality of entries, each of which is assigned to a stream, a stream being a plurality of memory access requests with consecutive addresses, and each of which is used to control, for the corresponding stream, prefetching of data from the memory to the cache; a stride setting unit that adjusts a stride, which is the amount of change between the access address included in a memory access request and a prefetch destination address, according to the number of valid entries to which the streams are assigned, and that reduces the stride as the number of valid entries increases; and a prefetch management unit that, for each stream, after the number of memory access requests with consecutive addresses reaches a preset first threshold, issues a prefetch request to the memory using the stride adjusted by the stride setting unit for each memory access request with consecutive addresses.
[Claim 2] The processor according to claim 1, wherein the stride setting unit determines the stride based on the number of valid entries when the sum of the number of newly assigned entries and the number of entries whose assignment has been released reaches a preset second threshold.
[Claim 3] The processor according to claim 2, wherein the stride setting unit resets the sum to zero each time the sum reaches the preset second threshold, and calculates the stride based on an average of the numbers of valid entries obtained at a plurality of times at which the sum reached the second threshold.
[Claim 4] The processor according to claim 2, wherein the stride setting unit includes: a correction value generation unit that generates a correction value corresponding to the number of valid entries; a prefetch distance generation unit that generates, based on the correction value, a prefetch distance indicating how many units the stride to be set spans when the minimum stride is taken as one unit; and a stride conversion unit that converts the prefetch distance into the stride used in the prefetch request, and wherein the correction value generation unit increases the correction value as the number of valid entries increases, and sets the amount of increase in the correction value to be smaller than the amount of increase in the number of valid entries.
[Claim 5] The processor according to claim 4, wherein the prefetch distance generation unit has a conversion table in which the prefetch distance is listed for each of a plurality of correction values, and determines the prefetch distance corresponding to the correction value by referring to the conversion table.
[Claim 6] The processor according to claim 5, wherein the stride setting unit has a selection unit that selects either the prefetch distance generated by the prefetch distance generation unit or a fixed prefetch distance and outputs the selected prefetch distance to the stride conversion unit, and the stride conversion unit converts the prefetch distance output from the selection unit into the stride.
[Claim 7] The processor according to any one of claims 1 to 6, wherein the prefetch queue includes: a predicted value holding unit that holds a predicted value of the access address included in the next memory access request; and a match count holding unit that holds the number of times the access address included in a memory access request has matched the predicted value, and wherein the prefetch management unit issues the prefetch request each time the access address matches the predicted value after the number of matches reaches the first threshold.
[Claim 8] The processor according to claim 1, wherein the stride set by the stride setting unit according to the number of valid entries is the maximum value of the stride used for the prefetch request, the stride setting unit, after the number of memory access requests with consecutive addresses reaches the first threshold, sequentially increases the stride for each memory access request with consecutive addresses until the stride reaches the maximum value, and a plurality of prefetch requests are issued for each memory access request until the stride reaches the maximum value.
[Claim 9] An information processing apparatus comprising a processor and a memory that stores data used by the processor, wherein the processor includes: a cache that holds data read from the memory in response to memory access requests; a prefetch queue having a plurality of entries, each of which is assigned to a stream, a stream being a plurality of memory access requests with consecutive addresses, and each of which is used to control, for the corresponding stream, prefetching of data from the memory to the cache; a stride setting unit that adjusts a stride, which is the amount of change between the access address included in a memory access request and a prefetch destination address, according to the number of valid entries to which the streams are assigned, and that reduces the stride as the number of valid entries increases; and a prefetch management unit that, for each stream, after the number of memory access requests with consecutive addresses reaches a preset first threshold, issues a prefetch request to the memory using the stride adjusted by the stride setting unit for each memory access request with consecutive addresses.
[Claim 10] A method for controlling a processor that includes a cache that holds data read from a memory in response to memory access requests, and a prefetch queue having a plurality of entries, each of which is assigned to a stream, a stream being a plurality of memory access requests with consecutive addresses, and each of which is used to control, for the corresponding stream, prefetching of data from the memory to the cache, the method comprising: adjusting, by a stride setting unit of the processor, a stride, which is the amount of change between the access address included in a memory access request and a prefetch destination address, according to the number of valid entries to which the streams are assigned, such that the stride becomes smaller as the number of valid entries increases; and issuing, by a prefetch management unit of the processor, for each stream, after the number of memory access requests with consecutive addresses reaches a preset first threshold, a prefetch request to the memory using the stride adjusted by the stride setting unit for each memory access request with consecutive addresses.