Data processing method and device, electronic equipment and storage medium

CN122195876A · Pending · Publication Date: 2026-06-12 · SHANDONG BOSUAN ZHIXIN INFORMATION TECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
SHANDONG BOSUAN ZHIXIN INFORMATION TECHNOLOGY CO LTD
Filing Date
2026-02-28
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Existing data prefetching technologies require a large amount of storage space in hardware prefetching to record spatiotemporal sequence information. In actual chip designs, sufficient storage space cannot be provided, which hinders the further development of prefetching technology.

Method used

Time pattern tables and spatiotemporal pattern tables are constructed from the data processing requests of multiple processor cores; the prefetching mode is determined using program counter identifiers and process identifiers; and the index bits are adjusted dynamically, in combination with spatial base addresses and confidence levels, to reduce storage space requirements.

Benefits of technology

It effectively reduces the storage space requirements for spatiotemporal sequence information, improves the accuracy and efficiency of data prefetching, adapts to complex memory access patterns, and enhances processor performance.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122195876A_ABST

Abstract

The present disclosure provides a data processing method and device, electronic equipment and storage medium, and relates to the technical field of data storage. The method comprises the following steps: determining a time pattern table and a spatiotemporal pattern table based on a plurality of data processing requests sent by a plurality of processor cores; determining a prefetch mode corresponding to each data processing request based on a program counter identifier and at least one lookup entry; and determining at least one prefetch address corresponding to each data processing request from the time pattern table or the spatiotemporal pattern table based on at least one of the prefetch mode corresponding to each data processing request, the program counter identifier, a training address and a prefetch degree. In this way, spatiotemporal sequence information that may recur can be stored in the form of a spatial base address identifier, a spatial bit vector, a confidence level and a reuse count, greatly reducing the storage space required for spatiotemporal sequence information and supporting the further development of prefetching technology.

Description

Technical Field

[0001] This disclosure relates to the field of data storage technology, and in particular to a data processing method, apparatus, electronic device and storage medium.

Background Technology

[0002] Data prefetching is a technique based on the principle of locality of reference. Its core idea is to exploit the locality of data access, namely, the tendency of programs to repeatedly access the same data (temporal locality) and to access adjacent data (spatial locality). Data prefetching predicts the data a program is likely to access in the future and loads it into the cache in advance. It can be implemented in three ways: hardware prefetching, software prefetching, and combined hardware and software prefetching. Hardware prefetching is performed automatically by hardware: the processor predicts the data that will likely be needed based on the current data access pattern and automatically loads it into the cache. For example, when consecutive accesses to a memory region are detected, adjacent data blocks are prefetched automatically. This method requires no software intervention, adapts automatically to the program's memory access pattern, and improves the efficiency and accuracy of data prefetching.

[0003] However, the prefetching process requires a large amount of storage space to record all potentially repeating spatiotemporal sequence information, and actual chip designs cannot provide that much storage, which hinders the further development of prefetching technology.

Summary of the Invention

[0004] This disclosure provides a data processing method, apparatus, electronic device, and storage medium to at least solve the above-mentioned technical problems existing in the prior art.

[0005] According to a first aspect of this disclosure, a data processing method is provided, comprising:

[0006] determining a time pattern table and a spatiotemporal pattern table based on multiple data processing requests sent by multiple processor cores; merging, based on the type of processor core, the threads corresponding to at least one data processing request sent by the multiple processor cores to obtain at least one process; determining at least one lookup entry in a spatiotemporal index table based on a process identifier, and determining the prefetch mode corresponding to each data processing request based on a program counter identifier and the at least one lookup entry; and determining, based on at least one of the prefetch mode, the program counter identifier, a training address, and a prefetch degree corresponding to each data processing request, at least one prefetch address corresponding to each data processing request from the time pattern table or the spatiotemporal pattern table. The data processing request includes a data missing request or a data prefetch request, and the prefetch degree includes the number of prefetch addresses corresponding to each training address; the time pattern table includes a program counter identifier, a training address identifier, a prediction address identifier, a reuse count, and a confidence level; the spatiotemporal pattern table includes a program counter identifier, a training address identifier, a spatial base address identifier, a spatial bit vector, a confidence level, and a reuse count; the spatial bit vector identifies the repeatedly accessed spatial sub-addresses within the space corresponding to the spatial base address.

[0007] According to a second aspect of this disclosure, a data processing apparatus is provided, the apparatus comprising: a storage unit configured to determine a time pattern table and a spatiotemporal pattern table based on multiple data processing requests sent by multiple processor cores; a monitoring and adjustment unit configured to merge, based on the type of processor core, the threads corresponding to at least one data processing request sent by the multiple processor cores to obtain at least one process; a spatiotemporal stream index unit configured to determine at least one lookup entry in a spatiotemporal index table based on a process identifier, and to determine the prefetch mode corresponding to each data processing request based on a program counter identifier and the at least one lookup entry; and a determining unit configured to determine, based on at least one of the prefetch mode, the program counter identifier, a training address, and a prefetch degree corresponding to each data processing request, at least one prefetch address corresponding to each data processing request from the time pattern table or the spatiotemporal pattern table. The data processing request includes a data missing request or a data prefetch request, and the prefetch degree includes the number of prefetch addresses corresponding to each training address; the time pattern table includes a program counter identifier, a training address identifier, a prediction address identifier, a reuse count, and a confidence level; the spatiotemporal pattern table includes a program counter identifier, a training address identifier, a spatial base address identifier, a spatial bit vector, a confidence level, and a reuse count; the spatial bit vector identifies the repeatedly accessed spatial sub-addresses within the space corresponding to the spatial base address.

[0008] According to a third aspect of this disclosure, an electronic device is provided, comprising: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of this disclosure.

[0009] According to a fourth aspect of this disclosure, a non-transitory computer-readable storage medium storing computer instructions is provided, the computer instructions being used to cause a computer to perform the method of this disclosure.

[0010] The data processing method disclosed herein determines a time pattern table and a spatiotemporal pattern table based on multiple data processing requests sent by multiple processor cores; merges the threads corresponding to at least one data processing request sent by multiple processor cores based on the type of processor core to obtain at least one process; determines at least one lookup entry in the spatiotemporal index table based on the process identifier, and determines the prefetch pattern corresponding to each data processing request based on the program counter identifier and the at least one lookup entry; determines at least one prefetch address corresponding to each data processing request from the time pattern table or the spatiotemporal pattern table based on at least one of the prefetch pattern corresponding to each data processing request, the program counter identifier, the training address, and the prefetch degree; wherein, the data processing request includes a data missing request or a data prefetch request, and the prefetch degree includes the number of prefetch addresses corresponding to each training address; the time pattern table includes a program counter identifier, a training address identifier, a prediction address identifier, a reuse count, and a confidence level; the spatiotemporal pattern table includes a program counter identifier, a training address identifier, a spatial base address identifier, a spatial bit vector, a confidence level, and a reuse count; the spatial bit vector is used to represent the identifier of the repeatedly accessed spatial sub-address in the space corresponding to the spatial base address. 
In this way, potentially duplicated spatiotemporal sequence information can be stored in the form of spatial base address identifiers, spatial bit vectors, confidence levels, and reuse counts, greatly reducing the storage space of spatiotemporal sequence information and helping to further develop prefetching technology.

[0011] It should be understood that the description in this section is not intended to identify key or essential features of the embodiments of this disclosure, nor is it intended to limit the scope of this disclosure. Other features of this disclosure will become readily apparent from the following description.

Attached Figure Description

[0012] The above and other objects, features, and advantages of this disclosure will become readily apparent from the following detailed description of exemplary embodiments, taken in conjunction with the accompanying drawings. Several embodiments of this disclosure are illustrated in the drawings by way of example and not limitation. In the accompanying drawings, the same or corresponding reference numerals indicate the same or corresponding parts.

[0013] Figure 1 is a schematic diagram of data prefetching classification in related technologies;
Figure 2 is a schematic diagram of a first optional flow of the data processing method provided by an embodiment of this disclosure;
Figure 3 is a schematic diagram of a first optional structure of the data processing apparatus provided by an embodiment of this disclosure;
Figure 4 is a schematic diagram of a second optional flow of the data processing method provided by an embodiment of this disclosure;
Figure 5 is a schematic diagram of the thread partition table and the backfill request tracking table provided by an embodiment of this disclosure;
Figure 6 is a schematic diagram of a third optional flow of the data processing method provided by an embodiment of this disclosure;
Figure 7 is a schematic diagram of the current table and the completed table of the spatiotemporal flow index provided by an embodiment of this disclosure;
Figure 8 is a schematic diagram of a fourth optional flow of the data processing method provided by an embodiment of this disclosure;
Figure 9 is a schematic diagram of a historical sample table provided by an embodiment of this disclosure;
Figure 10 is a schematic diagram of the previous address, nearest address, spatial bit vector, and stream length provided by an embodiment of this disclosure;
Figure 11 is a schematic diagram of the time pattern table and the spatiotemporal pattern table provided by an embodiment of this disclosure;
Figure 12 is a schematic diagram of the principle of the size adjustment unit provided by an embodiment of this disclosure;
Figure 13 is a schematic diagram of a second optional structure of the data processing apparatus provided by an embodiment of this disclosure;
Figure 14 is a schematic diagram of the composition structure of an electronic device according to an embodiment of this disclosure.

Detailed Implementation

[0014] To make the objectives, features, and advantages of this disclosure more apparent and understandable, the technical solutions in the embodiments of this disclosure will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of this disclosure, and not all of them. All other embodiments obtained by those skilled in the art based on the embodiments of this disclosure without creative effort are within the scope of protection of this disclosure.

[0015] In the following description, references are made to “some embodiments,” which describe a subset of all possible embodiments. However, it is understood that “some embodiments” may be the same subset or different subsets of all possible embodiments and may be combined with each other without conflict.

[0016] In the following description, the terms "first" and "second" are used merely to distinguish similar objects and do not represent a specific ordering of objects. It is understood that "first" and "second" may be interchanged in a specific order or sequence where permitted, so that the embodiments of this disclosure described herein can be implemented in an order other than that illustrated or described herein.

[0017] Unless otherwise defined, all technical and scientific terms used in this disclosure have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. The terminology used in this disclosure is for the purpose of describing embodiments of this disclosure only and is not intended to be limiting of this disclosure.

[0018] It should be understood that in the various embodiments of this disclosure, the sequence number of each implementation process does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this disclosure.

[0019] Before providing a further detailed description of the embodiments of this disclosure, the nouns and terms involved in the embodiments of this disclosure will be explained, and the nouns and terms involved in the embodiments of this disclosure shall be interpreted as follows.

[0020] 1) Cache refers to a small static random-access memory (SRAM) added between the central processing unit (CPU) and main memory to improve memory access speed and store frequently used data.

[0021] 2) The Miss Status Holding Register (MSHR) is used to store cache miss request information.

[0022] 3) Last Level Cache (LLC): in a multi-level cache system, the cache level furthest from the core; it generally has the largest capacity but the slowest speed.

[0023] 4) Cache pollution. If prefetched data is retrieved too early or is not needed for future memory access, it will occupy a storage location in the cache for a period of time and may replace the originally valid data, causing cache pollution.

[0024] 5) Accuracy. Prefetch accuracy is the proportion of prefetched cache lines that are actually accessed. Prefetch accuracy = (number of cache misses without a prefetcher - number of cache misses with a prefetcher) / total number of prefetches issued by the prefetcher. High accuracy indicates that the prefetcher reduces pressure on memory bandwidth and reduces cache pollution.

[0025] 6) Coverage. The proportion of cache misses that occur without a prefetcher that are covered after a prefetcher is added. Prefetch coverage = (Number of cache misses without a prefetcher - Number of cache misses with a prefetcher) / Number of cache misses without a prefetcher.

[0026] 7) Timeliness. Whether the data provided by the prefetcher arrives before it is needed, but not so early that it is evicted before use. This metric determines whether cache miss latency can be completely hidden. Prefetching too early may cause cache pollution or cause the prefetched data to be replaced prematurely, while prefetching too late does not improve performance.

[0027] 8) Storage overhead. Hardware prefetchers require additional hardware storage units in the processor to record historical memory access information. The learned memory access patterns must also be recorded in additional units so that they can trigger the prefetcher, which incurs further storage overhead.

[0028] 10) Prefetch degree. The number of prefetch requests issued by the prefetch unit, based on the observed characteristics of the memory access sequence.
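As written in terms 5) and 6), accuracy and coverage share the same numerator, the number of cache misses the prefetcher eliminates, and differ only in the denominator. A minimal Python sketch with hypothetical counts (the function names and numbers are illustrative, not taken from the disclosure):

```python
def prefetch_accuracy(misses_without: int, misses_with: int, prefetches_issued: int) -> float:
    """Term 5: misses eliminated per prefetch issued."""
    if prefetches_issued == 0:
        return 0.0
    return (misses_without - misses_with) / prefetches_issued


def prefetch_coverage(misses_without: int, misses_with: int) -> float:
    """Term 6: fraction of the baseline misses that the prefetcher eliminates."""
    if misses_without == 0:
        return 0.0
    return (misses_without - misses_with) / misses_without


# Hypothetical counts: 1000 misses without a prefetcher, 400 with it, 800 prefetches issued.
print(prefetch_accuracy(1000, 400, 800))  # 0.75
print(prefetch_coverage(1000, 400))       # 0.6
```

A prefetcher can therefore have high coverage but low accuracy if it issues many more prefetches than the misses it removes.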

[0029] Processor computing units have developed rapidly while memory speeds have improved slowly, creating a significant gap: processors spend considerable time waiting for data to return from memory, a phenomenon known as the "memory wall" problem. To address this problem, researchers first introduced a cache, a temporary memory between the processor and main memory, to store recently or frequently accessed data and instructions and reduce the time the processor spends accessing main memory. By further increasing cache capacity, adopting multi-level caching, and improving replacement algorithms, they ensured that more useful data was retained in the cache, significantly alleviating the "memory wall" problem.

[0030] However, even as cache capacity increases further, some memory access misses cannot be eliminated by the methods described above. Multi-level caching relies primarily on temporal and spatial locality to improve hit rates, but for complex access patterns such as non-contiguous memory accesses and pointer chasing, it may fail to effectively predict and cover these accesses, leading to cache misses. Data prefetching techniques address this case, further improving cache hit rates and alleviating the performance bottleneck caused by the "memory wall" problem.

[0031] Data prefetching is a technique based on the principle of locality of reference. Its core idea is to exploit the locality of data access, namely, the tendency of programs to repeatedly access the same data (temporal locality) and to access adjacent data (spatial locality). Data prefetching predicts the data a program is likely to access in the future and loads it into the cache in advance. It can be implemented in three ways: hardware prefetching, software prefetching, and combined hardware and software prefetching. Hardware prefetching is performed automatically by hardware: the processor predicts the data that will likely be needed based on the current data access pattern and automatically loads it into the cache. For example, when consecutive accesses to a memory region are detected, adjacent data blocks are prefetched automatically. This method requires no software intervention, adapts automatically to the program's memory access pattern, and improves the efficiency and accuracy of data prefetching.

[0032] Figure 1 shows a schematic diagram of data prefetching classification in related technologies.

[0033] Hardware prefetching techniques can be broadly divided into two categories, prefetching for regular (rule-based) access patterns and prefetching for irregular (non-rule-based) access patterns, as shown in Figure 1. Prefetching for regular access patterns includes stream prefetching and stride prefetching. Stream prefetching applies when the memory access sequence proceeds through consecutive addresses: on a miss, only the address following the missed address needs to be fetched into the cache. It is commonly used for instruction stream prefetching. Stride prefetching applies when successive memory access addresses are separated by a fixed interval (stride). By tracking the intervals between previous addresses, the prefetcher predicts the future stride; on a miss, the stride is added to the missed address to obtain the prefetch address, whose data is then loaded into the cache. Both types of prefetchers are widely used in high-performance processors. However, they cannot capture complex memory access patterns, and they cannot effectively hide cache misses at the L2 and L3 cache levels. Prefetching algorithms designed for irregular access sequences have therefore also been proposed.
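As a rough illustration of stride prefetching, the sketch below keeps one entry per program counter and, on a miss, adds the learned stride to the missed address. The class name, the rule of confirming a stride only after it repeats, and the `degree` parameter (compare term 10, prefetch degree) are assumptions for illustration, not the disclosed design. Stream prefetching corresponds to a stride of one cache block.

```python
from typing import Dict, List


class StridePrefetcher:
    """Per-PC stride prefetcher (illustrative sketch)."""

    def __init__(self, degree: int = 2) -> None:
        self.degree = degree                   # prefetch addresses issued per miss
        self.last_addr: Dict[int, int] = {}    # PC -> last missed address
        self.last_stride: Dict[int, int] = {}  # PC -> last observed stride

    def on_miss(self, pc: int, addr: int) -> List[int]:
        prefetches: List[int] = []
        if pc in self.last_addr:
            stride = addr - self.last_addr[pc]
            # Assumed rule: prefetch only after the same non-zero stride is seen twice in a row.
            if stride != 0 and self.last_stride.get(pc) == stride:
                prefetches = [addr + stride * k for k in range(1, self.degree + 1)]
            self.last_stride[pc] = stride
        self.last_addr[pc] = addr
        return prefetches


p = StridePrefetcher(degree=2)
p.on_miss(0x400, 100)
p.on_miss(0x400, 164)         # stride 64 observed once
print(p.on_miss(0x400, 228))  # [292, 356]
```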

[0034] Patterns in irregular memory access sequences are identified from the temporal and spatial correlations of a program. Figure 1(c) shows temporal prefetching among the irregular access patterns, i.e., a schematic diagram of temporally correlated prefetching. Temporal locality arises because applications tend to access data repeatedly, so access sequences are likely to recur: miss sequences may repeat, and recent sequences are more likely to repeat than old ones. Temporal prefetchers predict recurring miss sequences based on this temporal correlation. They typically process the memory access stream and record sequences of memory accesses; the recorded data is called metadata. By learning the temporal relationships between accesses to specific addresses, they characterize the processor's memory access patterns. When a prefetch trigger event occurs, the corresponding metadata is looked up: if the pattern repeats, prefetching is performed, which makes the prefetched access stream more reliable; if no match is found, the metadata is recorded. In the figure, the memory access sequence AXYG reappears after some time. Under certain conditions AXYG can be recorded, and prefetches are issued when the sequence begins to reappear (for example, if A and X are seen again, Y and G are fetched in advance).
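The AXYG example can be sketched as a simple temporal-correlation prefetcher: it keeps a global history of accesses and, when an address recurs, replays the addresses that followed its previous occurrence. This is a minimal sketch under assumed simplifications (unbounded history, string addresses, a fixed `degree`); it is not the disclosed table design.

```python
from typing import Dict, List


class TemporalPrefetcher:
    """Replays the accesses that followed the previous occurrence of an address."""

    def __init__(self, degree: int = 2) -> None:
        self.degree = degree                # addresses replayed per trigger
        self.history: List[str] = []        # global access history (the metadata)
        self.last_pos: Dict[str, int] = {}  # address -> index of its latest occurrence

    def access(self, addr: str) -> List[str]:
        pos = self.last_pos.get(addr)
        prefetches = [] if pos is None else self.history[pos + 1 : pos + 1 + self.degree]
        self.last_pos[addr] = len(self.history)
        self.history.append(addr)
        return prefetches


p = TemporalPrefetcher(degree=2)
for a in "AXYGBC":
    p.access(a)
print(p.access("A"))  # ['X', 'Y']
print(p.access("X"))  # ['Y', 'G']
```

When X follows A again, Y and G are predicted, matching the example in the text. A hardware design would bound the history and index it through tables rather than a Python dictionary.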

[0035] Figure 1(d) shows spatial prefetching among the irregular access patterns, i.e., a schematic diagram of a spatially correlated prefetcher. Spatial correlation refers to the phenomenon that memory accesses recur in the same spatial patterns. A spatially correlated prefetcher predicts future memory accesses from spatial address correlations. The basic idea is to divide the memory space into fixed-size regions and learn the access pattern within each region. Once a spatial pattern has been learned, it can be used to predict and prefetch future memory accesses when the application cyclically revisits memory regions. That is, if a program accesses offsets [x, y, z] of page A, it is likely to access offsets [x, y, z] of the same or a similar page in the future. For example, in the figure the program accesses offsets [0, 2, 6, 3] in page A and, after some interval, accesses offsets [0, 2, 6, 3] in page B; offsets [6, 3] can then be prefetched into the cache as soon as offsets [0, 2] are accessed in page B, avoiding cache misses.
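A minimal sketch of a spatial-correlation prefetcher for the page A / page B example, assuming each page's footprint is summarized as a bit vector (bit i set when offset i is accessed) and that patterns are keyed by the first two offsets touched in a page. The class, the keying rule, and the `end_page` training hook are hypothetical choices for illustration.

```python
from typing import Dict, List, Tuple


class SpatialPrefetcher:
    """Learns per-page footprints as bit vectors (illustrative sketch)."""

    def __init__(self) -> None:
        self.footprints: Dict[str, List[int]] = {}      # page -> offsets in access order
        self.patterns: Dict[Tuple[int, int], int] = {}  # first two offsets -> bit vector

    def access(self, page: str, offset: int) -> List[int]:
        """Record an access; return offsets to prefetch for this page, if any."""
        seen = self.footprints.setdefault(page, [])
        if offset in seen:
            return []
        seen.append(offset)
        if len(seen) != 2:
            return []
        vector = self.patterns.get((seen[0], seen[1]))
        if vector is None:
            return []
        # Prefetch every recorded offset that this page has not touched yet.
        return [i for i in range(vector.bit_length()) if (vector >> i) & 1 and i not in seen]

    def end_page(self, page: str) -> None:
        """Commit a page's footprint (e.g. on eviction) as a spatial bit vector."""
        seen = self.footprints.pop(page, [])
        if len(seen) >= 2:
            vector = 0
            for off in seen:
                vector |= 1 << off
            self.patterns[(seen[0], seen[1])] = vector


sp = SpatialPrefetcher()
for off in [0, 2, 6, 3]:
    sp.access("A", off)
sp.end_page("A")
print(sp.access("B", 0))  # []
print(sp.access("B", 2))  # [3, 6]
```

The prediction comes back in offset order ([3, 6]) because it is read out of the bit vector, which records which offsets were touched but not their order.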

[0036] The effectiveness of temporal and spatial prefetching varies by application. Pointer-chasing applications generate long chains of dependent cache misses that spatial prefetching cannot capture but temporal prefetching can. Scan-dominated applications generate large numbers of compulsory cache misses that spatial prefetchers can capture but temporal prefetchers cannot predict. Combined spatiotemporal prefetching schemes attempt to merge the advantages of both approaches, making them suitable for a wider range of applications. Figure 1(e) shows spatiotemporal prefetching among the irregular access patterns: the offsets [2, 6, 7] accessed in page A are recorded (space), the interval between the accesses is recorded, and the fact that page B is accessed after page A is recorded (time). Recording time and space together yields the complete memory access stream, and prefetches are issued according to the recorded pattern when the prefetch condition is triggered.
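Combining the two ideas, a spatiotemporal prefetcher can record the order in which regions are visited (time) together with a bit vector of the blocks touched in each region (space). The sketch below is illustrative only: the region size, the `record`/`trigger` interface, and numbering blocks as `region * REGION_BLOCKS + offset` are assumptions.

```python
from typing import Dict, List, Tuple

REGION_BLOCKS = 64  # hypothetical: cache blocks per spatial region


def offsets(vector: int) -> List[int]:
    """Block offsets whose bits are set in a spatial bit vector."""
    return [i for i in range(REGION_BLOCKS) if (vector >> i) & 1]


class SpatioTemporalPrefetcher:
    """Records region order (time) plus each region's footprint (space)."""

    def __init__(self, degree: int = 1) -> None:
        self.degree = degree                      # following regions replayed per trigger
        self.history: List[Tuple[int, int]] = []  # (region number, spatial bit vector)
        self.last_pos: Dict[int, int] = {}        # region number -> latest history index

    def record(self, region: int, vector: int) -> None:
        """Append a completed region footprint to the temporal history."""
        self.last_pos[region] = len(self.history)
        self.history.append((region, vector))

    def trigger(self, region: int) -> List[int]:
        """Return block numbers of the regions that followed `region` last time."""
        pos = self.last_pos.get(region)
        if pos is None:
            return []
        blocks: List[int] = []
        for nxt, vector in self.history[pos + 1 : pos + 1 + self.degree]:
            blocks.extend(nxt * REGION_BLOCKS + off for off in offsets(vector))
        return blocks


pf = SpatioTemporalPrefetcher(degree=1)
pf.record(10, 0b11000100)  # page A = region 10, offsets [2, 6, 7]
pf.record(11, 0b101)       # page B = region 11, offsets [0, 2]
print(pf.trigger(10))      # [704, 706]
```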

[0037] In related technologies, spatiotemporal stream prefetchers require a large amount of storage space to record all potentially repeating spatiotemporal sequence information. Actual chip designs cannot provide that much storage, which hinders further development; feasibility must be improved by compressing the storage space and filtering the stored spatiotemporal sequences. Irregular prefetchers are typically quite aggressive: too many prefetch requests can cause cache pollution (low accuracy), while too few cannot hide enough cache misses. An effective prefetching mechanism is therefore needed to improve overall performance. Storage tables with static indexing cannot distribute entries uniformly: in some cases a small portion of the table is replaced frequently while other areas remain unchanged. Dynamic indexing therefore needs to be introduced into the data prefetching unit to improve distribution uniformity. In current high-performance processors, Intel has proposed the concept of thread partitioning, which requires special optimizations in the data prefetching unit to handle it well. The spatiotemporal stream prefetching unit must store metadata to learn memory access patterns; the storage location and size of the metadata table need to be adjusted dynamically to suit different situations.

[0038] In view of the deficiencies of related technologies in data prefetching, this disclosure provides a data prefetching method to at least solve some or all of the above-mentioned technical problems.

[0039] Figure 2 shows a schematic diagram of a first optional flow of the data processing method provided in this disclosure; each step is described below.

[0040] Step S201: Determine the time pattern table and the spatiotemporal pattern table based on the multiple data processing requests sent by multiple processor cores.

[0041] In some embodiments, the time pattern table includes a program counter identifier, a training address identifier, a prediction address identifier, a reuse count, and a confidence level; the spatiotemporal pattern table includes a program counter identifier, a training address identifier, a spatial base address identifier, a spatial bit vector, a confidence level, and a reuse count; the spatial bit vector is used to represent the identifier of the repeatedly accessed spatial sub-address in the space corresponding to the spatial base address. Optionally, the time pattern table and the spatiotemporal pattern table can be stored in the last-level cache; and / or, entries used within a first time interval in the time pattern table are stored outside the last-level cache in the form of a temporary time record table, and entries used within a first time interval in the spatiotemporal pattern table are stored outside the last-level cache in the form of a temporary spatiotemporal record table. The first time interval may include a recent time interval or a specified time interval.
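To see why storing a spatial base address plus a spatial bit vector saves space, the sketch below lays out the two table entries with the fields listed above and compares storage for the accessed blocks of one region. The address width, block size, and region size are hypothetical parameters; the confidence and reuse-count fields are identical in both encodings and are left out of the count.

```python
from dataclasses import dataclass

ADDR_BITS = 48    # hypothetical physical address width
BLOCK_BITS = 6    # hypothetical 64-byte cache block
REGION_BITS = 12  # hypothetical 4 KiB spatial region, i.e. 64 blocks


@dataclass
class TimePatternEntry:
    pc_id: int
    training_addr_id: int
    predicted_addr_id: int   # one successor address per entry
    reuse_count: int
    confidence: int


@dataclass
class SpatioTemporalPatternEntry:
    pc_id: int
    training_addr_id: int
    spatial_base_id: int     # region base address
    spatial_bit_vector: int  # bit i set -> block i of the region was accessed
    confidence: int
    reuse_count: int


def explicit_bits(num_blocks: int) -> int:
    """Bits needed to list `num_blocks` block addresses one by one."""
    return num_blocks * (ADDR_BITS - BLOCK_BITS)


def compressed_bits() -> int:
    """Bits for one region base plus one bit per block in the region."""
    return (ADDR_BITS - REGION_BITS) + (1 << (REGION_BITS - BLOCK_BITS))


print(explicit_bits(8), compressed_bits())  # 336 100
```

Under these assumed widths, the bit-vector form is already smaller once a region holds three or more accessed blocks (3 × 42 = 126 bits versus 100).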

[0042] In some embodiments, the time pattern table includes records of multiple data processing requests sent by multiple processor cores accessing memory in a time pattern within a certain time interval; specifically, it includes a program counter identifier, training address identifier, predicted address identifier, reuse count, and confidence level for each data processing request. The confidence level characterizes the reliability of the entry, i.e., the reliability of the predicted address obtained by the data processing request according to the entry, or the probability or degree that the entry is reliable, trustworthy, and accurate; it can be used to characterize whether the data is genuine, whether the measurement / collection / calculation results are reliable, and whether it can be safely stored and used.
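The disclosure does not state here how the confidence level and reuse count are updated. One common approach, shown below purely as an assumption, is a small saturating counter that rises when an entry's prediction is confirmed and falls otherwise, with prefetches issued only at or above a threshold. `CONF_MAX`, `ISSUE_THRESHOLD`, and the update rule are hypothetical.

```python
from dataclasses import dataclass

CONF_MAX = 3         # hypothetical 2-bit saturating confidence counter
ISSUE_THRESHOLD = 2  # hypothetical minimum confidence before prefetches are issued


@dataclass
class PatternEntry:
    confidence: int = 1
    reuse_count: int = 0

    def on_outcome(self, prediction_was_used: bool) -> None:
        """Raise confidence when a prediction is confirmed, lower it otherwise."""
        if prediction_was_used:
            self.confidence = min(CONF_MAX, self.confidence + 1)
            self.reuse_count += 1
        else:
            self.confidence = max(0, self.confidence - 1)

    def should_prefetch(self) -> bool:
        return self.confidence >= ISSUE_THRESHOLD


e = PatternEntry()
print(e.should_prefetch())  # False
e.on_outcome(True)
print(e.should_prefetch())  # True
```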

[0043] In some embodiments, the carrier implementing the data processing method (hereinafter referred to as the carrier) records multiple data processing requests sent by multiple processor cores, and determines a time pattern table and a spatiotemporal pattern table based on the processor core identifier, PC identifier, process identifier, program counter identifier and training address corresponding to each data processing request, as well as the occurrence pattern of the training address corresponding to each data processing request.

[0044] In some embodiments, the data processing request may include a data missing request or a data prefetch request; a data missing request is a data retrieval request issued by a processor core indicating that the data was not found at the current level and must be retrieved from the next level.

[0045] The carrier can be computer programs, electronic circuits, databases, mobile applications, electronic devices, cloud computing platforms, distributed systems, artificial intelligence frameworks, mathematical models, automation tools, and microcontrollers, etc., which are software or hardware capable of implementing algorithms and methods.

[0046] Step S202: Based on the type of processor core, merge the threads corresponding to at least one data processing request sent by multiple processor cores to obtain at least one process.

[0047] In some embodiments, a thread corresponds to a unique PC identifier at any given moment, and each execution of a PC identifier corresponds to one data processing request. Because a thread runs continuously over time, it corresponds to a different PC identifier at each moment while it runs, and therefore to multiple data processing requests.

[0048] In some embodiments, the carrier merges threads with the same processor core type and the same process identifier into one process to obtain at least one process; wherein, one data processing request corresponds to one thread, and one thread includes at least one data processing request.

[0049] In practical implementation, the processor core type can be determined based on its division of labor or function, such as computation type, input / output (I / O) type, or hybrid type. The carrier merges multiple threads with the same processor core type and process identifier into one process. Furthermore, multiple data processing requests corresponding to multiple threads can be integrated into multiple data processing requests corresponding to one process, improving data processing efficiency.
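The merging step can be pictured as grouping requests by the pair (processor core type, process identifier), so that every thread sharing that pair is folded into one process. A minimal sketch; the `Request` fields and the core-type labels are illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Request:
    core_type: str      # e.g. "compute", "io", "hybrid"
    process_id: int
    thread_id: int
    pc: int
    training_addr: int


def merge_into_processes(requests: List[Request]) -> Dict[Tuple[str, int], List[Request]]:
    """Merge threads that share (core type, process id) into one process."""
    processes: Dict[Tuple[str, int], List[Request]] = defaultdict(list)
    for req in requests:
        processes[(req.core_type, req.process_id)].append(req)
    return dict(processes)


reqs = [
    Request("compute", 7, thread_id=1, pc=0x400, training_addr=0x1000),
    Request("compute", 7, thread_id=2, pc=0x404, training_addr=0x1040),
    Request("io", 7, thread_id=3, pc=0x500, training_addr=0x2000),
]
merged = merge_into_processes(reqs)
print({key: sorted({r.thread_id for r in rs}) for key, rs in merged.items()})
# {('compute', 7): [1, 2], ('io', 7): [3]}
```

Threads 1 and 2 share a core type and process identifier and are merged; thread 3 runs on a different core type and stays separate.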

[0050] Furthermore, during subsequent updates to the current table of time indexes, the completed table of time indexes, the time pattern table, the spatiotemporal pattern table, and the historical sample table, multiple data processing requests can be maintained based on processes, allowing one or more entries in the table to correspond to the same process. This improves the accuracy and efficiency of table updates, while also enhancing data processing efficiency.

[0051] Step S203: Determine at least one lookup entry in the spatiotemporal index table based on the process identifier, and determine the prefetch mode corresponding to each data processing request based on the program counter identifier and the at least one lookup entry.

[0052] In some embodiments, the carrier performs the lookup in the spatiotemporal index table based on the indexing pattern and the process identifier.

[0053] In some embodiments, the carrier determines the indexing mode, which specifies the bits used as the index. For example, suppose the process identifier is n bits long. Using all n bits for indexing increases the indexing computation, whereas using only bits that discriminate poorly between identifiers causes the entries corresponding to different training addresses to be unevenly distributed in the table. For example, for the three process identifiers 0x0001, 0x0002, and 0x0003, if only the first two bits (00) are used as index bits, all three map to entry "00", which is then frequently replaced while the other entries (01, 10, 11) remain idle; this is the "uneven distribution".
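The uneven-distribution example can be reproduced in a few lines of Python. The helper `index_of` is hypothetical, and 16-bit identifiers are assumed so that the "first two bits" are bits 15 and 14.

```python
def index_of(value, bit_positions):
    """Build an index by concatenating the selected bits of value, MSB first."""
    idx = 0
    for pos in bit_positions:
        idx = (idx << 1) | ((value >> pos) & 1)
    return idx


pids = [0x0001, 0x0002, 0x0003]
high_bits = [15, 14]  # the "first two bits" of a 16-bit identifier
low_bits = [1, 0]     # the two least significant bits

print([index_of(p, high_bits) for p in pids])  # [0, 0, 0]: every ID lands in entry 00
print([index_of(p, low_bits) for p in pids])   # [1, 2, 3]: entries 01, 10, 11
```

Choosing the bits that actually differ between identifiers spreads the three processes over three entries instead of one.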

[0054] In some embodiments, the carrier determines the proportion of index bits that differ between the training addresses of at least two data processing requests in the same process; in response to this proportion being greater than or equal to an index threshold, the indexing pattern is determined to include at least one of the differing index bits; the indexing pattern includes the index bits used during indexing. Specifically, at least two data processing requests in the same process correspond to different training addresses, and the indexing pattern is determined from these training addresses to ensure a uniform distribution of entries.

[0055] In specific implementation, if the proportion of differing index bits is greater than or equal to the index threshold, the index confidence of the process is increased; if it is less than the index threshold, the index confidence of the process is decreased. When the index confidence of a process falls below the mode switching threshold, the index mode is changed. To further ensure a uniform distribution of training addresses, all entries are grouped and allocated evenly according to the number of memory access cores and the total number of threads, so that no single thread or core occupies most of the resources while other threads or cores are "starved".

[0056] In some optional embodiments, the carrier may also periodically determine the index confidence level corresponding to each process; in response to the index confidence level of any process being less than the mode switching threshold, the index mode corresponding to the process is updated; or, in response to the index confidence level of any process being greater than or equal to the mode switching threshold, the index mode corresponding to the process is not changed.
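A minimal sketch of the index-confidence update in [0055] and the periodic check in [0056]. The threshold values, the 3-bit saturating counter, the reset value, and the rule of taking the lowest recorded difference bits as the new index are all assumptions for illustration, not values from this disclosure.

```python
from dataclasses import dataclass, field

INDEX_THRESHOLD = 0.5        # hypothetical: required fraction of differing index bits
MODE_SWITCH_THRESHOLD = 2    # hypothetical
CONF_MAX = 7                 # hypothetical 3-bit saturating counter
CONF_INIT = 4                # hypothetical reset value


@dataclass
class IndexState:
    index_bits: tuple                 # bit positions forming the index (the index mode)
    confidence: int = CONF_INIT
    diff_bits: set = field(default_factory=set)  # differing bits seen outside index_bits

    def observe(self, prev_addr, addr):
        """Update the index confidence from two training addresses of one process."""
        diff = prev_addr ^ addr
        differing = sum((diff >> b) & 1 for b in self.index_bits)
        if differing / len(self.index_bits) >= INDEX_THRESHOLD:
            self.confidence = min(self.confidence + 1, CONF_MAX)
        else:
            self.confidence = max(self.confidence - 1, 0)
            # Remember bits that did differ but are not part of the current index
            self.diff_bits |= {b for b in range(diff.bit_length())
                               if (diff >> b) & 1 and b not in self.index_bits}

    def periodic_check(self):
        """Switch the index mode when confidence is below the switching threshold."""
        width = len(self.index_bits)
        if self.confidence >= MODE_SWITCH_THRESHOLD or len(self.diff_bits) < width:
            return False
        self.index_bits = tuple(sorted(self.diff_bits)[:width])
        self.confidence = CONF_INIT
        self.diff_bits.clear()
        return True


state = IndexState(index_bits=(13, 12))
for a, b in [(0x1000, 0x1040), (0x1040, 0x1080), (0x1080, 0x10C0)]:
    state.observe(a, b)
print(state.confidence, state.periodic_check(), state.index_bits)  # 1 True (6, 7)
```

The addresses in the usage example differ only in bits 6 and 7, so indexing on bits 13 and 12 keeps losing confidence until the periodic check moves the index onto the bits that actually vary.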

[0057] In some embodiments, the spatiotemporal index table includes a current spatiotemporal index table and a completed spatiotemporal index table.

[0058] In some embodiments, the carrier determines at least one lookup entry in the current spatiotemporal index table based on the process identifier; searches the at least one lookup entry based on the program counter identifier and index pattern corresponding to any data processing request to determine whether there is a first target entry matching the program counter of the data processing request; if there is a first target entry matching the program counter of the data processing request, the reuse confidence and pattern confidence corresponding to any data processing request are determined based on the first target entry, and the prefetch pattern corresponding to any data processing request is determined based on the reuse confidence and pattern confidence; wherein, the current spatiotemporal index table includes the program counter identifier, the previous address, the spatial base address, the most recent offset, the number of unused times, the reuse confidence, and the pattern confidence.

[0059] In other embodiments, if no first target entry matching the program counter of the data processing request exists in the current spatiotemporal index table, at least one lookup entry is determined in the completed spatiotemporal index table based on the process identifier; the at least one lookup entry is searched based on the program counter identifier and index pattern of the data processing request to determine whether a matching first target entry exists; if such a first target entry exists, the reuse confidence and pattern confidence of the data processing request are determined from it, and the prefetch pattern of the data processing request is determined based on the reuse confidence and pattern confidence; wherein the completed spatiotemporal index table includes the PC identifier, the unused count, the reuse confidence, and the pattern confidence.

[0060] In some embodiments, regardless of whether the first target entry is found in the current table of the spatiotemporal index or the completed table of the spatiotemporal index, the prefetching pattern of the corresponding data processing request is determined based on the reuse confidence and pattern confidence in the first target entry.

[0061] In specific implementation, in response to the reuse confidence being greater than or equal to the first mode threshold and the pattern confidence being greater than or equal to the second mode threshold, the prefetch mode of the data processing request is determined to be the current prefetch mode; or, in response to the reuse confidence being less than the first mode threshold and the pattern confidence being less than the second mode threshold, the prefetch mode of the data processing request is determined to be another prefetch mode.
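The two-threshold decision in [0061] can be sketched as follows. The threshold values and string labels are hypothetical; the mixed case (one confidence at or above its threshold, the other below) is not specified in this paragraph, so the sketch returns `None` for it rather than guessing.

```python
FIRST_MODE_THRESHOLD = 4   # hypothetical
SECOND_MODE_THRESHOLD = 4  # hypothetical


def decide_prefetch_mode(reuse_conf, pattern_conf):
    """Return "current", "other", or None for the case the text leaves open."""
    reuse_ok = reuse_conf >= FIRST_MODE_THRESHOLD
    pattern_ok = pattern_conf >= SECOND_MODE_THRESHOLD
    if reuse_ok and pattern_ok:
        return "current"
    if not reuse_ok and not pattern_ok:
        return "other"
    return None


print(decide_prefetch_mode(5, 6), decide_prefetch_mode(1, 2), decide_prefetch_mode(5, 1))
# current other None
```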

[0062] In some optional embodiments, if the first target entry is not found in the current table of the spatiotemporal index but is found in the completed table of the spatiotemporal index, the program counter identifier, reuse confidence, and pattern confidence of the first target entry are written into a first entry in the current table of the spatiotemporal index, and the previous address in the first entry is set to the miss address of the data processing request.

[0063] Specifically, if the first target entry is found in the completed spatiotemporal index table, the program counter identifier of a previously completed program has reappeared after some time. The entry is therefore re-established in the current spatiotemporal index table, and the miss address is recorded as its previous address.

[0064] The current table of the spatiotemporal index stores entries corresponding to processes that are in progress or have already been performed; the completion table of the spatiotemporal index stores entries corresponding to processes that have already been completed.

[0065] In some embodiments, if neither the current table of the spatiotemporal index nor the completion table of the spatiotemporal index contains a first target entry matching the program counter of the data processing request, then a first replacement entry is determined in the current table of the spatiotemporal index, a second replacement entry is determined in the completion table of the spatiotemporal index, the program counter identifier, reuse confidence, and pattern confidence in the second replacement entry are updated to the program counter identifier, reuse confidence, and pattern confidence of the first replacement entry, and the unused count in the second replacement entry is cleared; the data in the first replacement entry is deleted, and the program counter identifier, previous address, and spatial base address corresponding to any data processing request are recorded in the first replacement entry.

[0066] The first replacement entry can be the entry in the current spatiotemporal index table with the oldest most recent update time; the second replacement entry can be the entry in the completed spatiotemporal index table with the oldest most recent update time. These steps are needed because, when neither table has an entry corresponding to the data processing request, the request either has not occurred before or occurred long ago and its entry has since been replaced. An entry must therefore be allocated in the current spatiotemporal index table to record the parameters of the request. The new data is recorded in the current table by replacement, with the replaced entry chosen by its most recent update time, and the replaced entry is then moved into the completed spatiotemporal index table.
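The lookup and replacement flow of [0058] to [0066] can be modeled with two tables ordered by update time. This is a sketch, not the claimed hardware: the table capacities, the use of `OrderedDict` insertion order as the "most recent update time", and removing a promoted entry from the completed table (rather than copying it) are assumptions.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Optional


@dataclass
class CurrentEntry:
    pc: int
    prev_addr: Optional[int] = None
    spatial_base: Optional[int] = None
    recent_offset: Optional[int] = None
    unused_count: int = 0
    reuse_conf: int = 0
    pattern_conf: int = 0


@dataclass
class CompletedEntry:
    pc: int
    unused_count: int = 0
    reuse_conf: int = 0
    pattern_conf: int = 0


class SpatiotemporalIndex:
    def __init__(self, current_cap, completed_cap):
        # OrderedDict order stands in for "most recent update time": oldest first
        self.current = OrderedDict()
        self.completed = OrderedDict()
        self.current_cap = current_cap
        self.completed_cap = completed_cap

    def _make_room(self):
        """Evict the oldest current entry (first replacement entry) into the completed table."""
        if len(self.current) < self.current_cap:
            return
        _, victim = self.current.popitem(last=False)
        if len(self.completed) >= self.completed_cap:
            self.completed.popitem(last=False)       # second replacement entry
        self.completed[victim.pc] = CompletedEntry(  # unused count cleared to 0
            victim.pc, 0, victim.reuse_conf, victim.pattern_conf)

    def lookup(self, pc, miss_addr, spatial_base):
        if pc in self.current:                       # hit in the current table
            self.current.move_to_end(pc)
            return self.current[pc]
        done = self.completed.pop(pc, None)          # assumption: promote, not copy
        self._make_room()
        if done is not None:                         # PC reappeared after completing
            entry = CurrentEntry(pc, prev_addr=miss_addr,
                                 reuse_conf=done.reuse_conf, pattern_conf=done.pattern_conf)
        else:                                        # new, or evicted long ago
            entry = CurrentEntry(pc, prev_addr=miss_addr, spatial_base=spatial_base)
        self.current[pc] = entry
        return entry


idx = SpatiotemporalIndex(current_cap=2, completed_cap=2)
idx.lookup(0x400, 0x1000, 0x1).reuse_conf = 5
idx.lookup(0x404, 0x2000, 0x2)
idx.lookup(0x408, 0x3000, 0x3)      # evicts PC 0x400 into the completed table
e = idx.lookup(0x400, 0x4000, 0x4)  # PC 0x400 reappears
print(e.reuse_conf, hex(e.prev_addr), list(map(hex, idx.current)), list(map(hex, idx.completed)))
# 5 0x4000 ['0x408', '0x400'] ['0x404']
```

The reappearing PC keeps its learned confidences while its previous address is reset to the new miss address, which is the behavior [0062] and [0063] describe.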

[0067] Step S204: Based on at least one of the prefetch mode, program counter identifier, training address, and prefetch degree corresponding to each data processing request, determine at least one prefetch address corresponding to each data processing request from the time pattern table or spatiotemporal pattern table.

[0068] In some embodiments, the carrier determines whether to look up the time pattern table or the spatiotemporal pattern table based on the prefetch pattern. After the table is determined, the corresponding prefetch addresses are determined based on at least one of the prefetch pattern, program counter identifier, training address, and prefetch degree of each data processing request. The prefetch degree indicates the number of prefetch addresses corresponding to each training address.
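To show how the compact form named in [0069] (a spatial base address plus a spatial bit vector) could be expanded into prefetch addresses limited by the prefetch degree, here is a hypothetical sketch. The block size, the `(pc, training address)` key, and both table layouts are assumptions, not details from this disclosure.

```python
BLOCK_SIZE = 64  # hypothetical cache-block size in bytes


def expand_bit_vector(base_addr, bit_vector):
    """List the block addresses whose bits are set, lowest offset first."""
    addrs = []
    i = 0
    while bit_vector >> i:
        if (bit_vector >> i) & 1:
            addrs.append(base_addr + i * BLOCK_SIZE)
        i += 1
    return addrs


def prefetch_addresses(mode, key, degree, time_table, st_table):
    """Pick the table by prefetch mode and return at most `degree` addresses."""
    if mode == "time":
        candidates = time_table.get(key, [])
    else:
        entry = st_table.get(key)  # (spatial base address, spatial bit vector)
        candidates = expand_bit_vector(*entry) if entry else []
    return candidates[:degree]


key = (0x400, 0x1000)  # hypothetical (pc, training address) key
time_table = {key: [0x5000, 0x6000, 0x7000]}
st_table = {key: (0x10000, 0b1011)}
print([hex(a) for a in prefetch_addresses("spatiotemporal", key, 2, time_table, st_table)])
# ['0x10000', '0x10040']
```

One base address and a short bit vector stand in for a whole list of block addresses, which is where the storage saving comes from.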

[0069] Thus, the data processing method provided by the embodiments of this disclosure can store potentially duplicated spatiotemporal sequence information in the form of spatial base address identifier, spatial bit vector, confidence level, and reuse count, greatly reducing the storage space of spatiotemporal sequence information, helping to further develop prefetching technology, and improving data processing efficiency.

[0070] Figure 3 shows a schematic diagram of a first optional structure of the data processing apparatus provided in an embodiment of this disclosure; each part is described below.

[0071] As shown in Figure 3, the data processing device may include a monitoring and adjustment unit, a spatiotemporal flow indexing unit, a spatiotemporal flow training unit, and a size adjustment unit. Optionally, the data processing device may further include a last-level cache (LLC), in which the time pattern table and the spatiotemporal pattern table are stored.

[0072] In some embodiments, the monitoring and adjustment unit receives memory access miss and prefetch hit requests from different processor cores, merges thread partitions, and dynamically adjusts the index bits according to the memory access information; it also receives and monitors the issued prefetch requests and the validity of the LLC, and reports to the size adjustment unit at fixed memory access intervals.

[0073] The spatiotemporal flow indexing unit receives miss/prefetch hit requests from the monitoring and adjustment unit after thread merging is completed and indexes entries by PC. After finding an entry, it passes the entry to the spatiotemporal flow training unit to train the spatiotemporal flow memory access pattern. At the same time, it sends the information to the recently used table to find a suitable prefetch address to issue, and sends the relevant information to the size adjustment unit for evaluation.

[0074] The spatiotemporal flow training unit is used to train and find suitable time / spatiotemporal memory access flow patterns and monitor confidence information. When the confidence meets the condition threshold, it notifies the indexing unit to record it in the time pattern table / spatiotemporal pattern table.

[0075] The time pattern table and the spatiotemporal pattern table are stored in the LLC. Their entries are dynamically adjusted by the size adjustment unit at fixed memory access intervals, mainly to determine, from historical information, how much LLC capacity the prefetch metadata should occupy to achieve the best overall effect.

[0076] The recently used table stores the most recently used entries of the time and spatiotemporal pattern tables, acting as a cache for them. During indexing, this table is checked first, and the two pattern tables in the LLC are searched only if the entry is not found there. This reduces the number of LLC accesses and speeds up the issuing of prefetch requests.
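The recently used table behaves like a small cache in front of the LLC-resident pattern tables. A minimal sketch, assuming least-recently-used replacement and modeling the LLC lookup as a plain callable; the class and parameter names are hypothetical.

```python
from collections import OrderedDict


class RecentlyUsedTable:
    """Small LRU cache placed in front of the LLC-resident pattern tables."""

    def __init__(self, capacity, llc_lookup):
        self.capacity = capacity
        self.entries = OrderedDict()   # key -> pattern entry, least recent first
        self.llc_lookup = llc_lookup   # hypothetical stand-in for an LLC table read
        self.llc_accesses = 0

    def get(self, key):
        if key in self.entries:        # hit: no LLC access needed
            self.entries.move_to_end(key)
            return self.entries[key]
        self.llc_accesses += 1         # miss: read the pattern table in the LLC
        value = self.llc_lookup(key)
        if value is not None:          # copy the entry into the recently used table
            self.entries[key] = value
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)
        return value


llc = {1: "pattern A", 2: "pattern B", 3: "pattern C"}
rut = RecentlyUsedTable(capacity=2, llc_lookup=llc.get)
for k in [1, 1, 2, 3, 1]:
    rut.get(k)
print(rut.llc_accesses, list(rut.entries))  # 4 [3, 1]
```

The repeated lookup of key 1 is served without touching the LLC, which is the access reduction [0076] refers to.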

[0077] Figure 4 shows a second optional flowchart of the data processing method provided in an embodiment of this disclosure; each step is described below.

[0078] Step S401: Receive memory access miss / prefetch hit information.

[0079] In some embodiments, the monitoring and adjustment unit receives memory access miss / prefetch hit information, i.e., a data processing request.

[0080] Step S402: Merge partition threads and dynamically adjust index bits.

[0081] In some embodiments, the monitoring and adjustment unit merges partition threads and dynamically selects storage locations based on the processor core type, and dynamically adjusts index bits based on the training address in the data processing request, i.e., determines the index mode.

[0082] Step S403: Look up entries in the spatiotemporal flow indexing unit using the program counter identifier.

[0083] In some embodiments, the spatiotemporal stream indexing unit searches its current table and/or completed table of the spatiotemporal stream index based on the index pattern and the program counter identifier of the data processing request to determine a first target entry and the reuse confidence and pattern confidence recorded in it. If the first target entry is not found, a first replacement entry is determined based on the storage location.

[0084] Step S404: Determine the spatiotemporal pattern table and the time pattern table based on the data processing request.

[0085] In some embodiments, the spatiotemporal flow training unit uses the previous address and its spatial region as an index to look up the spatiotemporal flow training unit entry and check for existing prefetch pattern information. Based on this, it updates the confidence level and the relevant information of the entry. After the update is completed, it checks whether the confidence level meets the threshold condition. If it does, it updates the time / spatiotemporal pattern table.

[0086] Step S405: Determine the prefetch address.

[0087] In some embodiments, the prefetch mode is activated only when the first target entry is found and the reuse confidence and pattern confidence meet their thresholds. First, the miss address (or training address) is used as an index to search the recently used table. If the entry is found, a prefetch request is generated and issued based on the recorded entry information. If it is not found, the index is used to search the time/spatiotemporal pattern tables, a prefetch request is generated and issued based on the entry information, and the corresponding entry is copied into the recently used table.

[0088] Thus, the data processing method provided by the embodiments of this disclosure can store potentially duplicated spatiotemporal sequence information in the form of spatial base address identifier, spatial bit vector, confidence level, and reuse count, greatly reducing the storage space of spatiotemporal sequence information, helping to further develop prefetching technology, and improving data processing efficiency.

[0089] Figure 5 shows a schematic diagram of the thread partition table and the backfill request tracking table provided in an embodiment of this disclosure.

[0090] In some embodiments, the thread partition table and the backfill request tracking table are maintained by a monitoring and adjustment unit. The monitoring and adjustment unit determines the accuracy of data prefetch requests, thread partition merging, and dynamic index bit adjustment based on the number of prefetch requests or backfill requests in the data processing requests.

[0091] In some embodiments, when the monitoring and adjustment unit receives a data processing request (cache miss request and / or prefetch hit request) from the upper cache, it increments the access interval in the backfill request tracking table by one to record the number of memory accesses, and searches for existing entries in the thread partition table using the corresponding thread identification information (page table base address, process ID, etc.).

[0092] If no entry corresponding to the thread identifier information is found in the thread partition table, a replacement entry is determined, and the thread identifier, access core, index mode, index confidence, nearest address, and index difference bits from the data processing request are written into the replacement entry.

[0093] If an entry corresponding to the thread identifier information is found in the thread partition table, the access core location is updated according to the received memory access core location. Then, based on the index pattern, the positions where the index differs between the training address and the nearest address are compared. If the proportion of differing index bits is greater than or equal to the index threshold, the index confidence of the process is increased; or, if the proportion of differing index bits is less than the index threshold, the index confidence of the process is decreased, and the index difference bits not included in the index pattern are recorded. The index threshold can be set according to actual needs or experimental results.

[0094] For example, if the index pattern is bits a through b of the address, bits a through b of the training address are compared with bits a through b of the nearest address. If the index pattern contains c index bits, of which d differ between the training address and the nearest address, and d / c is less than the index threshold, then the index confidence of the process is decreased, and the positions of the other bits outside the index pattern that differ between the training address and the nearest address are recorded.
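A worked instance of the ratio test in [0094], using hypothetical values a = 6, b = 13, d = 3 and an assumed index threshold of 0.5 (bits a through b counted inclusively):

```latex
c = b - a + 1 = 13 - 6 + 1 = 8, \qquad
\frac{d}{c} = \frac{3}{8} = 0.375 < 0.5
```

Because the ratio falls below the assumed threshold, the index confidence of the process would be decreased and the differing bits outside bits 6 to 13 recorded.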

[0095] In some optional embodiments, the index confidence can be checked periodically at fixed access intervals. When it is below the mode switching threshold, the index mode is switched according to the recorded index difference bits so that the index distributes the training addresses evenly. The low-order index bits are adjusted in this way, while the high-order bits are adjusted by monitoring the number of running threads and the core ratio: all entries are grouped and distributed evenly according to the number of memory access cores and the total number of threads, so that no single thread or core occupies all the resources while the remaining threads or cores are "starved".

[0096] For the backfill request tracking table, the prefetch count is incremented by one each time a prefetch address is issued, the LLC backfill count is incremented by one each time an LLC backfill request is issued, the prefetch hit count is incremented by one each time a prefetch hit is received, and the memory miss count is incremented by one each time a memory miss request is received. Relevant information is calculated and the prefetch rate is adjusted at fixed intervals.

[0097] In practice, the following quantities are counted: the number of prefetch addresses issued; the number of prefetch addresses responded to by the last-level cache (i.e., the number of LLC backfill requests); the number of prefetch hits received; and the number of data miss requests received. The prefetch accuracy is then determined from the number of prefetch hits and the number of prefetch addresses issued; the prefetch percentage from the number of prefetch hits and the number of prefetch addresses responded to by the last-level cache; and the prefetch coverage from the number of prefetch hits and the number of data miss requests. The prefetch rate is adjusted based on the prefetch accuracy, prefetch percentage, and prefetch coverage.

[0098] Wherein, prefetch accuracy = number of prefetch hits / number of prefetches issued, prefetch percentage = number of prefetch hits / number of LLC fills, and prefetch coverage = number of prefetch hits / number of memory misses accessed.
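The three ratios in [0098] can be computed from the counters of [0096] as below. The counter names and example counts are hypothetical, and returning 0.0 for an empty denominator is an assumption the text does not address.

```python
from dataclasses import dataclass


@dataclass
class BackfillTracker:
    prefetches_issued: int = 0   # +1 per issued prefetch address
    llc_fills: int = 0           # +1 per LLC backfill request
    prefetch_hits: int = 0       # +1 per received prefetch hit
    memory_misses: int = 0       # +1 per received memory miss request

    def metrics(self):
        def ratio(num, den):
            return num / den if den else 0.0  # assumption: empty window -> 0.0
        return {
            "accuracy": ratio(self.prefetch_hits, self.prefetches_issued),
            "percentage": ratio(self.prefetch_hits, self.llc_fills),
            "coverage": ratio(self.prefetch_hits, self.memory_misses),
        }


t = BackfillTracker(prefetches_issued=200, llc_fills=160, prefetch_hits=120, memory_misses=400)
print(t.metrics())  # {'accuracy': 0.6, 'percentage': 0.75, 'coverage': 0.3}
```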

[0099] In this way, by dynamically adjusting the prefetch rate through the monitoring unit, the prefetch accuracy and coverage can be ensured, and the interface adjustments and other module modifications required for porting can be reduced.

[0100] In response to a prefetch accuracy greater than a first adjustment threshold and a prefetch coverage less than a second adjustment threshold, the prefetch degree in the backfill request tracking table is increased; in response to a prefetch accuracy less than a third adjustment threshold, the prefetch degree in the backfill request tracking table is decreased, and the capacity ratio of the time pattern table and the spatiotemporal pattern table in the last-level cache is adjusted based on the prefetch accuracy and the prefetch ratio. The first adjustment threshold is greater than the third adjustment threshold. The prefetch degree characterizes the number of prefetch addresses that a data processing request can determine.
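A sketch of the prefetch-degree rule in [0100]. The threshold values and degree bounds are hypothetical, chosen only so that the first adjustment threshold exceeds the third as the text requires; the capacity-ratio adjustment is omitted because its rule is not given here.

```python
FIRST_ADJ = 0.75    # hypothetical first adjustment threshold (accuracy)
SECOND_ADJ = 0.5    # hypothetical second adjustment threshold (coverage)
THIRD_ADJ = 0.4     # hypothetical third adjustment threshold; must be < FIRST_ADJ
MIN_DEGREE, MAX_DEGREE = 1, 8  # hypothetical bounds on the prefetch degree


def adjust_degree(degree, accuracy, coverage):
    """Raise the degree when accurate but under-covering; lower it when inaccurate."""
    if accuracy > FIRST_ADJ and coverage < SECOND_ADJ:
        return min(degree + 1, MAX_DEGREE)
    if accuracy < THIRD_ADJ:
        return max(degree - 1, MIN_DEGREE)
    return degree


print(adjust_degree(4, 0.9, 0.2), adjust_degree(4, 0.3, 0.2), adjust_degree(4, 0.6, 0.2))
# 5 3 4
```

Accurate prefetching with low coverage suggests more addresses per training address would help, while low accuracy suggests fewer; between the two regions the degree is left unchanged.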

[0101] In this way, the monitoring and adjustment unit merges thread partitions by thread identifier and dynamically adjusts index bits by combining differences in threads, memory access cores, and index modes. This ensures fair resource allocation among threads and cores and a uniform distribution of training addresses across the prefetch unit entries. The unit also monitors the number of prefetch requests and LLC backfill requests, and dynamically adjusts the share of LLC capacity occupied by the time pattern table and the spatiotemporal pattern table based on the number of valid and replaced time/spatiotemporal patterns. This allows the spatiotemporal prefetch unit to be implemented on-chip and reduces the additional cost of implementing the prefetcher.

[0102] Figure 6 shows a schematic diagram of a third optional flow of the data processing method provided in an embodiment of this disclosure, and Figure 7 shows a schematic diagram of the current table and the completed table of the spatiotemporal flow index provided in an embodiment of this disclosure. They are explained below with reference to Figure 6 and Figure 7.

[0103] As shown in Figure 7, entries in the spatiotemporal flow indexing unit are indexed by program counter identifier. The current table of the spatiotemporal flow index records the program counter identifier (PC identifier), previous address, spatial base address, most recent offset, unused count, reuse confidence, pattern confidence, and pattern type information. The completed table of the spatiotemporal flow index records the PC identifier, unused count, reuse confidence, pattern confidence, and pattern type information.

[0104] Overall, when a data processing request is received, the system first searches the current table of the spatiotemporal flow index based on the PC identifier. If the first target entry is not found, it searches the completion table of the spatiotemporal flow index based on the PC identifier. If the first target entry is also not found, a first replacement entry is determined in the current table of the spatiotemporal flow index, and a second replacement entry is determined in the completion table of the spatiotemporal flow index. The data in the second replacement entry is cleared, and the data in the first replacement entry is filled in. The data in the first replacement entry is then deleted. If the first target entry is found in the completion table of the spatiotemporal flow index, it means that the previously completed PC has reappeared after a certain period of time. The first target entry is then updated in the current table of the spatiotemporal flow index, and the missing address is recorded as the previous address.

[0105] Step S601: Search the current table of the spatiotemporal flow index based on the program counter identifier and index pattern.

[0106] In some embodiments, the carrier searches the current table of the spatiotemporal stream index based on the program counter identifier and index pattern corresponding to the data processing request to determine whether a corresponding first target entry exists.

[0107] If the first target entry exists in the current table of the spatiotemporal flow index, then proceed to step S602; if the first target entry does not exist in the current table of the spatiotemporal flow index, then proceed to step S605.

[0108] Step S602: Determine whether the space base address is empty.

[0109] In some embodiments, the carrier determines whether the spatial base address in the first target entry is empty. If the spatial base address of the first target entry in the current table of the spatiotemporal flow index is not empty, step S603 is executed; if it is empty, step S604 is executed.

[0110] Specifically, if the spatial base address is not empty, it means that a spatial stream training is in progress; if the spatial base address is empty, it means that this is the second access under this PC, and no entries have been formed yet.

[0111] Step S603: Determine whether they are in the same spatial region.

[0112] In some embodiments, if the spatial base address is not empty, the first spatial base address is determined based on the training address of the data processing request; the first spatial base address may be determined based on the high-order bits of the training address.

[0113] Furthermore, the low-order bits of the training address are extracted to determine the offset value within the spatial region corresponding to the spatial base address, and the first spatial base address is compared with the second spatial base address recorded in the first target entry to determine whether the two addresses are in the same spatial region.

[0114] In some embodiments, if the first spatial base address is the same as the second spatial base address, it is determined that the flow of the spatial region corresponding to the first target entry has not ended; if the first spatial base address is different from the second spatial base address, it is determined that the flow of the spatial region corresponding to the first target entry has ended.

[0115] In specific implementation, if it is determined that the flow in the spatial region corresponding to the first target entry has not ended, the most recent offset in the first target entry is updated based on the previous address in the first target entry and the training address of the data processing request. Specifically, the previous address in the entry and the in-region offset of the training address are sent to the training unit, and the most recent offset is updated to the offset within the spatial region given by the low-order bits of the training address.

[0116] In other embodiments, if the addresses are not in the same spatial region, the flow in that spatial region has ended: the previous address in the first target entry is updated to the computed address, and the spatial base address and most recent offset in the first target entry are updated to the spatial base address and most recent offset corresponding to the data processing request; wherein the computed address is determined from the spatial base address and the most recent offset recorded in the entry.

[0117] In practice, the spatial base address in the first target entry is combined with the most recent offset to form the address of the most recent memory access under that PC (the computed address), which is sent to the spatiotemporal stream training unit together with the training address. The previous address of the first target entry is then updated to the computed address, and the spatial base address and most recent offset of the training address are filled into the corresponding fields of the first target entry.

[0118] Step S604: Update the current table of spatiotemporal flow index and send the data to the training unit.

[0119] In some embodiments, if the spatial base address is empty, the spatial base address of the first target entry is updated based on the high-order bits of the training address of the data processing request, and the most recent offset of the first target entry is updated based on the low-order bits of that training address.

[0120] In practice, if the spatial base address is empty, this is the second access under the program counter and no completed entry exists yet. An entry is determined in the current table of the spatiotemporal flow index to record the program counter; the high-order and low-order bits of the training address are filled into the spatial base address and the most recent offset, respectively, and the previous address and the memory access miss address are sent to the spatiotemporal flow training unit.
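Steps S602 to S604 can be summarized in one function. This sketch assumes a 4 KiB spatial region (12 low-order offset bits) and returns the address pair handed to the training unit; the field and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

OFFSET_BITS = 12  # hypothetical: 4 KiB spatial region


def split(addr):
    """Return (spatial base address from the high bits, offset from the low bits)."""
    return addr >> OFFSET_BITS, addr & ((1 << OFFSET_BITS) - 1)


@dataclass
class IndexEntry:
    prev_addr: int
    spatial_base: Optional[int] = None
    recent_offset: Optional[int] = None


def train_step(entry, training_addr):
    """Apply S602-S604 to a current-table hit; return the pair sent for training."""
    base, offset = split(training_addr)
    if entry.spatial_base is None:
        # S604: second access under this PC, so start recording a spatial stream
        entry.spatial_base, entry.recent_offset = base, offset
        return entry.prev_addr, training_addr
    if base == entry.spatial_base:
        # S603, same region: the stream continues and only the offset moves
        entry.recent_offset = offset
        return entry.prev_addr, training_addr
    # S603, new region: the old stream ended; rebuild its last address
    computed = (entry.spatial_base << OFFSET_BITS) | entry.recent_offset
    entry.prev_addr = computed
    entry.spatial_base, entry.recent_offset = base, offset
    return computed, training_addr


e = IndexEntry(prev_addr=0x5000)
for addr in [0x7010, 0x7080, 0x9000]:
    prev, cur = train_step(e, addr)
    print(hex(prev), hex(cur))
# 0x5000 0x7010
# 0x5000 0x7080
# 0x7080 0x9000
```

The third access leaves region 0x7, so the last address seen there (0x7080) is rebuilt from the stored base and offset and becomes the new previous address.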

[0121] Step S605: Search the completed table of the spatiotemporal flow index.

[0122] In some embodiments, if the carrier does not find the first target entry in the current table of the spatiotemporal flow index, it searches for the existence of the first target entry in the completed table of the spatiotemporal flow index.

[0123] If the first target entry is found in the spatiotemporal flow index completion table, proceed to step S606; if the first target entry is not found in the spatiotemporal flow index completion table, proceed to step S607.

[0124] Step S606: Replace the entries in the completed spatiotemporal flow index table with the current spatiotemporal flow index table.

[0125] In some embodiments, in response to the absence of a first target entry matching the program counter of the data processing request in the spatiotemporal stream index completion table, the carrier determines a first replacement entry in the current spatiotemporal index table, determines a second replacement entry in the spatiotemporal index completion table, updates the program counter identifier, reuse confidence, and pattern confidence in the second replacement entry to the program counter identifier, reuse confidence, and pattern confidence of the first replacement entry, and clears the unused count in the second replacement entry; deletes the data in the first replacement entry, and records the program counter identifier, previous address, and spatial base address corresponding to any data processing request in the first replacement entry.

[0126] Step S607: Select the first replacement entry and set its values to default values.

[0127] In some embodiments, if a first target entry exists in the completed table of the spatiotemporal flow index, a third target entry is determined in the current table of the spatiotemporal flow index; the data in the third target entry is deleted, the data of the first target entry is filled into the corresponding fields of the third target entry, and the previous address in the third target entry is updated to the training address.

[0128] Step S608: Determine the prefetch pattern based on reuse confidence and pattern confidence.

[0129] In some embodiments, if a first target entry is found in the spatiotemporal flow index current table or completion table based on the PC identifier, the reuse confidence and pattern confidence of the first target entry are obtained. If the reuse confidence is greater than a first threshold and the pattern confidence is greater than a second threshold, the prefetch pattern is determined based on the current pattern confidence. The prefetch pattern is either a time pattern, which corresponds to the time pattern table, or a spatiotemporal pattern, which corresponds to the spatiotemporal pattern table.

[0130] In some optional embodiments, if the reuse confidence is greater than the first threshold and the pattern confidence is greater than the second threshold, the PC identifier and training address are sent to a temporary record table for lookup; if the temporary record table contains no third target entry corresponding to the PC identifier and training address, the search continues in the spatiotemporal pattern table or time pattern table stored in the last-level cache (LLC).

[0131] In this way, by tracking both the time and spatiotemporal stream patterns of the same PC and applying multiple threshold parameters, the method ensures that the recorded memory access patterns are accurate and recur often enough, retaining the most useful data while reducing the number of stored memory access patterns.

[0132] Figure 8 is a schematic diagram of a fourth optional flow of the data processing method provided in an embodiment of this disclosure, and Figure 9 is a schematic diagram of a historical sample table provided in an embodiment of this disclosure. The method is explained below with reference to Figure 8 and Figure 9.

[0133] As shown in Figure 9, the historical sample table is used to discover and record temporal or spatiotemporal memory access patterns from data processing requests. Each entry includes the previous address identifier, spatial region base address, spatial bit vector, stream length, recurrence bit, confidence level, access interval, and first offset. The confidence level keeps one bit for each position of the bit vector, defaulting to 0; a value of 1 indicates that the position was accessed repeatedly. The recurrence bit records the number of times addresses within the spatial region were accessed in this memory access stream. The access interval is counted from the end of the spatial stream: it covers a retention period after the spatial stream ends, during which the state of the memory access pattern is tracked and the corresponding confidence information in the index table is updated. The first offset is the offset, within the spatial region, of the first training address when the entry is created.
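One way to picture a historical sample table entry is as a record whose bit vector and per-position confidence bits are integer masks. This is a sketch only: the field names are illustrative, and `REGION_SLOTS = 10` is an assumption chosen to match the 10-bit vectors used in the Figure 10 example.

```python
from dataclasses import dataclass

REGION_SLOTS = 10  # assumed positions per spatial region, as in the Figure 10 vectors


@dataclass
class HistoryEntry:
    """One historical sample table entry (field names are illustrative)."""
    prev_addr_tag: int        # previous address identifier, used as the lookup key
    spatial_base: int         # spatial region base address
    bit_vector: int = 0       # bit i set: offset i was accessed in this stream
    stream_length: int = 0    # accesses recorded for the spatial stream
    recurrence: int = 0       # recurrence bit/counter
    confidence: int = 0       # one bit per bit-vector position, default 0
    access_interval: int = 0  # counted after the spatial stream ends
    first_offset: int = 0     # offset of the first training address in the region

    def confident_positions(self) -> int:
        """Number of positions whose confidence bit is 1 (accessed repeatedly)."""
        return bin(self.confidence).count("1")

    def vector_string(self) -> str:
        """Render the bit vector with bit 0 leftmost, as the figures write it."""
        return "".join("1" if self.bit_vector >> i & 1 else "0"
                       for i in range(REGION_SLOTS))
```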

[0134] Step S801: Search the historical sample table based on the previous address identifier.

[0135] In some embodiments, the spatiotemporal flow training unit is primarily used to maintain a historical sample table.

[0136] In some embodiments, when a first target entry is found based on the PC identifier and its spatial base address is not empty, the spatiotemporal stream indexing unit transmits to the spatiotemporal stream training unit the spatial base address, the most recent offset, the training address, and the previous address; alternatively, it transmits only the previous address and the most recent offset of the training address.

[0137] In some embodiments, the spatiotemporal flow training unit receives information transmitted by the spatiotemporal flow indexing unit, searches the historical sample table based on the previous address in the first target entry, and determines whether a second target entry has been found.

[0138] In some embodiments, if no second target entry is found, step S802 is executed; if a second target entry is found, it is further determined whether a calculation address exists in the information transmitted by the spatiotemporal stream index unit; if no calculation address exists, step S803 is executed; if a calculation address exists, step S806 is executed. The calculation address is determined based on the previous address and the most recent offset.
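The branching in [0138] and [0145] reduces to a small decision function. The function name and boolean parameters below are hypothetical; the sketch only encodes which step follows the S801 lookup, not the table updates themselves.

```python
def next_training_step(second_entry_found: bool, has_calc_addr: bool,
                       offset_matches: bool = False) -> str:
    """Return the Figure 8 step that follows the S801 lookup (sketch only)."""
    if not second_entry_found:
        return "S802"  # no second target entry: allocate a third target entry
    if has_calc_addr:
        return "S806"  # spatial stream has ended: update the access interval
    # S803: is the first offset value already recorded in the bit vector?
    return "S804" if offset_matches else "S805"
```

In the alternative embodiment of [0142], S802 is followed by S805 as well.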

[0139] Step S802: Determine and update the third target entry.

[0140] In some embodiments, if the second target entry does not exist in the historical sample table, a third target entry is determined in the historical sample table, and the third target entry is updated based on the first target entry. The third target entry can be the entry in the historical sample table whose most recent update time is furthest from the current time; that is, an entry that has not been updated for a long time.

[0141] Specifically, the carrier can fill the previous address identifier and the spatial region base address into the third target entry, update the corresponding bit of the spatial bit vector, and set the stream length to 1.
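Choosing the third target entry as the entry updated longest ago is a least-recently-updated replacement. A minimal sketch, assuming a dictionary keyed by previous address and a logical clock in place of hardware timestamps (both hypothetical):

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Slot:
    prev_addr: int
    spatial_base: int
    bit_vector: int
    stream_length: int
    last_update: int  # logical timestamp of the most recent update


class HistoryTable:
    """Fixed-capacity historical sample table keyed by previous address."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.slots: Dict[int, Slot] = {}
        self.clock = 0

    def allocate(self, prev_addr: int, spatial_base: int, offset: int) -> Slot:
        """S802: fill a third target entry for a previous address with no match."""
        self.clock += 1
        if len(self.slots) >= self.capacity:
            # Third target entry: the slot whose last update is oldest.
            stale = min(self.slots.values(), key=lambda s: s.last_update)
            del self.slots[stale.prev_addr]
        slot = Slot(prev_addr, spatial_base, 1 << offset, 1, self.clock)
        self.slots[prev_addr] = slot
        return slot
```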

[0142] In some alternative embodiments, the carrier performs step S805 after determining and updating the third target entry.

[0143] Step S803: Determine whether the spatial bit vector in the second target entry corresponds to the first offset value.

[0144] In some embodiments, if a second target entry exists in the historical sample table and the information transmitted by the spatiotemporal flow index unit does not contain a calculation address, then a first offset value is determined based on the first target entry, and it is determined whether the spatial bit vector in the second target entry corresponds to the first offset value.

[0145] If the spatial bit vector in the second target entry corresponds to the first offset value, then step S804 is executed; if the spatial bit vector in the second target entry does not correspond to the first offset value, then step S805 is executed.

[0146] Figure 10 is a schematic diagram of the previous address, adjacent address, spatial bit vector, and stream length provided in an embodiment of this disclosure.

[0147] In some embodiments, a spatial region as described in this disclosure is obtained by dividing memory into equal intervals of a set size, and the region to which an address belongs is determined by the high-order bits of the address. For example, if memory has 100 entries and the spatial region size is 10, memory is divided into 10 spatial regions: entries 1-10 form the first region, 11-20 the second, and so on. The spatial region can therefore be determined from the high-order part of the address.
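The region/offset split above can be written as integer division, or, for a power-of-two region size, as a shift and a mask on the address bits. This sketch assumes zero-based addresses (so addresses 0-9 form the first region, rather than 1-10 as in the example), and both function names are hypothetical.

```python
from typing import Tuple


def split_address(addr: int, region_size: int) -> Tuple[int, int]:
    """Return (spatial region index, offset within the region)."""
    return addr // region_size, addr % region_size


def split_address_pow2(addr: int, region_bits: int) -> Tuple[int, int]:
    """Same split for a region of 2**region_bits addresses: the high-order
    bits give the spatial base, the low-order bits give the offset."""
    return addr >> region_bits, addr & ((1 << region_bits) - 1)
```

Hardware would normally use the power-of-two form, since it needs only wiring rather than a divider.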

[0148] In some embodiments, as shown in Figure 10, the memory access stream of a single PC is partitioned according to its temporal and spatial stream characteristics, and each address pair is associated with a spatial stream. Take the memory access stream A, B+1, B+4, B+7, C, D, D+2, D+6 as an example, where different letters represent different spatial regions. After memory address A is accessed, if the next address is found to be B, the previous address is filled with A, the adjacent address is filled with B, and bit 0 of the spatial bit vector is set to 1 to indicate that this position has been accessed. For subsequent accesses to addresses in the same spatial region, the corresponding bits of the spatial bit vector are set to 1. In the process shown in Figure 10, bits 0, 1, 4, and 7 are set to 1, giving the spatial bit vector 1100100100, and the spatial stream length is recorded as 4. This process is repeated to fill the following entries and to track the confidence level of each entry. When the confidence level meets the threshold condition, the entry is passed to the spatiotemporal pattern table or the time pattern table according to its stream length (if the stream length is less than the threshold, it is stored in the time pattern table). When a memory access miss that meets the condition occurs, the address is extracted, and a prefetch address is generated and issued.

[0149] Accordingly, for memory address C, the previous address is B+7 and the adjacent address is X. Only C+0 is accessed in region C, so the spatial bit vector is 1000000000, the stream length is 1, and this entry is in time mode. For memory address D, the previous address is C and the adjacent address is D+2; D+2, D, and D+6 are accessed, so the spatial bit vector is 1010001000 and the stream length is 3.
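The Figure 10 walk-through can be reproduced with a short sketch that groups one PC's accesses by spatial region. Assumptions: accesses are given as (region, offset) pairs; the stream listed in [0148] does not show a separate access to B itself, but the stated vector 1100100100 with stream length 4 requires bit 0 to be set, so the example input includes an access at B+0. Confidence tracking is omitted, and `STREAM_THRESHOLD = 2` is a hypothetical value for the stream-length split between the two pattern tables.

```python
from typing import Dict, List, Optional, Tuple

SLOTS = 10            # 10-bit spatial vectors, as in Figure 10
STREAM_THRESHOLD = 2  # hypothetical: shorter streams go to the time pattern table


def build_stream_entries(accesses: List[Tuple[str, int]]) -> List[Dict]:
    """Group one PC's (region, offset) accesses into spatial-stream entries."""
    entries: List[Dict] = []
    prev: Optional[Tuple[str, int]] = None  # last access seen so far
    for region, offset in accesses:
        if entries and entries[-1]["region"] == region:
            entry = entries[-1]  # same spatial region: the stream continues
        else:
            # New region: its previous address is the last access before it.
            entry = {"region": region, "prev": prev, "bits": 0, "length": 0}
            entries.append(entry)
        entry["bits"] |= 1 << offset
        entry["length"] += 1
        prev = (region, offset)
    for entry in entries:
        entry["vector"] = "".join(
            "1" if entry["bits"] >> i & 1 else "0" for i in range(SLOTS))
        entry["table"] = ("time" if entry["length"] < STREAM_THRESHOLD
                          else "spatiotemporal")
    return entries
```

Feeding it `[("A", 0), ("B", 0), ("B", 1), ("B", 4), ("B", 7), ("C", 0), ("D", 0), ("D", 2), ("D", 6)]` gives 1100100100 (length 4, previous address A+0) for region B, 1000000000 (length 1, time table) for C with previous address B+7, and 1010001000 (length 3) for D with previous address C+0.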

[0150] In some embodiments, it is determined whether the bit vector of the second target entry has a bit that matches the spatial offset value of the training address (for example, the spatial offset of D+2 relative to D is 2, so the third bit, that is, bit 2, of the spatial bit vector is 1).

[0151] Thus, by reusing one structure together with spatial-location metadata, both the time and the spatiotemporal stream prefetching units are realized at once. This reduces overhead relative to existing high-performance time prefetchers such as Triage, and can track more complex spatiotemporal patterns than spatiotemporal prefetchers such as Triage-ISR.

[0152] Step S804: Increase the recurrence bit and confidence level in the second target entry.

[0153] In some embodiments, if the spatial bit vector in the second target entry corresponds to the first offset value, the recurrence bit and confidence level in the second target entry are increased; specifically, the corresponding bit is set to 1 according to the spatial address offset value, the stream length is increased, and the value of the recurrence bit is then increased.

[0154] In some embodiments, in response to the first target entry containing a calculated address, the access interval in the second target entry is increased, and the confidence level and recurrence bit are adjusted.

[0155] Step S805: Increase the stream length and recurrence bit in the second target entry.

[0156] In some embodiments, if a second target entry exists, the information transmitted by the spatiotemporal stream index unit does not contain a calculated address, and the spatial bit vector in the second target entry does not correspond to the first offset value, then the stream length and recurrence bit in the second target entry are increased, and the spatial bit vector is updated based on the first offset value.

[0157] In specific implementation, if the spatial bit vector in the second target entry corresponds to the first offset value, that is, if a bit matches the spatial address offset value of the training address, the offset within the spatial base address has occurred repeatedly. In this case, the confidence bit for that position is set to 1 and the recurrence bit is incremented by 1. The confidence and recurrence values are then checked: each time the recurrence value reaches one of several thresholds, the reuse confidence in the index unit is increased, and each time the number of 1s among the per-position confidence bits reaches one of several thresholds, the pattern confidence in the index unit is increased.
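A sketch of the S804 and S805 updates together with the threshold checks, assuming the bit vector and per-position confidence bits are held as integer masks. The threshold tuples `REUSE_STEPS` and `PATTERN_STEPS` are hypothetical, and the returned levels stand in for how far the index unit's reuse and pattern confidences would be raised.

```python
from dataclasses import dataclass
from typing import Tuple

REUSE_STEPS = (2, 4, 8)    # hypothetical recurrence thresholds
PATTERN_STEPS = (1, 2, 3)  # hypothetical counts of confident positions


@dataclass
class Sample:
    """Subset of a historical sample table entry used during training."""
    bit_vector: int = 0
    confidence: int = 0
    stream_length: int = 0
    recurrence: int = 0


def train(sample: Sample, offset: int) -> Tuple[int, int]:
    """Apply S804 or S805 for one offset; return (reuse level, pattern level)."""
    mask = 1 << offset
    if sample.bit_vector & mask:
        # S804: the offset recurred, so mark the position as confident.
        sample.confidence |= mask
    else:
        # S805: a new offset extends the spatial stream.
        sample.bit_vector |= mask
        sample.stream_length += 1
    sample.recurrence += 1  # both branches raise the recurrence count
    reuse_level = sum(sample.recurrence >= t for t in REUSE_STEPS)
    confident = bin(sample.confidence).count("1")
    pattern_level = sum(confident >= t for t in PATTERN_STEPS)
    return reuse_level, pattern_level
```

Training offsets 0, 2, 0, 2 in turn yields levels (0, 0), (1, 0), (1, 1), (2, 2): the first two calls build the vector, and the repeats mark both positions as confident.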

[0158] Step S806: Increase the access interval in the second target entry, and adjust the confidence level and recurrence bit.

[0159] In some alternative embodiments, in response to the transmitted information containing a calculated address, the access interval in the second target entry is increased, and the confidence level and recurrence bit are adjusted.

[0160] In specific implementation, if the information transmitted by the spatiotemporal flow index unit includes a calculated address, the spatial stream access of the entry has ended, and the access interval counter starts counting: each time a training address arrives at the historical sample table, the access interval is incremented by 1. If the training address falls under the same spatial base address of the same PC, the confidence and recurrence bits of the corresponding positions are adjusted, and the reuse confidence and pattern confidence of the corresponding entry in the index unit are updated.
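After a calculated address marks the end of a spatial stream, every arriving training address is counted, and only those in the same PC and spatial base are used for adjustment. The sketch assumes "adjust" means setting the confidence bit and incrementing the recurrence count when an already-recorded offset is revisited; the dataclass, function name, and `region_bits` split are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EndedStream:
    """Historical sample entry whose spatial stream has ended (illustrative)."""
    pc: int
    spatial_base: int
    bit_vector: int
    confidence: int = 0
    recurrence: int = 0
    access_interval: int = 0


def observe_after_end(entry: EndedStream, pc: int, addr: int,
                      region_bits: int) -> bool:
    """Count one training address; return True if it hit the same PC and region."""
    entry.access_interval += 1  # every arriving training address is counted
    base = addr >> region_bits
    offset = addr & ((1 << region_bits) - 1)
    if pc != entry.pc or base != entry.spatial_base:
        return False
    mask = 1 << offset
    if entry.bit_vector & mask:  # a recorded position was revisited
        entry.confidence |= mask
        entry.recurrence += 1
    return True
```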

[0161] Step S807: Determine whether the condition threshold is met.

[0162] In some embodiments, the reuse confidence in the spatiotemporal flow index current table is updated based on the recurrence bit value and confidence value in the second target entry, and the pattern confidence in the current table is updated based on the number of first elements in the spatial bit vector, where the first element can be 1.

[0163] In practice, when a time threshold is reached, it is determined whether the recurrence value and confidence value meet the condition thresholds, for example whether the recurrence value is greater than a first preset threshold and the confidence value is greater than a second preset threshold. If both are met, the condition thresholds are met. The spatiotemporal flow index unit then checks the pattern confidence of the first target entry in the spatiotemporal flow index current table; if the pattern confidence is greater than or equal to the second pattern threshold, the first target entry is transmitted to the spatiotemporal pattern table or the time pattern table.

[0164] The entry is stored in the spatiotemporal pattern table only when the number of positions whose confidence bit is 1 meets the threshold. Otherwise, the first training address is calculated from the spatial base address and the first offset, and, if the confidence bit at that position is 1, the entry is sent to the time pattern table for storage. At the same time, if a calculated address exists, it is used as an index to look up and replace an entry: the previous address identifier is updated to the calculated address identifier, and the spatial region base address and bit vector information in the entry are updated.
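The routing decision for a trained entry can be sketched as follows. The threshold defaults are hypothetical, and the fallback case (too few confident positions, but the first offset itself confident) is routed to the time pattern table with the rebuilt first training address as its payload. That routing is an assumption, consistent with [0172], where the first training address is recorded as the time-table prediction address.

```python
from typing import Optional, Tuple


def choose_table(recurrence: int, confidence: int, bit_vector: int,
                 spatial_base: int, first_offset: int, region_bits: int,
                 rec_threshold: int = 2, conf_bits_threshold: int = 3
                 ) -> Optional[Tuple]:
    """Decide where a trained entry goes; None means it is not stored yet."""
    if recurrence <= rec_threshold:
        return None  # has not recurred often enough
    if bin(confidence).count("1") >= conf_bits_threshold:
        # Enough confident positions: keep the whole spatial pattern.
        return ("spatiotemporal", spatial_base, bit_vector)
    if confidence >> first_offset & 1:
        # Otherwise rebuild the first training address from base + first offset.
        first_addr = (spatial_base << region_bits) | first_offset
        return ("time", first_addr)
    return None
```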

[0165] Thus, the data processing method provided in this embodiment of the present disclosure obtains valid time/spatiotemporal access patterns by jointly evaluating the bit-vector confidence and the stream length: the spatiotemporal pattern table retains long spatial-stream access patterns, the time pattern table retains short, high-confidence address pairs, and the partitioned design makes full use of resources.

[0166] In some embodiments, when obtaining prefetch addresses, the temporary record table is searched first. In response to a fourth target entry being found in the temporary record table, at least one prefetch address is determined based on the prefetch degree. Alternatively, in response to no fourth target entry being found in the temporary record table, the spatiotemporal pattern table or time pattern table in the last-level cache is selected based on the prefetch mode, and at least one prefetch address is determined from that table based on the prefetch degree, the program counter identifier, and the training address.

[0167] The temporary record table includes a time-pattern temporary table and a spatiotemporal pattern temporary table. The carrier searches the time-pattern temporary table or the spatiotemporal pattern temporary table based on the prefetch pattern corresponding to the data processing request. If the data processing request corresponds to a time pattern, the time-pattern temporary table is searched; if the fourth target entry is not found, the search is performed in the time-pattern table in the LLC. If the data processing request corresponds to a spatiotemporal pattern, the spatiotemporal pattern temporary table is searched; if the fourth target entry is not found, the search is performed in the spatiotemporal pattern table in the LLC.
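A sketch of the two-level lookup: the temporary table for the request's prefetch mode is tried first, the matching LLC pattern table is consulted only on a miss, and the result is truncated to the prefetch degree. Dictionaries stand in for the hardware tables, each entry is assumed to be already expanded into a list of candidate addresses, and all names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[int, int]  # (PC, training address)


class PatternStore:
    """Temporary record tables backed by pattern tables in the LLC (sketch)."""

    def __init__(self) -> None:
        self.temp: Dict[str, Dict[Key, List[int]]] = {"time": {}, "spatiotemporal": {}}
        self.llc: Dict[str, Dict[Key, List[int]]] = {"time": {}, "spatiotemporal": {}}
        self.llc_reads = 0  # counts lookups that had to go to the LLC

    def lookup(self, mode: str, pc: int, addr: int) -> Optional[List[int]]:
        key = (pc, addr)
        hit = self.temp[mode].get(key)  # search the matching temporary table first
        if hit is None:
            self.llc_reads += 1
            hit = self.llc[mode].get(key)
        return hit


def prefetch_addresses(store: PatternStore, mode: str, pc: int, addr: int,
                       degree: int) -> List[int]:
    """Return at most `degree` prefetch addresses for one training address."""
    candidates = store.lookup(mode, pc, addr)
    return [] if candidates is None else candidates[:degree]
```

The `llc_reads` counter makes the benefit described in [0168] observable: every temporary-table hit is one LLC access avoided.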

[0168] In this way, a temporary record table partitioned by time and spatiotemporal patterns is provided. When a training address meets the conditions, this table is searched first, reducing the number of LLC accesses.

[0169] Figure 11 is a schematic diagram of the time pattern table and spatiotemporal pattern table provided in an embodiment of this disclosure.

[0170] In some embodiments, the time-pattern temporary table has the same structure as the time pattern table, and the spatiotemporal pattern temporary table has the same structure as the spatiotemporal pattern table; only their storage locations differ.

[0171] As shown in Figure 11, the time pattern table includes the PC and trigger address identifier, the prediction address, the reuse count, and the confidence level; the spatiotemporal pattern table includes the PC and trigger address identifier, the spatial region base address (that is, the spatial base address), the spatial bit vector, and the confidence level. The trigger address is the previous address, and the trigger address identifier is the identifier of the previous address.

[0172] To insert a new entry, after the information stream from the spatiotemporal flow indexing unit is received, the PC index is used to locate the entry to be replaced, and the PC and the previous address are merged to form the identification information. For the time pattern table, the first training address is recorded as the prediction address; for the spatiotemporal pattern table, the obtained spatial region base address and spatial bit vector are filled into the corresponding fields. The initial confidence is computed from the pattern confidence value according to a weight ratio, and the reuse count is initialized from the reuse confidence value. When a prefetch address lookup is received from the spatiotemporal flow indexing unit, the corresponding entry is searched using the PC + training address identifier; the prediction address, or the spatial region base address and bit vector, is retrieved to calculate the prefetch addresses, and the reuse count is incremented. When the reuse count reaches a threshold, the entry is copied into the recently used table. Confidence is maintained by checking the previous address: the entry is looked up using the PC + previous address identifier, and the system checks whether the current training address equals the prediction address or corresponds to a set position of the spatial bit vector within the spatial region base address. If so, the confidence is incremented by 1.
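The insert, lookup, and confidence operations on the two pattern tables can be sketched as follows. Assumptions: a (PC, trigger address) tuple stands in for the merged identifier, `REGION_BITS` and `PROMOTE_AT` are hypothetical values, `recent` is a dictionary standing in for the "recently used table", and the weighted initialization of confidence and reuse count is omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

REGION_BITS = 4  # hypothetical: 16 addresses per spatial region
PROMOTE_AT = 2   # hypothetical reuse-count threshold for copying an entry


@dataclass
class TimeEntry:
    prediction: int
    reuse: int = 0
    confidence: int = 0


@dataclass
class STEntry:
    spatial_base: int
    bit_vector: int
    reuse: int = 0
    confidence: int = 0

    def addresses(self) -> List[int]:
        """Expand base + bit vector into the addresses it marks."""
        return [(self.spatial_base << REGION_BITS) | i
                for i in range(1 << REGION_BITS) if self.bit_vector >> i & 1]


Entry = Union[TimeEntry, STEntry]


class PatternTable:
    def __init__(self) -> None:
        self.entries: Dict[Tuple[int, int], Entry] = {}  # (PC, trigger address)
        self.recent: Dict[Tuple[int, int], Entry] = {}   # "recently used table"

    def insert(self, pc: int, prev_addr: int, entry: Entry) -> None:
        self.entries[(pc, prev_addr)] = entry

    def lookup(self, pc: int, training_addr: int) -> List[int]:
        e = self.entries.get((pc, training_addr))
        if e is None:
            return []
        e.reuse += 1
        if e.reuse >= PROMOTE_AT:
            self.recent[(pc, training_addr)] = e
        return [e.prediction] if isinstance(e, TimeEntry) else e.addresses()

    def confirm(self, pc: int, prev_addr: int, training_addr: int) -> None:
        """Raise confidence when the training address matches the prediction."""
        e = self.entries.get((pc, prev_addr))
        if e is None:
            return
        if isinstance(e, TimeEntry):
            matched = training_addr == e.prediction
        else:
            matched = training_addr in e.addresses()
        if matched:
            e.confidence += 1
```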

[0173] Figure 12 is a schematic diagram of the principle of the size adjustment unit provided in an embodiment of this disclosure.

[0174] In some embodiments, the size adjustment unit is used to adjust the shares of capacity occupied by the time pattern table, the spatiotemporal pattern table, and the LLC within the region to achieve the best performance.

[0175] In some embodiments, as shown in Figure 12, the value of the first update counter is increased in response to a new entry in the spatiotemporal pattern table in the last-level cache; the value of the second update counter is increased in response to a new entry in the time pattern table in the last-level cache; the value of the first replacement counter is increased in response to a replaced entry in the spatiotemporal pattern table in the last-level cache; the value of the second replacement counter is increased in response to a replaced entry in the time pattern table in the last-level cache; and the capacity ratio of the spatiotemporal pattern table to the time pattern table in the last-level cache is adjusted based on the values of the first update counter, the second update counter, the first replacement counter, and the second replacement counter.

[0176] In practice, whenever the spatiotemporal indexing unit and the spatiotemporal training unit determine that they have obtained a valid time/spatiotemporal access pattern and attempt to update an entry in the time/spatiotemporal pattern table, they increment the corresponding time/spatiotemporal pattern update counter in the size adjustment unit. When an entry is replaced in the time/spatiotemporal pattern table, the corresponding replacement counter in the size adjustment unit is incremented. These update and replacement counters are used to adjust the proportions of the time pattern table and the spatiotemporal pattern table in the overall structure: when the ratio of time-pattern updates to spatiotemporal-pattern updates and the ratio of time-pattern replacements to spatiotemporal-pattern replacements exceed the increase threshold, the capacity share of the time pattern table is increased; conversely, when they fall below the decrease threshold, the capacity share of the spatiotemporal pattern table is increased.
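The counter-driven rebalancing can be sketched as a small class. Assumptions, all hypothetical: capacity is split in whole LLC ways, the ratio thresholds are 1.5 and 0.67, counters are reset after each rebalancing epoch, and an epoch with no activity leaves the split unchanged.

```python
class SizeAdjuster:
    """Split `total_ways` LLC pattern ways between time and spatiotemporal tables."""

    def __init__(self, total_ways: int = 16, time_ways: int = 8,
                 inc_threshold: float = 1.5, dec_threshold: float = 0.67):
        self.total_ways, self.time_ways = total_ways, time_ways
        self.inc_threshold, self.dec_threshold = inc_threshold, dec_threshold
        self.updates = {"time": 0, "st": 0}
        self.replacements = {"time": 0, "st": 0}

    def on_update(self, kind: str) -> None:
        self.updates[kind] += 1

    def on_replace(self, kind: str) -> None:
        self.replacements[kind] += 1

    def rebalance(self) -> int:
        if sum(self.updates.values()) + sum(self.replacements.values()) == 0:
            return self.time_ways  # idle epoch: keep the current split
        # max(1, ...) avoids dividing by zero when one table saw no activity.
        upd = self.updates["time"] / max(1, self.updates["st"])
        rep = self.replacements["time"] / max(1, self.replacements["st"])
        if upd > self.inc_threshold and rep > self.inc_threshold:
            self.time_ways = min(self.total_ways - 1, self.time_ways + 1)
        elif upd < self.dec_threshold and rep < self.dec_threshold:
            self.time_ways = max(1, self.time_ways - 1)  # spatiotemporal grows
        self.updates = {"time": 0, "st": 0}
        self.replacements = {"time": 0, "st": 0}
        return self.time_ways
```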

[0177] At fixed time intervals, the monitoring and adjustment unit sends the ratio of prefetch hits to memory access misses and the accuracy of prefetch requests to the size adjustment unit, which uses this information to adjust the capacity ratio between the pattern tables and the LLC. When both the prefetch accuracy and the prefetch hit ratio are above their thresholds, the overall share of the pattern tables is increased; conversely, if the prefetch accuracy is below its threshold, the overall share of the pattern tables is decreased.
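The interval-based adjustment of the pattern tables' overall share can be sketched as a pure function. The way granularity, thresholds, and bounds are hypothetical defaults; the rule itself follows [0177]: grow when both metrics exceed their thresholds, shrink when accuracy is low, otherwise keep the share.

```python
def adjust_pattern_ways(pattern_ways: int, accuracy: float, hit_ratio: float,
                        acc_threshold: float = 0.6, hit_threshold: float = 0.3,
                        min_ways: int = 1, max_ways: int = 8) -> int:
    """Grow or shrink the LLC ways given to pattern tables for one interval."""
    if accuracy > acc_threshold and hit_ratio > hit_threshold:
        return min(max_ways, pattern_ways + 1)
    if accuracy < acc_threshold:
        return max(min_ways, pattern_ways - 1)
    return pattern_ways
```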

[0178] Figure 13 is a schematic diagram of a second alternative structure of the data processing apparatus provided in an embodiment of this disclosure; each part is described below.

[0179] In some embodiments, the data processing apparatus includes a storage unit, a monitoring and adjustment unit, a spatiotemporal stream indexing unit, and a determination unit.

[0180] The storage unit is used to determine the time pattern table and the spatiotemporal pattern table based on multiple data processing requests sent by multiple processor cores. The monitoring and adjustment unit is used to merge, based on processor core type, the threads corresponding to at least one data processing request sent by the multiple processor cores, to obtain at least one process. The spatiotemporal stream index unit is used to determine at least one lookup entry in the spatiotemporal index table based on the process identifier, and to determine the prefetch mode corresponding to each data processing request based on the program counter identifier and the at least one lookup entry. The determination unit is used to determine at least one prefetch address corresponding to each data processing request from the time pattern table or the spatiotemporal pattern table based on at least one of the prefetch mode, program counter identifier, training address, and prefetch degree corresponding to that request. A data processing request is a data miss request or a data prefetch request, and the prefetch degree is the number of prefetch addresses corresponding to each training address. The time pattern table includes a program counter identifier, a training address identifier, a prediction address identifier, a reuse count, and a confidence level; the spatiotemporal pattern table includes a program counter identifier, a training address identifier, a spatial base address identifier, a spatial bit vector, a confidence level, and a reuse count. The spatial bit vector identifies the repeatedly accessed spatial sub-addresses within the space corresponding to the spatial base address.

[0181] The monitoring and adjustment unit is specifically used to merge threads with the same processor core type and the same process identifier into one process to obtain at least one process. One data processing request corresponds to one thread, and one thread includes at least one data processing request.
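Merging threads that share both processor core type and process identifier is a group-by on that pair. A minimal sketch, assuming each request arrives as a hypothetical (core_type, pid, tid, training_addr) tuple:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def merge_threads(requests: Iterable[Tuple[str, int, int, int]]
                  ) -> Dict[Tuple[str, int], List[Tuple[int, int]]]:
    """Group (core_type, pid, tid, training_addr) requests into processes."""
    processes: Dict[Tuple[str, int], List[Tuple[int, int]]] = defaultdict(list)
    for core_type, pid, tid, addr in requests:
        # Threads sharing both the core type and the process ID form one process.
        processes[(core_type, pid)].append((tid, addr))
    return dict(processes)
```

Two threads of PID 1 on the same core type collapse into one process, while the same PID on a different core type stays separate.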

[0182] The monitoring and adjustment unit is specifically used to determine the index pattern of a process based on the training addresses of at least one data processing request in that process, which specifically includes: determining, for at least two data processing requests in the same process, the proportion of training addresses whose index bits differ; and, if this proportion is greater than or equal to the index threshold, determining the index pattern to be the at least one bit position at which the index bits differ. The index pattern specifies the index bits used for indexing.

[0183] The monitoring and adjustment unit is specifically used to increase the index confidence of the process if the proportion of different index bits is greater than or equal to the index threshold. Alternatively, if the proportion of different index bits is less than the index threshold, the index confidence of the process is reduced.

[0184] The monitoring and adjustment unit is specifically used to periodically determine the index confidence level corresponding to each process; If the index confidence of any process is less than the mode switching threshold, the index mode corresponding to that process is updated. Alternatively, if the index confidence of any of the processes is greater than or equal to the mode switching threshold, the index mode corresponding to the process is not changed.
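A sketch of the per-process index pattern and index confidence from [0182] to [0184]. Two interpretations are assumptions: the "proportion of training addresses with different index bits" is measured over consecutive address pairs, and, since [0184] does not say what the updated index pattern becomes, the sketch switches to the union of bits that actually varied within a wider, hypothetical `search_mask`. All names and default thresholds are hypothetical.

```python
from typing import List, Tuple


def differing_bits(addrs: List[int], mask: int) -> Tuple[float, int]:
    """Share of consecutive address pairs whose `mask` bits differ, and the
    union of the bits that differed."""
    pairs = list(zip(addrs, addrs[1:]))
    if not pairs:
        return 0.0, 0
    diffs = [(a ^ b) & mask for a, b in pairs]
    union = 0
    for d in diffs:
        union |= d
    return sum(1 for d in diffs if d) / len(pairs), union


class IndexMode:
    """Per-process index pattern with an index confidence counter."""

    def __init__(self, mask: int, search_mask: int, index_threshold: float = 0.5,
                 switch_threshold: int = 0):
        self.mask = mask                # index bits currently in use
        self.search_mask = search_mask  # wider set of candidate bits
        self.index_threshold = index_threshold
        self.switch_threshold = switch_threshold
        self.confidence = 0
        self.pending = mask

    def observe(self, addrs: List[int]) -> None:
        ratio, _ = differing_bits(addrs, self.mask)
        if ratio >= self.index_threshold:
            self.confidence += 1  # current index bits spread addresses well
        else:
            self.confidence -= 1
        _, candidate = differing_bits(addrs, self.search_mask)
        if candidate:
            self.pending = candidate

    def periodic(self) -> None:
        if self.confidence < self.switch_threshold:
            self.mask = self.pending  # switch to the bits that actually vary
            self.confidence = 0
```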

[0185] The spatiotemporal stream index unit is specifically used to: determine at least one lookup entry in the spatiotemporal index current table based on the process identifier; search the at least one lookup entry, based on the program counter identifier and index pattern of the data processing request, for a first target entry matching the program counter of the data processing request; and, if such a first target entry exists, determine the reuse confidence and pattern confidence of the data processing request from the first target entry and determine its prefetch pattern from the reuse confidence and pattern confidence. The spatiotemporal index current table includes the program counter identifier, the previous address, the spatial base address, the most recent offset, the unused count, the reuse confidence, and the pattern confidence.

[0186] If no first target entry matching the program counter of the data processing request exists, the spatiotemporal stream index unit is specifically used to determine at least one lookup entry in the spatiotemporal index completion table based on the process identifier, and to search the at least one lookup entry, based on the program counter identifier and index pattern of the data processing request, for a matching first target entry. If such a first target entry exists, the reuse confidence and pattern confidence of the data processing request are determined from the first target entry, and its prefetch pattern is determined from the reuse confidence and pattern confidence. The spatiotemporal index completion table includes the PC index, the unused count, the reuse confidence, and the pattern confidence.

[0187] The spatiotemporal stream indexing unit is specifically used to determine the prefetch mode of the data processing request to be the current prefetch mode in response to the reuse confidence being greater than or equal to the first mode threshold and the pattern confidence being greater than or equal to the second mode threshold; or, in response to the reuse confidence being less than the first mode threshold and the pattern confidence being less than the second mode threshold, to determine the prefetch mode of the data processing request to be the other prefetch mode.

[0188] The spatiotemporal stream indexing unit is further used to: if a first target entry matching the program counter of the data processing request exists in the spatiotemporal index completion table, write the program counter identifier, reuse confidence, and pattern confidence of the first target entry into a first entry of the spatiotemporal index current table, and update the previous address in the first entry to the miss address of the data processing request.

[0189] The spatiotemporal stream indexing unit is also used for: In response to the absence of a first target entry in the spatiotemporal index completion table that matches the program counter of the data processing request, a first replacement entry is determined in the current spatiotemporal index table, a second replacement entry is determined in the spatiotemporal index completion table, the program counter identifier, reuse confidence, and pattern confidence in the second replacement entry are updated to the program counter identifier, reuse confidence, and pattern confidence of the first replacement entry, and the unused count in the second replacement entry is cleared. Delete the data in the first replacement entry, and record the program counter identifier, previous address, and space base address corresponding to any of the data processing requests in the first replacement entry.

[0190] If a first target entry matching the program counter of the data processing request exists, the spatiotemporal stream indexing unit is further specifically used to: determine whether the spatial base address of the first target entry is empty; in response to it not being empty, determine a first spatial base address from the training address of the data processing request; determine whether the first spatial base address is the same as the second spatial base address stored in the first target entry; if they are the same, determine that the stream in the spatial region corresponding to the first target entry has not ended; or, if they differ, determine that the stream in that spatial region has ended.

[0191] If it is determined that the flow of the spatial region corresponding to the first target entry has not ended, the spatiotemporal flow index unit is specifically used to update the most recent offset value in the first target entry based on the offset value between the previous address in the first target entry and the training address of any data processing request.

[0192] If it is determined that the stream in the spatial region corresponding to the first target entry has ended, the spatiotemporal flow index unit is specifically used to update the previous address in the first target entry to the calculated address, and to update the spatial base address and most recent offset in the first target entry to the spatial base address and most recent offset of the data processing request. The calculated address is determined from the spatial base address and the most recent offset.

[0193] If a first target entry matching the program counter of the data processing request exists, the spatiotemporal stream index unit is specifically used to determine whether the spatial base address of the first target entry is empty; in response to it being empty, the spatial base address of the first target entry is updated from the high-order bits of the training address of the data processing request, and the most recent offset of the first target entry is updated from the low-order bits of that training address.
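Paragraphs [0190] to [0193] together describe how one first target entry tracks its current spatial region. A minimal sketch, assuming the entry is a plain dictionary, that `region_bits` sets the high/low address split, and that the "most recent offset" is simply the training address's offset within its region (the wording of [0191] leaves the exact offset computation open):

```python
from typing import Dict, Optional


def update_index_entry(entry: Dict[str, Optional[int]], training_addr: int,
                       region_bits: int) -> Optional[int]:
    """Apply [0190]-[0193] to one first target entry; return the calculated
    address when the spatial stream has ended, else None."""
    base = training_addr >> region_bits                # high-order bits
    offset = training_addr & ((1 << region_bits) - 1)  # low-order bits
    if entry["spatial_base"] is None:
        # [0193]: empty base, so seed it from the training address.
        entry["spatial_base"], entry["recent_offset"] = base, offset
        return None
    if base == entry["spatial_base"]:
        # [0191]: same region, the stream continues.
        entry["recent_offset"] = offset
        return None
    # [0192]: region changed, so the stream ended.
    calc = (entry["spatial_base"] << region_bits) | entry["recent_offset"]
    entry["prev_addr"] = calc
    entry["spatial_base"], entry["recent_offset"] = base, offset
    return calc
```

Under this assumption the calculated address is the last address touched in the old region, which is how Figure 10 arrives at B+7 as the previous address for region C.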

[0194] In some embodiments, the data processing apparatus may further include a spatiotemporal stream training unit.

[0195] The spatiotemporal stream training unit is configured to search the historical sample table based on the previous address in the first target entry. In response to finding a second target entry corresponding to the previous address in the historical sample table, it updates the second target entry based on the first target entry; or, in response to finding no second target entry corresponding to the previous address, it determines a third target entry in the historical sample table and updates the third target entry based on the first target entry. The historical sample table includes the previous address, spatial base address, spatial bit vector, stream length, recurrence bit, confidence level, access interval, and first offset.
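The lookup-or-allocate behaviour of the historical sample table in [0195] resembles a small associative table keyed by the previous address. The sketch below assumes LRU selection of the third target entry (the text does not name a replacement policy) and models the table with `collections.OrderedDict`; the field names mirror the listed columns, while `HistorySampleTable` and `lookup_or_allocate` are hypothetical.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Tuple


@dataclass
class HistorySample:
    """One historical-sample-table entry (columns as listed in [0195])."""
    prev_addr: int
    spatial_base: int = 0
    bit_vector: int = 0
    stream_length: int = 0
    recurrence: int = 0
    confidence: int = 0
    access_interval: int = 0
    first_offset: int = 0


class HistorySampleTable:
    """Hypothetical fixed-capacity table keyed by previous address, LRU order."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: "OrderedDict[int, HistorySample]" = OrderedDict()

    def lookup_or_allocate(self, prev_addr: int) -> Tuple[HistorySample, bool]:
        """Return (entry, hit).

        On a hit the existing (second target) entry is returned. On a miss a
        fresh (third target) entry is allocated, evicting the least recently
        used entry if the table is full.
        """
        if prev_addr in self.entries:
            self.entries.move_to_end(prev_addr)  # mark as most recently used
            return self.entries[prev_addr], True
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)     # evict the LRU entry
        entry = HistorySample(prev_addr=prev_addr)
        self.entries[prev_addr] = entry
        return entry, False
```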

[0196] The spatiotemporal stream training unit is configured to determine whether a calculated address is received, the calculated address being determined based on the previous address and the most recent offset. In response to no calculated address being received, it determines a first offset value based on the first target entry and determines whether the spatial bit vector in the second target entry corresponds to the first offset value. If it does, the recurrence bit and confidence level in the second target entry are increased; otherwise, the stream length and recurrence bit in the second target entry are increased and the spatial bit vector is updated based on the first offset value.

[0197] The spatiotemporal stream training unit is configured to, in response to receiving a calculated address, increase the access interval in the second target entry and adjust the confidence level and recurrence bit.
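The training rules in [0196] and [0197] can be sketched as counter updates on the second target entry. Several assumptions are made that the text does not fix: the first offset value is used directly as a bit position in the spatial bit vector, confidence saturates at a hypothetical `CONF_MAX` of 3, and "adjust the confidence level and recurrence bit" in [0197] is read as decrementing both. `train_second_entry` is an illustrative name.

```python
from dataclasses import dataclass

CONF_MAX = 3  # hypothetical saturating ceiling for the confidence level


@dataclass
class HistorySample:
    """Fields of a second target entry touched by training (subset)."""
    bit_vector: int = 0
    stream_length: int = 0
    recurrence: int = 0
    confidence: int = 0
    access_interval: int = 0


def train_second_entry(entry: HistorySample, first_offset: int,
                       calc_addr_received: bool) -> None:
    if calc_addr_received:
        # [0197]: lengthen the access interval; "adjust" is assumed to mean
        # decrement confidence and recurrence, floored at zero.
        entry.access_interval += 1
        entry.confidence = max(0, entry.confidence - 1)
        entry.recurrence = max(0, entry.recurrence - 1)
        return
    bit = 1 << first_offset  # assumed: offset is a bit position in the vector
    if entry.bit_vector & bit:
        # Offset already recorded: the spatial pattern is recurring.
        entry.recurrence += 1
        entry.confidence = min(CONF_MAX, entry.confidence + 1)
    else:
        # New offset: the stream grows and the bit vector records it.
        entry.stream_length += 1
        entry.recurrence += 1
        entry.bit_vector |= bit
```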

[0198] The spatiotemporal stream training unit is configured to update the reuse confidence in the current table of the spatiotemporal index based on the recurrence bit value and confidence value in the second target entry, and to update the pattern confidence in the current table of the spatiotemporal index based on the number of first elements in the spatial bit vector.

[0199] The spatiotemporal stream training unit is configured to search a temporary record table based on the program counter and training address corresponding to the data processing request. In response to finding a fourth target entry in the temporary record table, at least one prefetch address is determined based on the prefetch degree; or, in response to not finding the fourth target entry, the spatiotemporal pattern table or the time pattern table in the last-level cache is selected based on the prefetch mode, and at least one prefetch address is determined from the selected table based on the prefetch degree, program counter identifier, and training address.
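The two-stage lookup in [0199] (temporary record table first, then the pattern table chosen by prefetch mode) might look like the sketch below. Tables are plain dicts keyed by `(pc, training_addr)`; a spatiotemporal entry is assumed to hold a `(base, bit_vector)` pair that is expanded into 64-byte line addresses, and a time entry a list of predicted addresses. The line size, region size, mode strings, and function names are hypothetical.

```python
from typing import Dict, List, Tuple

REGION_BITS = 12  # hypothetical 4 KiB spatial region
LINE_BITS = 6     # hypothetical 64-byte cache line

Key = Tuple[int, int]  # (program counter, training address)


def expand_bit_vector(base: int, bit_vector: int) -> List[int]:
    """Turn a spatial base address plus bit vector into line addresses."""
    addrs = []
    i = 0
    while bit_vector >> i:
        if (bit_vector >> i) & 1:
            addrs.append((base << REGION_BITS) | (i << LINE_BITS))
        i += 1
    return addrs


def prefetch_addresses(pc: int, training_addr: int, degree: int, mode: str,
                       temp_table: Dict[Key, List[int]],
                       st_table: Dict[Key, Tuple[int, int]],
                       t_table: Dict[Key, List[int]]) -> List[int]:
    key = (pc, training_addr)
    # 1. A hit in the temporary record table short-circuits the LLC lookup.
    if key in temp_table:
        return temp_table[key][:degree]
    # 2. Otherwise the prefetch mode selects which LLC-resident table to read.
    if mode == "spatiotemporal":
        if key not in st_table:
            return []
        base, bit_vector = st_table[key]
        return expand_bit_vector(base, bit_vector)[:degree]
    return t_table.get(key, [])[:degree]
```

In every path the prefetch degree caps how many addresses are returned for one training address.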

[0200] The storage unit is configured to perform one of the following: updating the time pattern table and the spatiotemporal pattern table according to the prefetch mode, based on multiple data processing requests corresponding to the same process; or updating the spatiotemporal pattern table and the time pattern table based on entries in the current table or the spatiotemporal index.

[0201] The storage unit is further configured to: increment a first update counter in response to a new entry in the spatiotemporal pattern table in the last-level cache; increment a second update counter in response to a new entry in the time pattern table in the last-level cache; increment a first replacement counter in response to a replaced entry in the spatiotemporal pattern table in the last-level cache; increment a second replacement counter in response to a replaced entry in the time pattern table in the last-level cache; and adjust the capacity ratio of the spatiotemporal pattern table to the time pattern table in the last-level cache based on the values of the first update counter, the second update counter, the first replacement counter, and the second replacement counter.
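[0201] does not say how the four counters map to a capacity ratio. One plausible policy, shown purely as an assumption, compares each table's replacement rate (replacements per new entry) and shifts one last-level-cache way toward the table under more pressure. `LlcPartitionCounters`, `rebalance_ways`, the table-name strings, and the way-based partitioning are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LlcPartitionCounters:
    st_updates: int = 0       # first update counter (spatiotemporal table)
    t_updates: int = 0        # second update counter (time table)
    st_replacements: int = 0  # first replacement counter
    t_replacements: int = 0   # second replacement counter

    def record_new_entry(self, table: str) -> None:
        if table == "spatiotemporal":
            self.st_updates += 1
        else:
            self.t_updates += 1

    def record_replacement(self, table: str) -> None:
        if table == "spatiotemporal":
            self.st_replacements += 1
        else:
            self.t_replacements += 1


def rebalance_ways(c: LlcPartitionCounters, st_ways: int, total_ways: int,
                   min_ways: int = 1) -> int:
    """Return the new number of LLC ways given to the spatiotemporal table."""
    st_pressure = c.st_replacements / max(1, c.st_updates)
    t_pressure = c.t_replacements / max(1, c.t_updates)
    if st_pressure > t_pressure and st_ways < total_ways - min_ways:
        return st_ways + 1  # spatiotemporal table is thrashing: grow it
    if t_pressure > st_pressure and st_ways > min_ways:
        return st_ways - 1  # time table is thrashing: give it a way back
    return st_ways
```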

[0202] In some embodiments, after at least one prefetch address corresponding to each data processing request is determined based on the prefetch mode, training address, and prefetch degree corresponding to that request, the monitoring and adjustment unit is further configured to: record the number of prefetch addresses issued; record the number of prefetch addresses to which the last-level cache responds; record the number of prefetch hits received; record the number of data missing requests received; determine the prefetch accuracy based on the number of prefetch hits and the number of prefetch addresses issued; determine the prefetch percentage based on the number of prefetch hits and the number of prefetch addresses to which the last-level cache responds; determine the prefetch coverage based on the number of prefetch hits and the number of data missing requests; and adjust the prefetch degree based on the prefetch accuracy, prefetch percentage, and prefetch coverage.
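The three metrics in [0202] reduce to simple ratios. In the sketch below, accuracy and percentage follow the text directly; coverage is computed as hits / (hits + data missing requests), a common definition that the text does not confirm. `PrefetchStats` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class PrefetchStats:
    issued: int = 0         # prefetch addresses issued
    llc_responded: int = 0  # prefetch addresses the last-level cache responded to
    hits: int = 0           # prefetch hits received
    demand_misses: int = 0  # data missing requests received

    def accuracy(self) -> float:
        return self.hits / self.issued if self.issued else 0.0

    def percentage(self) -> float:
        return self.hits / self.llc_responded if self.llc_responded else 0.0

    def coverage(self) -> float:
        # Assumed definition: share of would-be misses removed by prefetching.
        total = self.hits + self.demand_misses
        return self.hits / total if total else 0.0
```

With 100 prefetches issued, 80 answered by the last-level cache, 60 hits, and 40 remaining misses, this gives accuracy 0.6, percentage 0.75, and coverage 0.6.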

[0203] The monitoring and adjustment unit is specifically configured to perform one of the following: if the prefetch accuracy is greater than a first adjustment threshold and the prefetch coverage is less than a second adjustment threshold, increase the prefetch degree; or, if the prefetch accuracy is less than a third adjustment threshold, reduce the prefetch degree and adjust the capacity ratio of the time pattern table and the spatiotemporal pattern table in the last-level cache based on the prefetch accuracy and the prefetch percentage.
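The degree-adjustment rule in [0203] is a pair of threshold tests. A minimal sketch; the threshold values and degree bounds are hypothetical, and the capacity-ratio adjustment that accompanies a reduction is omitted:

```python
def adjust_degree(degree: int, accuracy: float, coverage: float,
                  t1: float = 0.75, t2: float = 0.5, t3: float = 0.4,
                  min_degree: int = 1, max_degree: int = 8) -> int:
    """t1, t2, t3 stand in for the first, second, third adjustment thresholds."""
    if accuracy > t1 and coverage < t2:
        # Accurate but not covering enough misses: prefetch more per trigger.
        return min(max_degree, degree + 1)
    if accuracy < t3:
        # Too many useless prefetches: back off.
        return max(min_degree, degree - 1)
    return degree
```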

[0204] According to embodiments of this disclosure, this disclosure also provides an electronic device and a readable storage medium.

[0205] Figure 14 shows a schematic block diagram of an example electronic device 800 that can be used to implement embodiments of the present disclosure. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing devices, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely illustrative and are not intended to limit the implementation of the present disclosure described and/or claimed herein.

[0206] As shown in Figure 14, the electronic device 800 includes a computing unit 801, which can perform various appropriate actions and processes based on a computer program stored in a read-only memory (ROM) 802 or a computer program loaded from a storage unit 808 into a random access memory (RAM) 803. The RAM 803 may also store various programs and data required for the operation of the electronic device 800. The computing unit 801, ROM 802, and RAM 803 are interconnected via a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.

[0207] Multiple components in electronic device 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard or mouse; an output unit 807, such as various types of displays and speakers; a storage unit 808, such as a magnetic disk or optical disk; and a communication unit 809, such as a network card, modem, or wireless transceiver. The communication unit 809 allows the electronic device 800 to exchange information/data with other devices through computer networks such as the Internet and/or various telecommunications networks.

[0208] The computing unit 801 can be any of a variety of general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various special-purpose artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 801 performs the various methods and processes described above, such as the data processing method. For example, in some embodiments, the data processing method may be implemented as a computer software program tangibly contained in a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program may be loaded and/or installed on the electronic device 800 via the ROM 802 and/or the communication unit 809. When the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the data processing method described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the data processing method by any other suitable means (e.g., by means of firmware).

[0209] Various embodiments of the systems and techniques described above herein can be implemented in digital electronic circuit systems, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems-on-a-chip (SoCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementations in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a dedicated or general-purpose programmable processor, capable of receiving data and instructions from a storage system, at least one input device, and at least one output device, and transmitting data and instructions to the storage system, the at least one input device, and the at least one output device.

[0210] The program code used to implement the methods of this disclosure may be written in any combination of one or more programming languages. This program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that when executed by the processor or controller, the program code causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may be executed entirely on a machine, partially on a machine, partially on a machine as a standalone software package and partially on a remote machine, or entirely on a remote machine or server.

[0211] In the context of this disclosure, a machine-readable medium can be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device. A machine-readable medium can be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium can be, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.

[0212] To provide interaction with a user, the systems and techniques described herein can be implemented on a computer having: a display device for displaying information to the user (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor); and a keyboard and pointing device (e.g., a mouse or trackball) through which the user provides input to the computer. Other types of devices can also be used to provide interaction with the user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form (including sound input, voice input, or tactile input).

[0213] The systems and technologies described herein can be implemented in computing systems that include backend components (e.g., as a data server), or computing systems that include middleware components (e.g., an application server), or computing systems that include frontend components (e.g., a user computer with a graphical user interface or web browser through which a user can interact with implementations of the systems and technologies described herein), or any combination of such backend, middleware, or frontend components. The components of the system can be interconnected via digital data communication of any form or medium (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.

[0214] Computer systems can include clients and servers. Clients and servers are generally remote from each other and typically interact via a communication network. The client-server relationship arises from computer programs running on the respective computers and having a client-server relationship with each other. Servers can be cloud servers, servers in distributed systems, or servers incorporating blockchain technology.

[0215] It should be understood that steps may be reordered, added, or deleted using the various forms of processes shown above. For example, the steps described in this disclosure can be executed in parallel, sequentially, or in different orders, as long as the desired result of the technical solution disclosed in this disclosure can be achieved, and this is not limited herein.

[0216] Furthermore, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one of that feature. In the description of this disclosure, "a plurality of" means two or more, unless otherwise explicitly specified.

[0217] The above description is merely a specific embodiment of this disclosure, but the scope of protection of this disclosure is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this disclosure should be included within the scope of protection of this disclosure. Therefore, the scope of protection of this disclosure should be determined by the scope of the claims.

Claims

1. A data processing method, characterized in that the method includes: determining a time pattern table and a spatiotemporal pattern table based on multiple data processing requests sent by multiple processor cores; merging, based on the type of processor core, the threads corresponding to at least one data processing request sent by the multiple processor cores to obtain at least one process; determining at least one lookup entry in a spatiotemporal index table based on a process identifier, and determining the prefetch mode corresponding to each data processing request based on a program counter identifier and the at least one lookup entry; and determining, based on at least one of the prefetch mode, program counter identifier, training address, and prefetch degree corresponding to each data processing request, at least one prefetch address corresponding to each data processing request from the time pattern table or the spatiotemporal pattern table; wherein the data processing request includes a data missing request or a data prefetch request, and the prefetch degree indicates the number of prefetch addresses corresponding to each training address; the time pattern table includes a program counter identifier, a training address identifier, a predicted address identifier, a reuse count, and a confidence level; the spatiotemporal pattern table includes a program counter identifier, a training address identifier, a spatial base address identifier, a spatial bit vector, a confidence level, and a reuse count; and the spatial bit vector identifies the repeatedly accessed spatial sub-addresses within the space corresponding to the spatial base address.

2. The method according to claim 1, characterized in that merging the threads corresponding to the at least one data processing request based on the processor core type to obtain at least one process includes: merging threads that have the same processor core type and the same process ID into one process, thereby obtaining at least one process; wherein one data processing request corresponds to one thread, and one thread includes at least one data processing request.

3. The method according to claim 1, characterized in that the method further includes determining the index pattern corresponding to a process based on the training addresses of at least one data processing request in the same process, specifically including: determining the percentage of differing index bits among the training addresses of at least two data processing requests in the same process; and, if the percentage of differing index bits is greater than or equal to an index threshold, determining the index pattern to be at least one of the differing index bits; wherein the index pattern includes the index bits used for indexing.

4. The method according to claim 3, characterized in that the method further includes: if the proportion of differing index bits is greater than or equal to the index threshold, increasing the index confidence of the process; or, if the proportion of differing index bits is less than the index threshold, reducing the index confidence of the process.

5. The method according to claim 3, characterized in that the method further includes: periodically determining the index confidence of each process; if the index confidence of any process is less than a mode switching threshold, updating the index pattern corresponding to that process; or, if the index confidence of the process is greater than or equal to the mode switching threshold, leaving the index pattern corresponding to the process unchanged.

6. The method according to claim 3, characterized in that determining at least one lookup entry in the spatiotemporal index table based on the process identifier, and determining the prefetch mode corresponding to each data processing request based on the program counter identifier and the at least one lookup entry, includes: determining at least one lookup entry in the current table of the spatiotemporal index based on the process identifier; searching the at least one lookup entry, based on the program counter identifier and the index pattern corresponding to any data processing request, to determine whether a first target entry matching the program counter of the data processing request exists; and, if such a first target entry exists, determining the reuse confidence and pattern confidence corresponding to the data processing request based on the first target entry, and determining the prefetch mode corresponding to the data processing request based on the reuse confidence and pattern confidence; wherein the current table of the spatiotemporal index includes a program counter identifier, the previous address, the spatial base address, the most recent offset, an unused count, the reuse confidence, and the pattern confidence.

7. The method according to claim 6, characterized in that the method further includes: if no first target entry matches the program counter of the data processing request, determining at least one lookup entry in the spatiotemporal index completion table based on the process identifier; searching the at least one lookup entry, based on the program counter identifier and index pattern corresponding to any data processing request, to determine whether a matching first target entry exists; and, if such a first target entry exists, determining the reuse confidence and pattern confidence corresponding to the data processing request based on the first target entry, and determining the prefetch mode corresponding to the data processing request based on the reuse confidence and pattern confidence; wherein the spatiotemporal index completion table includes a PC index, an unused count, a reuse confidence, and a pattern confidence.

8. The method according to claim 6 or 7, characterized in that determining the prefetch mode corresponding to any data processing request based on the reuse confidence and pattern confidence includes: in response to the reuse confidence being greater than or equal to a first mode threshold and the pattern confidence being greater than or equal to a second mode threshold, determining that the prefetch mode corresponding to the data processing request is the current prefetch mode; or, in response to the reuse confidence being less than the first mode threshold and the pattern confidence being less than the second mode threshold, determining that the prefetch mode corresponding to the data processing request is the other prefetch mode.

9. The method according to claim 7, characterized in that the method further includes: if a first target entry matching the program counter of the data processing request exists in the spatiotemporal index completion table, copying the program counter identifier, reuse confidence, and pattern confidence of the first target entry into a first entry of the current table of the spatiotemporal index, and updating the previous address in the first entry to the missing address corresponding to the data processing request.

10. The method according to claim 7, characterized in that the method further includes: in response to no first target entry matching the program counter of the data processing request existing in the spatiotemporal index completion table, determining a first replacement entry in the current table of the spatiotemporal index and a second replacement entry in the spatiotemporal index completion table; updating the program counter identifier, reuse confidence, and pattern confidence in the second replacement entry to those of the first replacement entry, and clearing the unused count in the second replacement entry; and clearing the data in the first replacement entry and recording in it the program counter identifier, previous address, and spatial base address corresponding to the data processing request.

11. The method according to claim 6, characterized in that, if a first target entry matching the program counter of the data processing request exists, the method further includes: determining whether the spatial base address corresponding to the first target entry is empty; in response to the spatial base address corresponding to the first target entry not being empty, determining a first spatial base address based on the training address of the data processing request; determining whether the first spatial base address is the same as a second spatial base address in the first target entry; if the first spatial base address is the same as the second spatial base address, determining that the stream of the spatial region corresponding to the first target entry has not ended; or, if the first spatial base address differs from the second spatial base address, determining that the stream of the spatial region corresponding to the first target entry has ended.

12. The method according to claim 11, characterized in that, if it is determined that the stream of the spatial region corresponding to the first target entry has not ended, the method further includes: updating the most recent offset in the first target entry based on the offset between the previous address in the first target entry and the training address of the data processing request.

13. The method according to claim 11, characterized in that, if it is determined that the stream of the spatial region corresponding to the first target entry has ended, the method further includes: updating the previous address in the first target entry to the calculated address, and updating the spatial base address and the most recent offset in the first target entry based on the spatial base address and the most recent offset corresponding to the data processing request; wherein the calculated address is determined based on the spatial base address and the most recent offset.

14. The method according to claim 6, characterized in that, if a first target entry matching the program counter of the data processing request exists, the method further includes: determining whether the spatial base address corresponding to the first target entry is empty; and, in response to the spatial base address being empty, updating the spatial base address of the first target entry based on the high-order bits of the training address of the data processing request, and updating the most recent offset of the first target entry based on the low-order bits of that training address.

15. The method according to claim 6, characterized in that the method further includes: searching the historical sample table based on the previous address in the first target entry; in response to finding a second target entry corresponding to the previous address in the historical sample table, updating the second target entry based on the first target entry; or, in response to finding no second target entry corresponding to the previous address, determining a third target entry in the historical sample table and updating the third target entry based on the first target entry; wherein the historical sample table includes the previous address, spatial base address, spatial bit vector, stream length, recurrence bit, confidence level, access interval, and first offset.

16. The method according to claim 15, characterized in that updating the second target entry based on the first target entry includes: determining whether a calculated address is received, the calculated address being determined based on the previous address and the most recent offset; in response to no calculated address being received, determining a first offset value based on the first target entry, and determining whether the spatial bit vector in the second target entry corresponds to the first offset value; if it does, increasing the recurrence bit and confidence level in the second target entry; or, if it does not, increasing the stream length and recurrence bit in the second target entry and updating the spatial bit vector based on the first offset value.

17. The method according to claim 16, characterized in that the method further includes: in response to receiving the calculated address, increasing the access interval in the second target entry and adjusting the confidence level and recurrence bit.

18. The method according to claim 16 or 17, characterized in that the method further includes: updating the reuse confidence in the current table of the spatiotemporal index based on the recurrence bit value and confidence value in the second target entry; and updating the pattern confidence in the current table of the spatiotemporal index based on the number of first elements in the spatial bit vector.

19. The method according to claim 1, characterized in that determining at least one prefetch address corresponding to each data processing request based on at least one of the prefetch mode, program counter identifier, training address, and prefetch degree of each data processing request includes: searching a temporary record table based on the program counter and training address corresponding to the data processing request; in response to finding a fourth target entry in the temporary record table, determining at least one prefetch address based on the prefetch degree; or, in response to not finding the fourth target entry, selecting the spatiotemporal pattern table or the time pattern table in the last-level cache based on the prefetch mode, and determining at least one prefetch address from the selected table based on the prefetch degree, program counter identifier, and training address.

20. The method according to claim 1, characterized in that the method further includes one of the following: updating the time pattern table and the spatiotemporal pattern table according to the prefetch mode, based on multiple data processing requests corresponding to the same process; or updating the spatiotemporal pattern table and the time pattern table based on entries in the current table or the spatiotemporal index.

21. The method according to claim 20, characterized in that the method further includes: incrementing a first update counter in response to a new entry in the spatiotemporal pattern table in the last-level cache; incrementing a second update counter in response to a new entry in the time pattern table in the last-level cache; incrementing a first replacement counter in response to a replaced entry in the spatiotemporal pattern table in the last-level cache; incrementing a second replacement counter in response to a replaced entry in the time pattern table in the last-level cache; and adjusting the capacity ratio of the spatiotemporal pattern table to the time pattern table in the last-level cache based on the values of the first update counter, the second update counter, the first replacement counter, and the second replacement counter.

22. The method according to claim 1, characterized in that, after at least one prefetch address corresponding to each data processing request is determined based on the prefetch mode, training address, and prefetch degree, the method further includes: recording the number of prefetch addresses issued; recording the number of prefetch addresses to which the last-level cache responds; recording the number of prefetch hits received; recording the number of data missing requests received; determining the prefetch accuracy based on the number of prefetch hits and the number of prefetch addresses issued; determining the prefetch percentage based on the number of prefetch hits and the number of prefetch addresses to which the last-level cache responds; determining the prefetch coverage based on the number of prefetch hits and the number of data missing requests; and adjusting the prefetch degree based on the prefetch accuracy, prefetch percentage, and prefetch coverage.

23. The method according to claim 22, characterized in that adjusting the prefetch degree based on the prefetch accuracy, the prefetch percentage, and the prefetch coverage includes one of the following: if the prefetch accuracy is greater than a first adjustment threshold and the prefetch coverage is less than a second adjustment threshold, increasing the prefetch degree; or if the prefetch accuracy is less than a third adjustment threshold, reducing the prefetch degree and adjusting the capacity ratio of the time pattern table and the spatiotemporal pattern table in the last-level cache based on the prefetch accuracy and the prefetch percentage.
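The decision rule in claim 23 can be sketched as a small function. The threshold values, the step of one address per adjustment, and the degree bounds are assumptions, because the claim gives no numbers. The claim also leaves open how the capacity ratio is recomputed from accuracy and percentage. The hypothetical `adjust_degree` therefore only returns a flag telling the caller that a rebalance is due, rather than inventing that policy.

```python
from typing import Tuple

# Hypothetical values; the disclosure does not specify the thresholds.
FIRST_THRESHOLD = 0.75   # accuracy above this may justify a larger degree...
SECOND_THRESHOLD = 0.50  # ...provided coverage is still below this
THIRD_THRESHOLD = 0.40   # accuracy below this triggers a smaller degree
MIN_DEGREE, MAX_DEGREE = 1, 8


def adjust_degree(degree: int, accuracy: float, coverage: float) -> Tuple[int, bool]:
    """Return (new_degree, rebalance_tables).

    rebalance_tables=True signals that the caller should also re-adjust the
    capacity ratio of the time and spatiotemporal pattern tables in the LLC
    using the prefetch accuracy and prefetch percentage.
    """
    if accuracy > FIRST_THRESHOLD and coverage < SECOND_THRESHOLD:
        # Prefetches are accurate but cover few misses: prefetch more per training address.
        return min(degree + 1, MAX_DEGREE), False
    if accuracy < THIRD_THRESHOLD:
        # Too many useless prefetches: back off and revisit table capacity.
        return max(degree - 1, MIN_DEGREE), True
    # Neither condition holds: keep the current degree.
    return degree, False


print(adjust_degree(4, accuracy=0.9, coverage=0.2))  # (5, False)
print(adjust_degree(4, accuracy=0.3, coverage=0.2))  # (3, True)
print(adjust_degree(4, accuracy=0.6, coverage=0.2))  # (4, False)
```

Keeping the third threshold below the first leaves a band of moderate accuracy in which the degree is held steady, which avoids oscillating between increases and decreases.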

24. A data processing apparatus, characterized in that the apparatus includes: a storage unit configured to determine a time pattern table and a spatiotemporal pattern table based on multiple data processing requests sent from multiple processor cores; a monitoring and adjustment unit configured to merge, based on the type of processor core, the threads corresponding to at least one data processing request sent by the multiple processor cores, so as to obtain at least one process; a spatiotemporal stream index unit configured to determine at least one lookup entry in the spatiotemporal index table based on the process identifier, and to determine the prefetch mode corresponding to each data processing request based on the program counter identifier and the at least one lookup entry; and a determining unit configured to determine at least one prefetch address corresponding to each data processing request from the time pattern table or the spatiotemporal pattern table based on at least one of the prefetch mode, the program counter identifier, the training address, and the prefetch degree corresponding to each data processing request; wherein the data processing request includes a data miss request or a data prefetch request, and the prefetch degree includes the number of prefetch addresses corresponding to each training address; the time pattern table includes a program counter identifier, a training address identifier, a prediction address identifier, a reuse count, and a confidence level; the spatiotemporal pattern table includes a program counter identifier, a training address identifier, a spatial base address identifier, a spatial bit vector, a confidence level, and a reuse count; and the spatial bit vector identifies the spatial sub-addresses that are repeatedly accessed within the space corresponding to the spatial base address.
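The entry formats listed in claim 24 can be modeled as plain records. The sketch below adds several assumptions that the disclosure does not fix: 64-byte cache lines; a spatial region of 32 lines (2 KiB); a spatial base address identifier that is simply the region number; and bit i of the spatial bit vector marking the i-th line of that region. It also assumes that the prefetch degree caps how many addresses are expanded from one entry. All class, field, and constant names are illustrative.

```python
from dataclasses import dataclass
from typing import List

LINE_BYTES = 64    # assumed cache-line size in bytes
REGION_LINES = 32  # assumed lines per spatial region (2 KiB)


@dataclass
class TimePatternEntry:
    pc_id: int              # program counter identifier
    training_addr_id: int   # training address identifier
    predicted_addr_id: int  # prediction address identifier
    reuse_count: int
    confidence: int


@dataclass
class SpatiotemporalPatternEntry:
    pc_id: int               # program counter identifier
    training_addr_id: int    # training address identifier
    spatial_base_id: int     # spatial base address identifier (region number, assumed)
    spatial_bit_vector: int  # bit i set => line i of the region is repeatedly accessed
    confidence: int
    reuse_count: int

    def prefetch_addresses(self, degree: int) -> List[int]:
        """Expand the bit vector into at most `degree` byte addresses."""
        base = self.spatial_base_id * REGION_LINES * LINE_BYTES
        addrs: List[int] = []
        for i in range(REGION_LINES):
            if len(addrs) == degree:
                break
            if (self.spatial_bit_vector >> i) & 1:
                addrs.append(base + i * LINE_BYTES)
        return addrs


entry = SpatiotemporalPatternEntry(
    pc_id=0x40, training_addr_id=7, spatial_base_id=3,
    spatial_bit_vector=0b1011, confidence=2, reuse_count=5)
print([hex(a) for a in entry.prefetch_addresses(degree=2)])  # ['0x1800', '0x1840']
```

Here a single spatiotemporal entry stands in for several address pairs that the time pattern table would otherwise store individually. This is the kind of storage saving the abstract attributes to the space base address identifier and space bit vector representation.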

25. An electronic device, characterized in that it comprises: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-23.

26. A non-transitory computer-readable storage medium storing computer instructions, characterized in that the computer instructions are used to cause a computer to perform the method according to any one of claims 1-23.