A processor core, a data reading method, a chip and a computer device
By introducing a Level 1 hybrid cache and a dynamic priority arbitration mechanism into the processor core, the problem of balancing data access performance and hardware area is solved, which improves the efficiency of constant data access and reduces the chip area, thereby enhancing the overall performance of the processor core.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- KUNLUNXIN TECHNOLOGY (BEIJING) CO LTD
- Filing Date
- 2026-03-30
- Publication Date
- 2026-06-30
AI Technical Summary
The problem addressed is how to balance data access performance against hardware area in general-purpose graphics processors, and in particular how to improve the access efficiency of constant data and reduce chip area without adding hardware overhead.
By introducing a first-level hybrid cache in the processor core to uniformly store instructions and constant data, and by adopting a collaborative design of the instruction fetch unit, instruction decoding unit and constant addressing unit, efficient access to constant data is achieved, avoiding the hardware overhead of independent constant memory, and data access is optimized through multi-level caching and dynamic priority arbitration mechanism.
Without increasing hardware area, the solution significantly improves the efficiency of constant data access, reduces chip area, and improves the overall performance and parallel execution efficiency of the processor core.
Abstract
Description
Technical Field
[0001] This disclosure relates to the field of computer technology, in particular to chip technology, and more specifically to a processor core, a data reading method, a chip, and a computer device.
Background Technology
[0002] In general-purpose graphics processing units (GPGPUs), in order to reduce repeated access to some data, avoid bandwidth waste, and provide low-latency access to read-only data, constant memory hardware is usually added to store this type of constant data. However, this increases the chip hardware area due to the addition of extra hardware components and control logic.
[0003] Balancing data access performance against hardware area is a critical problem that the industry urgently needs to solve.
Summary of the Invention
[0004] This disclosure provides a processor core, a data reading method, a chip, and a computer device.
[0005] According to one aspect of this disclosure, a processor core is provided, comprising: a first-level hybrid cache configured to uniformly store instructions and constant data; an instruction fetch unit configured to read a current instruction from the first-level hybrid cache and forward it to an instruction decoding unit; the instruction decoding unit, configured to parse the operation type of the current instruction and, when the current instruction is a constant instruction, to initiate a constant access request to a constant addressing unit; and the constant addressing unit, configured to calculate a current memory address based on the constant access request and to read constant data from the first-level hybrid cache based on the current memory address.
[0006] According to another aspect of this disclosure, a data reading method is provided, executed by a processor core that includes a first-level hybrid cache, an instruction fetch unit, an instruction decoding unit, and a constant addressing unit, wherein the first-level hybrid cache is used to uniformly store instructions and constant data. The method includes: the instruction fetch unit reads the current instruction from the first-level hybrid cache and forwards it to the instruction decoding unit; the instruction decoding unit parses the operation type of the current instruction and, when the current instruction is a constant instruction, initiates a constant access request to the constant addressing unit; and the constant addressing unit calculates the current memory address based on the constant access request and reads constant data from the first-level hybrid cache based on the current memory address.
[0007] According to another aspect of this disclosure, a chip is provided that includes a processor core provided in any embodiment of this disclosure.
[0008] According to another aspect of this disclosure, a computer device is provided that includes a processor core provided in any embodiment of this disclosure.
[0009] It should be understood that the description in this section is not intended to identify key or essential features of the embodiments of this disclosure, nor is it intended to limit the scope of this disclosure. Other features of this disclosure will become readily apparent from the following description.
Attached Figure Description
[0010] The accompanying drawings are provided for a better understanding of this solution and do not limit this disclosure. In the drawings:
Figure 1a is a schematic diagram of the structure of a processor core according to an embodiment of the present disclosure;
Figure 1b is a schematic diagram of another processor core structure according to an embodiment of the present disclosure;
Figure 2 is a flowchart of a data reading method according to an embodiment of the present disclosure;
Figure 3 is a block diagram of a chip according to an embodiment of the present disclosure;
Figure 4 is a block diagram of a computer device according to an embodiment of the present disclosure.
Detailed Implementation
[0011] The exemplary embodiments of this disclosure are described below with reference to the accompanying drawings, including various details of the embodiments to aid understanding, and should be considered merely exemplary. Therefore, those skilled in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of this disclosure. Similarly, for clarity and brevity, descriptions of well-known functions and structures are omitted in the following description.
[0012] In the GPGPU architecture, the implementation schemes for programs to obtain constant data are mainly divided into the following two categories: First, at the hardware level, a separate constant storage unit is not allocated, but constant data is uniformly mapped to the global memory space, and all cores complete the data reading through global memory access instructions when accessing constants; Second, in order to optimize access efficiency, an independent constant memory is added to the hardware architecture, which is specifically used to store read-only constant data with high access frequency.
[0013] The first approach reuses global memory space to store constant data, effectively avoiding the area overhead of dedicated hardware. However, the high access latency of global memory significantly reduces the efficiency of constant data retrieval, making it a performance bottleneck, especially in computationally intensive tasks. The second approach introduces dedicated constant memory and accompanying control logic, compressing constant access latency to near register access levels. However, this requires additional chip area and necessitates the design of complex cache coherence protocols to avoid redundant data storage, limiting its application in area-sensitive scenarios.
[0014] Figure 1a is a schematic diagram of a processor core structure according to an embodiment of the present disclosure. This embodiment is applicable to cases where constant data is stored by reusing the physical storage space of a hybrid cache. With reference to Figure 1a and Figure 1b, the processor core of this embodiment includes: a first-level hybrid cache 101 for uniformly storing instructions and constant data; an instruction fetch unit 102 for reading the current instruction from the first-level hybrid cache 101 and forwarding it to an instruction decoding unit 103; the instruction decoding unit 103 for parsing the operation type of the current instruction and, when the current instruction is a constant instruction, initiating a constant access request to a constant addressing unit 104; and the constant addressing unit 104 for calculating the current memory address based on the constant access request and reading constant data from the first-level hybrid cache 101 based on the current memory address.
[0015] In this embodiment, the first-level hybrid cache 101 is used to uniformly store instructions and constant data. The physical space is divided by address range. The first-level hybrid cache 101 includes an instruction area and a constant data area. The instruction area is located below a preset address, and the constant data area is located above the preset address. The storage space of the instruction area is larger than that of the constant data area. The first-level hybrid cache 101 is bidirectionally connected to the instruction fetch unit 102, the constant addressing unit 104, and the second-level hybrid cache 200.
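The address-range partition described above can be illustrated with a minimal Python sketch. The cache size, the value of the preset address, and the choice to place the preset address itself in the constant data area are assumptions made for illustration; the disclosure does not specify them.

```python
from enum import Enum


class Region(Enum):
    INSTRUCTION = "instruction"
    CONSTANT = "constant"


# Hypothetical values, chosen only so that the instruction area is the larger one.
CACHE_SIZE = 0x1000      # total addressable bytes in the first-level hybrid cache
PRESET_ADDRESS = 0x0C00  # boundary between the two areas


def classify(address: int) -> Region:
    """Return the area an address falls in, using only its address range."""
    if not 0 <= address < CACHE_SIZE:
        raise ValueError(f"address {address:#x} is outside the cache")
    # Assumption: the boundary address itself belongs to the constant data area.
    if address < PRESET_ADDRESS:
        return Region.INSTRUCTION
    return Region.CONSTANT


instruction_area_size = PRESET_ADDRESS
constant_area_size = CACHE_SIZE - PRESET_ADDRESS
```

With these assumed values the instruction area spans 0x0000 to 0x0BFF and the constant data area spans 0x0C00 to 0x0FFF, so no tag bit is needed to tell the two data types apart.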
[0016] Both the first-level hybrid cache 101 and the second-level hybrid cache 200 employ an instruction cache (icache) hardware architecture. As a critical storage component within the processor core, the icache is specifically designed to cache recently executed instruction sequences. Because programs commonly loop and re-execute the same instructions, keeping frequently accessed instructions resident in the instruction cache reduces how often the processor accesses main memory, thereby lowering average memory access latency and improving instruction execution throughput. In its implementation, the icache is typically built on high-speed static random-access memory (SRAM), which offers an order-of-magnitude advantage in access speed over main memory composed of dynamic random-access memory (DRAM).
[0017] The instruction fetch unit 102 fetches instructions from the Level 1 hybrid cache 101. Its output port is connected to the input port of the instruction decode unit 103, forwarding the fetched instructions to the decode unit 103. The output port of the instruction decode unit 103 is connected to the input port of the constant addressing unit 104. The functions of the instruction decode unit 103 include: resolving whether the operation type of the instruction is constant or non-constant; when it is identified as a constant instruction, extracting the constant identifier (such as the immediate field or symbolic index), and sending a constant access request carrying the constant identifier to the constant addressing unit 104. In the processor instruction set, the immediate field is a fixed-width area used to directly store constant values in the instruction encoding, and is one of the components of the instruction. The constant addressing unit 104, as a dedicated unit for constant data, calculates the current memory address of the corresponding constant data based on the constant identifier in the constant access request, and initiates a constant read request to the Level 1 hybrid cache 101 based on the current memory address, and receives the constant data fed back by the Level 1 hybrid cache 101.
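The decode and addressing path described above can be sketched as follows. This is a simplified software model, not the hardware design: the opcode name `LDC`, the base-plus-index address formula, the 4-byte word size, and the dictionary standing in for the cache are all hypothetical, since the disclosure does not state how the address is derived from the constant identifier.

```python
from dataclasses import dataclass
from typing import Dict, Optional

CONST_BASE = 0x0C00  # hypothetical start of the constant data area
WORD_BYTES = 4       # hypothetical size of one constant


@dataclass(frozen=True)
class Instruction:
    opcode: str
    const_id: Optional[int] = None  # constant identifier, e.g. an immediate index


@dataclass(frozen=True)
class ConstantAccessRequest:
    const_id: int


def decode(instr: Instruction) -> Optional[ConstantAccessRequest]:
    """Model of the decoding unit: emit a request only for constant instructions."""
    if instr.opcode == "LDC":  # hypothetical constant-load opcode
        if instr.const_id is None:
            raise ValueError("constant instruction without a constant identifier")
        return ConstantAccessRequest(instr.const_id)
    return None  # non-constant instructions take the execution-unit path


def constant_address(request: ConstantAccessRequest) -> int:
    """Model of the constant addressing unit: assumed base + index * word size."""
    return CONST_BASE + request.const_id * WORD_BYTES


def read_constant(cache: Dict[int, int], request: ConstantAccessRequest) -> int:
    """Issue the constant read to the (modelled) first-level hybrid cache."""
    return cache[constant_address(request)]
```

For example, decoding `Instruction("LDC", 3)` yields a request whose address under these assumptions is 0x0C0C, and `read_constant` returns whatever the modelled cache holds there.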
[0018] By sharing the physical storage space of the Level 1 hybrid cache 101 with constant data and instructions, no separate hardware constant memory is needed. Data types are distinguished only by address range, achieving storage space reuse. Furthermore, through the collaborative design of the constant addressing unit 104 and the Level 1 hybrid cache 101, unified cache management of instructions and constant data is achieved. This solution replaces traditional independent constant memory, avoiding additional hardware overhead and thus reducing chip area.
[0019] The constant addressing unit, also referred to as the Load Constant Unit (LCU), can be obtained by refactoring the Scalar Arithmetic Logic Unit (SALU). In related technologies, the SALU only supports register-level arithmetic and logic operations, and its data access path is limited to the register file. Constant data has to be obtained through general storage paths such as global memory, resulting in high latency and bandwidth bottlenecks. The SALU is only responsible for writing the obtained data back to the register file and cannot initiate constant retrieval requests to the first-level hybrid cache. This embodiment of the present disclosure upgrades the SALU into a dedicated unit with constant addressing capability by refactoring its data processing logic and data path. As a result, the constant addressing unit can not only write data to the register file but also calculate constant memory addresses and initiate precise constant read requests to the first-level hybrid cache based on these addresses. Because the unit reuses existing SALU logic, its hardware area is much smaller than that of a dedicated constant memory. Through the collaborative optimization of the constant addressing unit and the first-level hybrid cache, the latency of constant data access is compressed to near the register access level while the hardware area overhead remains essentially unchanged.
[0020] In one optional implementation, the processor core further includes a register file; the constant addressing unit is also configured to synchronously write the constant data into the registers corresponding to the multiple threads in the warp that request the constant data.
[0021] Referring to Figure 1b, the processor core also includes a register file 105. After obtaining the constant data fed back from the first-level hybrid cache 101, the constant addressing unit 104 determines the active threads in the warp and then selects, from among them, the threads that request the constant data. If multiple threads request the constant data, the constant addressing unit 104 writes the constant data into the registers corresponding to those threads through a broadcast mechanism, thereby improving parallel execution efficiency. For example, when all 32 threads in the warp request the constant data, the constant addressing unit 104 broadcasts the constant data to the registers of every thread, avoiding repeated instruction fetching and cache access.
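The broadcast step can be modelled with bit masks, one bit per thread. The sketch below is illustrative only: the mask representation, the register file modelled as a list of per-thread dictionaries, and the function name are assumptions; only the warp size of 32 comes from the example in the text.

```python
from typing import Dict, List

WARP_SIZE = 32  # matches the 32-thread example in the text


def broadcast_constant(
    register_file: List[Dict[int, int]],
    dest_reg: int,
    value: int,
    active_mask: int,
    request_mask: int,
) -> List[int]:
    """Write one fetched constant to every active thread that requested it.

    Bit i of each mask refers to thread (lane) i of the warp.
    Returns the lanes that were written.
    """
    targets = active_mask & request_mask  # active threads that want the constant
    written = []
    for lane in range(WARP_SIZE):
        if (targets >> lane) & 1:
            register_file[lane][dest_reg] = value
            written.append(lane)
    return written
```

A single cache read therefore serves every requesting thread: with all lanes active and lanes 0, 1 and 3 requesting, one call writes three registers and leaves the others untouched.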
[0022] In one optional implementation, the instruction decoding unit is further configured to forward the current instruction to the execution unit when the current instruction is a non-constant instruction, so that the execution unit can process the instruction.
[0023] In this embodiment of the disclosure, the processor core further includes an execution unit (not shown), which may be an Arithmetic Logic Unit (ALU), a Floating Point Unit (FPU), or the like. The output port of the instruction decoding unit 103 is also connected to the input port of the execution unit. When the current instruction is a non-constant instruction, the instruction decoding unit 103 forwards the current instruction to the execution unit, which then processes the instruction.
[0024] In one optional implementation, the first-level hybrid cache is further configured to: in response to an instruction read request obtained from the instruction fetch unit and a constant read request obtained from the constant addressing unit, obtain the number of instructions currently cached by the instruction fetch unit; and select and execute a request to be executed from the instruction read request and the constant read request according to the number of instructions.
[0025] In one optional implementation, the first-level hybrid cache is specifically used to: if the number of instructions in the current cache is less than a preset threshold, execute the instruction read request first and feed the instruction back to the instruction fetch unit; if the number of instructions in the current cache is equal to or greater than the preset threshold, execute the constant read request first and feed constant data back to the constant addressing unit.
[0026] When the first-level hybrid cache 101 simultaneously receives instruction fetch requests from the instruction fetch unit 102 and constant read requests from the constant addressing unit 104, a priority arbitration mechanism is needed to determine the execution order of the requests, and the higher-priority request is processed first. For example, under normal load scenarios, the instruction fetch request has a higher priority than the constant read request, and in this case, the output port of the first-level hybrid cache 101 feeds back the instruction to the instruction fetch unit 102 first.
[0027] Optionally, the first-level hybrid cache 101 can obtain the number of instructions currently cached by the instruction fetch unit 102 in real time as the arbitration basis; if the number of instructions currently cached is less than a preset threshold (e.g., 4 instructions), the instruction read request has a higher priority, and the output port of the first-level hybrid cache 101 will prioritize feeding back instructions to the instruction fetch unit 102; if the number of instructions currently cached is equal to or greater than the preset threshold, the constant read request has a higher priority, and the output port of the first-level hybrid cache 101 will prioritize feeding back constant data to the constant addressing unit 104.
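The threshold-based arbitration rule can be written as a small decision function. This is a sketch under stated assumptions: the function name and string return values are hypothetical, and the default threshold of 4 is taken from the example in the text rather than from any fixed requirement.

```python
from typing import Optional

PRESET_THRESHOLD = 4  # example value from the text; a design parameter in practice


def arbitrate(
    has_instruction_request: bool,
    has_constant_request: bool,
    buffered_instructions: int,
    threshold: int = PRESET_THRESHOLD,
) -> Optional[str]:
    """Pick which pending request the first-level hybrid cache serves next.

    buffered_instructions is the number of instructions currently cached
    by the instruction fetch unit.
    """
    if has_instruction_request and has_constant_request:
        # Few buffered instructions: keep the pipeline fed first.
        if buffered_instructions < threshold:
            return "instruction"
        # Enough instructions are buffered: the constant read can go first.
        return "constant"
    if has_instruction_request:
        return "instruction"
    if has_constant_request:
        return "constant"
    return None  # nothing to serve
```

With the example threshold, a fetch unit holding 3 instructions wins the arbitration, while one holding 4 or more yields to the constant read.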
[0028] In typical computational workloads such as large model inference, only a few operators, such as Softmax bias and Top-K parameters, rely on constant data, while the vast majority of operators do not rely on constant data (these can be called non-constant operators). If a separate constant memory design is used, it will not only increase chip area overhead, but also lead to low bandwidth utilization due to the low access efficiency of constant memory. The embodiments of this disclosure implement the constant storage function by reusing the storage resources of the first-level hybrid cache and adopt a dynamic priority arbitration mechanism: instruction read requests take precedence by default, and constant read requests take precedence only when the number of instructions currently cached is equal to or greater than a preset threshold. This optimizes the efficiency of constant data access while ensuring the continuity of the instruction pipeline, and ultimately improves overall processing performance.
[0029] When the first-level hybrid cache 101 only receives constant read requests from the constant addressing unit 104, the output port of the first-level hybrid cache 101 feeds back constant data to the constant addressing unit 104.
[0030] In one optional implementation, when executing the instruction read request, the first-level hybrid cache is further configured to: determine whether the first-level hybrid cache stores the constant data required by the constant read request; if not, prefetch the constant data from the second-level hybrid cache or external memory and write it into the first-level hybrid cache.
[0031] To further improve the efficiency of constant data access, embodiments of this disclosure also introduce a constant prefetching mechanism. Referring to Figure 1b, when the first-level hybrid cache 101 processes an instruction read request, i.e., when its output port feeds back instruction data to the instruction fetch unit 102, the first-level hybrid cache 101 simultaneously checks whether the constant data required by the pending constant read request is already cached locally. If not, it triggers a prefetch to load the required constant data from the second-level hybrid cache 200 or external memory 300 and write it into the first-level hybrid cache 101; if the data is already stored locally, it skips the prefetch. Because the required constant data is prefetched in this way, subsequent constant read requests can obtain it directly from the first-level hybrid cache 101, which significantly improves the efficiency of constant data access.
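A minimal sketch of the prefetch behaviour follows, with each storage level modelled as a Python dictionary keyed by address. The function name and signature are hypothetical, the instruction read is assumed to hit in the first-level cache for brevity, and the hardware performs the check in parallel with the instruction read rather than sequentially as here.

```python
from typing import Dict, Optional, Tuple


def serve_instruction_with_prefetch(
    l1: Dict[int, object],
    l2: Dict[int, object],
    external: Dict[int, object],
    instruction_address: int,
    pending_constant_address: Optional[int],
) -> Tuple[object, bool]:
    """Serve an instruction read and prefetch the pending constant if needed.

    Returns (instruction, prefetched), where prefetched tells whether the
    constant had to be loaded into the first-level cache.
    """
    instruction = l1[instruction_address]  # assumption: instruction hits in L1

    prefetched = False
    if pending_constant_address is not None and pending_constant_address not in l1:
        # Load from the second-level cache if present, else from external memory.
        source = l2 if pending_constant_address in l2 else external
        l1[pending_constant_address] = source[pending_constant_address]
        prefetched = True
    return instruction, prefetched
```

After the first call the constant is resident in the modelled first-level cache, so a repeated call with the same pending address skips the prefetch.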
[0032] In one optional implementation, when the constant read request is executed first, the first-level hybrid cache is specifically used to: determine whether the constant data required by the constant read request is stored in the first-level hybrid cache; if not, obtain the required constant data from the second-level hybrid cache or external memory, write it into the first-level hybrid cache, and then feed it back to the constant addressing unit.
[0033] Referring to Figure 1b, the first-level hybrid cache 101 is also communicatively connected to the second-level hybrid cache 200, which in turn is communicatively connected to the external memory 300, forming a hierarchical storage structure. When processing a constant read request, the first-level hybrid cache 101 checks whether the required constant data is already stored locally. If not, it queries the second-level hybrid cache 200 and then the external memory 300 via a cascading access mechanism, fills the obtained constant data back into the first-level hybrid cache 101, and then feeds it back to the constant addressing unit 104. Similarly, when processing an instruction read request, if the required instruction is not stored locally, the first-level hybrid cache 101 retrieves it from the next storage level via the same cascading access mechanism, fills it back into the first-level hybrid cache 101, and then feeds it back to the instruction fetch unit 102. This multi-level caching mechanism ensures data access efficiency and further improves the resource utilization of the processor core.
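The cascading access mechanism can be sketched as a read-through lookup. Each level is modelled as a dictionary, and the function name and returned source label are hypothetical. Whether the second-level cache is also filled on a miss to external memory is not stated in the text, so this sketch fills only the first-level cache.

```python
from typing import Dict, Tuple


def read_through(
    address: int,
    l1: Dict[int, object],
    l2: Dict[int, object],
    external: Dict[int, object],
) -> Tuple[object, str]:
    """Return (value, level that supplied it), filling L1 on a miss."""
    if address in l1:
        return l1[address], "L1"
    if address in l2:
        value, source = l2[address], "L2"
    else:
        # Raises KeyError if the address is not backed by external memory.
        value, source = external[address], "external"
    l1[address] = value  # fill back before answering the requester
    return value, source
```

The same routine serves both request types in this model: the caller is the constant addressing unit for a constant read and the instruction fetch unit for an instruction read.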
[0034] Furthermore, for modules outside the processor core that need to access constant data, the required data can be obtained by directly accessing the external L2 hybrid cache 200, thus bypassing the L1 hybrid cache access path inside the processor core. This design can effectively reduce the bandwidth pressure on the L1 hybrid cache, avoid memory access blockage of core computing resources caused by frequent access to constant data by external modules, and maintain the low latency characteristics of the data path inside the processor core.
[0035] The technical solution provided in this disclosure achieves multiple optimizations through a unified cache architecture and dynamic resource management mechanism: it adopts a storage reuse design that shares a first-level hybrid cache for instructions and constant data, eliminating the hardware overhead of independent constant memory and significantly reducing chip area; it dynamically balances instruction pipeline and constant access requirements through a priority arbitration mechanism, optimizing constant data hit rate while ensuring instruction continuity; and it introduces a multi-level cache hierarchy and prefetch strategy, combined with a broadcast mechanism to achieve efficient distribution of constant data, which improves parallel execution efficiency and reduces bandwidth contention.
[0036] In one optional implementation, the first-level hybrid cache includes a main entry and an auxiliary entry. The write strategy for the main entry and the auxiliary entry is as follows: new instructions and new constant data are written to the main entry first; after the main entry is full, new instructions and new constant data are written to the auxiliary entry.
[0037] To further improve the performance of the first-level hybrid cache and reduce storage overhead, an auxiliary entry is added alongside the main entry used for storing instructions and constant data. The main entry's storage space is configured to be larger than that of the auxiliary entry, while the auxiliary entry has a fixed capacity (e.g., 128 bits). In terms of write strategy, new instructions and new constant data are preferentially written to the main entry. Only when the main entry is full (has reached its capacity limit) are subsequent new instructions and new constant data written to the auxiliary entry. Because the startup phase of a computation task (the kernel initialization phase) consists mainly of instruction loading, the main entry is usually filled with instruction data first, while the auxiliary entry mainly carries constant data. Given that constant data is read-only and frequently accessed, this layered design ensures the continuity of core instructions while exploiting locality to optimize the access efficiency of constant data.
[0038] In one optional implementation, the write priority of the main entry and the auxiliary entry is as follows: when writing data to the main entry, the priority of new instructions is higher than that of new constant data; when writing data to the auxiliary entry, the priority of new constant data is higher than that of new instructions.
[0039] In one optional implementation, when both the main entry and the auxiliary entry are full, the overwrite strategy for the main entry and the auxiliary entry is as follows: new instructions preferentially overwrite old instructions in the main entry; new constant data preferentially overwrites old constant data in the auxiliary entry.
[0040] For the main entry, new instructions have higher priority than new constant data. When the available capacity of the main entry is limited, new instructions are written to the main entry first. For the auxiliary entry, new constant data has higher priority than new instructions. When the available capacity of the auxiliary entry is limited, new constant data is written to the auxiliary entry first.
[0041] When both the main entry and the auxiliary entry reach their capacity limits, new instructions preferentially overwrite older, less timely instructions in the main entry, and new constant data preferentially overwrites expired or infrequently accessed older constant data in the auxiliary entry. Through differentiated priority allocation and replacement strategies for the main and auxiliary entries, the continuity of the instruction flow is ensured to improve execution efficiency, while locality optimization of constant data reduces storage access latency, thus achieving coordinated optimization of computational performance and storage efficiency with limited cache resources.
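The write, priority, and overwrite rules for the two entries can be combined into one small model. This is a sketch under several assumptions: the class and method names are hypothetical, capacities are counted in items rather than bits, "old" is approximated by insertion order (oldest first), and when no entry of the preferred kind exists the oldest entry of any kind is replaced, a case the text does not cover.

```python
from typing import Iterable, List, Optional, Tuple

Item = Tuple[str, str]  # (kind, tag) where kind is "instr" or "const"


class HybridEntries:
    """Toy model of a main entry and an auxiliary entry, oldest item first."""

    def __init__(self, main_capacity: int, aux_capacity: int) -> None:
        if main_capacity < 1 or aux_capacity < 1:
            raise ValueError("capacities must be at least 1")
        self.main_capacity = main_capacity
        self.aux_capacity = aux_capacity
        self.main: List[Item] = []
        self.aux: List[Item] = []

    @staticmethod
    def _overwrite(table: List[Item], item: Item, preferred_kind: str) -> None:
        # Replace the oldest entry of the preferred kind; fall back to the
        # oldest entry overall (assumption, not specified in the text).
        for index, (kind, _) in enumerate(table):
            if kind == preferred_kind:
                del table[index]
                break
        else:
            del table[0]
        table.append(item)

    def write(self, item: Item) -> str:
        """Place one item and return the entry that received it."""
        if len(self.main) < self.main_capacity:
            self.main.append(item)
            return "main"
        if len(self.aux) < self.aux_capacity:
            self.aux.append(item)
            return "aux"
        if item[0] == "instr":  # both full: instructions go to the main entry
            self._overwrite(self.main, item, "instr")
            return "main"
        self._overwrite(self.aux, item, "const")
        return "aux"

    def write_batch(self, pending: Iterable[Item]) -> List[Tuple[Item, str]]:
        """Place competing items, honouring the per-entry write priority."""
        queue = list(pending)
        placed = []
        while queue:
            preferred: Optional[str]
            if len(self.main) < self.main_capacity:
                preferred = "instr"   # main entry favours instructions
            elif len(self.aux) < self.aux_capacity:
                preferred = "const"   # auxiliary entry favours constants
            else:
                preferred = None      # both full: keep arrival order
            index = next(
                (i for i, (kind, _) in enumerate(queue) if kind == preferred), 0
            )
            item = queue.pop(index)
            placed.append((item, self.write(item)))
        return placed
```

With a main capacity of 2 and an auxiliary capacity of 1, a batch of two constants and two instructions leaves both instructions in the main entry and the newest constant in the auxiliary entry, matching the division of labour the text describes for the kernel initialization phase.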
[0042] Figure 2 is a flowchart of a data reading method according to an embodiment of the present disclosure. This embodiment is applicable to cases where the physical storage space of a hybrid cache is reused to store constant data. The data reading method of this embodiment is executed by a processor core. The processor core includes a first-level hybrid cache, an instruction fetch unit, an instruction decoding unit, and a constant addressing unit. The first-level hybrid cache is used to uniformly store instructions and constant data. As shown in Figure 2, the data reading method includes: S201, the instruction fetch unit reads the current instruction from the first-level hybrid cache and forwards it to the instruction decoding unit; S202, the instruction decoding unit parses the operation type of the current instruction and, when the current instruction is a constant instruction, initiates a constant access request to the constant addressing unit; S203, the constant addressing unit calculates the current memory address based on the constant access request and reads constant data from the first-level hybrid cache based on the current memory address.
[0043] In one alternative implementation, the data reading method further includes: The first-level hybrid cache responds to an instruction read request obtained from the instruction fetch unit and a constant read request obtained from the constant addressing unit, and obtains the number of instructions currently cached by the instruction fetch unit; The first-level hybrid cache selects and executes requests from the instruction read requests and the constant read requests based on the number of instructions.
[0044] In one optional implementation, the first-level hybrid cache selecting and executing a request from the instruction read request and the constant read request based on the number of instructions includes: if the number of instructions currently cached is less than a preset threshold, the first-level hybrid cache executes the instruction read request first and feeds the instruction back to the instruction fetch unit; if the number of instructions currently cached is equal to or greater than the preset threshold, the first-level hybrid cache executes the constant read request first and feeds constant data back to the constant addressing unit.
[0045] In one optional implementation, the data reading method further includes, when executing the instruction read request: The first-level hybrid cache determines whether the first-level hybrid cache stores the constant data required by the constant read request; If not stored, the first-level hybrid cache prefetches the constant data from the second-level hybrid cache or external memory and writes it into the first-level hybrid cache.
[0046] In one alternative implementation, when the constant read request is executed first, the data read method further includes: The first-level hybrid cache determines whether the constant data required by the constant read request is stored in the first-level hybrid cache; If not stored, the first-level hybrid cache retrieves the required constant data from the second-level hybrid cache or external memory, writes it into the first-level hybrid cache, and then feeds it back to the constant addressing unit.
[0047] In one optional implementation, the processor core further includes a register file; the data reading method further includes: the constant addressing unit synchronously writes the constant data into the registers corresponding to the multiple threads in the warp that request the constant data.
[0048] In one alternative implementation, the data reading method further includes: When the current instruction is a non-constant instruction, the instruction decoding unit forwards the current instruction to the execution unit, which then processes the instruction.
[0049] In one optional implementation, the first-level hybrid cache includes a main entry and an auxiliary entry, and the write strategy for the main entry and the auxiliary entry is as follows: new instructions and new constant data are written to the main entry first; after the main entry is full, new instructions and new constant data are written to the auxiliary entry.
[0050] In one optional implementation, the write priority of the main entry and the auxiliary entry is as follows: when writing data to the main entry, new instructions have higher priority than new constant data; when writing data to the auxiliary entry, new constant data has higher priority than new instructions.
[0051] In one optional implementation, when both the main entry and the auxiliary entry are full, the overwrite strategy for the main entry and the auxiliary entry is as follows: new instructions preferentially overwrite old instructions in the main entry; new constant data preferentially overwrites old constant data in the auxiliary entry.
[0052] The data reading method provided in this disclosure refactors the hardware architecture, entry information, data path, and control logic of the instruction cache (icache), turning the original instruction storage entry into a main entry that supports mixed storage of instructions and constant data. In addition, to improve the efficiency of constant data reading, an auxiliary entry that gives constants high storage priority is added to the first-level hybrid cache, and the access paths of other modules are extended to enable direct access to constant data. Furthermore, refactoring the SALU into a constant addressing unit adds support for constant memory address access, providing efficient hardware data acquisition while avoiding additional area overhead.
[0053] According to embodiments of this disclosure, a chip is also provided. As shown in Figure 3, the chip includes the processor core provided in any embodiment of this disclosure.
[0054] According to embodiments of this disclosure, a computer device is also provided. As shown in Figure 4, the computer device includes the processor core provided in any embodiment of this disclosure.
[0055] It should be understood that steps may be reordered, added, or deleted using the various forms of processes shown above. For example, the steps described in this disclosure can be executed in parallel, sequentially, or in a different order, as long as the desired result of the technical solution of this disclosure can be achieved; no limitation is imposed herein.
[0056] The specific embodiments described above do not constitute a limitation on the scope of protection of this disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions can be made according to design requirements and other factors. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of this disclosure should be included within the scope of protection of this disclosure.
Claims
1. A processor core, comprising: a first-level hybrid cache configured to uniformly store instructions and constant data; an instruction fetch unit configured to read a current instruction from the first-level hybrid cache and forward it to an instruction decoding unit; the instruction decoding unit, configured to parse the operation type of the current instruction and, when the current instruction is a constant instruction, to initiate a constant access request to a constant addressing unit; and the constant addressing unit, configured to calculate a current memory address based on the constant access request and to read constant data from the first-level hybrid cache based on the current memory address.
2. The processor core according to claim 1, wherein the first-level hybrid cache is further configured to: in response to an instruction read request obtained from the instruction fetch unit and a constant read request obtained from the constant addressing unit, obtain the number of instructions currently cached by the instruction fetch unit; and, based on the number of instructions, select the request to be executed from the instruction read request and the constant read request and execute it.
3. The processor core according to claim 2, wherein the first-level hybrid cache is specifically configured to: if the number of instructions currently cached is less than a preset threshold, execute the instruction read request first and feed the instruction back to the instruction fetch unit; and if the number of instructions currently cached is equal to or greater than the preset threshold, execute the constant read request first and feed the constant data back to the constant addressing unit.
4. The processor core according to claim 3, wherein, when executing the instruction read request, the first-level hybrid cache is further configured to: determine whether the first-level hybrid cache stores the constant data required by the constant read request; and, if not, prefetch the constant data from a second-level hybrid cache or external memory and write it to the first-level hybrid cache.
5. The processor core according to claim 3, wherein, when executing the constant read request first, the first-level hybrid cache is specifically configured to: determine whether the first-level hybrid cache stores the constant data required by the constant read request; and, if not, retrieve the required constant data from a second-level hybrid cache or external memory, write it to the first-level hybrid cache, and then feed it back to the constant addressing unit.
6. The processor core according to claim 1, wherein the processor core further includes a register file; and the constant addressing unit is further configured to synchronously write the constant data into the registers corresponding to the multiple threads in the thread bundle that request the constant data.
7. The processor core according to claim 1, wherein the instruction decoding unit is further configured to forward the current instruction to an execution unit when the current instruction is a non-constant instruction, so that the execution unit processes the instruction.
8. The processor core according to claim 1, wherein the first-level hybrid cache includes a main table entry and an auxiliary table entry, and the write strategy for the main table entry and the auxiliary table entry is as follows: new instructions and new constant data are written to the main table entry first; and after the main table entry is full, new instructions and new constant data are written to the auxiliary table entry.
9. The processor core according to claim 8, wherein the write priority of the main table entry and the auxiliary table entry is as follows: when writing data to the main table entry, a new instruction has higher priority than new constant data; and when writing data to the auxiliary table entry, new constant data has higher priority than a new instruction.
10. The processor core according to claim 8, wherein, when both the main table entry and the auxiliary table entry are full, the overwrite strategy for the main table entry and the auxiliary table entry is as follows: in the main table entry, new instructions preferentially overwrite old instructions; and in the auxiliary table entry, new constant data preferentially overwrites old constant data.
11. A data reading method, executed by a processor core, the processor core including a first-level hybrid cache, an instruction fetch unit, an instruction decoding unit, and a constant addressing unit, wherein the first-level hybrid cache is used to uniformly store instructions and constant data; the method includes: the instruction fetch unit reads a current instruction from the first-level hybrid cache and forwards it to the instruction decoding unit; the instruction decoding unit parses the operation type of the current instruction and, when the current instruction is a constant instruction, initiates a constant access request to the constant addressing unit; and the constant addressing unit calculates a current memory address based on the constant access request and reads constant data from the first-level hybrid cache based on the current memory address.
12. The method according to claim 11, further comprising: the first-level hybrid cache, in response to an instruction read request obtained from the instruction fetch unit and a constant read request obtained from the constant addressing unit, obtains the number of instructions currently cached by the instruction fetch unit; and the first-level hybrid cache selects the request to be executed from the instruction read request and the constant read request based on the number of instructions, and executes it.
13. The method according to claim 12, wherein the first-level hybrid cache selecting the request to be executed from the instruction read request and the constant read request based on the number of instructions, and executing it, includes: if the number of instructions currently cached is less than a preset threshold, the first-level hybrid cache executes the instruction read request first and feeds the instruction back to the instruction fetch unit; and if the number of instructions currently cached is equal to or greater than the preset threshold, the first-level hybrid cache executes the constant read request first and feeds the constant data back to the constant addressing unit.
14. The method according to claim 13, wherein, when executing the instruction read request, the method further includes: the first-level hybrid cache determines whether it stores the constant data required by the constant read request; and, if not, the first-level hybrid cache prefetches the constant data from a second-level hybrid cache or external memory and writes it to the first-level hybrid cache.
15. The method according to claim 13, wherein, when executing the constant read request first, the method further includes: the first-level hybrid cache determines whether it stores the constant data required by the constant read request; and, if not, the first-level hybrid cache retrieves the required constant data from a second-level hybrid cache or external memory, writes it to the first-level hybrid cache, and then feeds it back to the constant addressing unit.
16. The method according to claim 11, wherein the processor core further includes a register file; the method further includes: the constant addressing unit synchronously writes the constant data into the registers corresponding to the multiple threads in the thread bundle that request the constant data.
17. The method according to claim 11, further comprising: when the current instruction is a non-constant instruction, the instruction decoding unit forwards the current instruction to an execution unit, which then processes the instruction.
18. The method according to claim 11, wherein the first-level hybrid cache includes a main table entry and an auxiliary table entry, and the write strategy for the main table entry and the auxiliary table entry is as follows: new instructions and new constant data are written to the main table entry first; and after the main table entry is full, new instructions and new constant data are written to the auxiliary table entry.
19. The method according to claim 18, wherein the write priority of the main table entry and the auxiliary table entry is as follows: when writing data to the main table entry, a new instruction has higher priority than new constant data; and when writing data to the auxiliary table entry, new constant data has higher priority than a new instruction.
20. The method according to claim 18, wherein, when both the main table entry and the auxiliary table entry are full, the overwrite strategy for the main table entry and the auxiliary table entry is as follows: in the main table entry, new instructions preferentially overwrite old instructions; and in the auxiliary table entry, new constant data preferentially overwrites old constant data.
21. A chip comprising a processor core as claimed in any one of claims 1-10.
22. A computer device comprising a processor core as claimed in any one of claims 1-10.