Multi-core processor and method for allocating system shared resources
By dynamically allocating shared system resources in a multi-core processor according to the importance of the threads running on each processor core, the latency and wasted power caused by asynchronous data interaction are reduced, enabling more efficient power management and higher operating speed.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- HUAWEI TECH CO LTD
- Filing Date
- 2025-12-16
- Publication Date
- 2026-06-25
AI Technical Summary
In existing multi-core processors, asynchronous data interaction increases data interaction latency, reduces the operating speed of electronic products, and existing power regulation technologies lead to power waste.
By setting up a controller in a multi-core processor, shared system resources are dynamically allocated based on the importance of threads in the processor core. This allows cores with higher importance to occupy more resources, while cores with lower importance occupy fewer or no resources, enabling operation under the same power supply voltage and frequency.
It reduces the power consumption of multi-core processors, reduces the need for asynchronous communication, improves processor efficiency, and reduces latency.
Abstract
Description
Multi-core processors and methods for allocating shared system resources
[0001] This application claims priority to Chinese Patent Application No. 202411895752.9, filed on December 19, 2024, entitled "Multi-core processor and method for allocating shared resources of a system", the entire contents of which are incorporated herein by reference.
Technical Field
[0002] This application relates to the field of computer technology, and in particular to a multi-core processor and a method for allocating shared system resources.
Background Technology
[0003] With the continuous development of computer technology, the operating frequency and integration level of electronic products, such as integrated circuits, are constantly increasing, leading to a rapid increase in power consumption. Current technologies typically control power consumption by regulating the voltage and frequency of various logic chips (such as processor chips and AI chips) within the electronic product. Examples of power consumption control techniques include dynamic voltage and frequency scaling (DVFS). This technique usually involves dynamically adjusting the power supply voltage and clock frequency of each processor core within the logic chip to reduce power consumption.
[0004] In existing electronic products, to improve operating speed, a single logic chip typically integrates multiple processor cores, which work together to complete complex tasks. However, as mentioned above, the various power consumption control technologies require electronic products to support data interaction between processor cores running at different clock frequencies, that is, asynchronous interaction between the processor cores. Asynchronous data interaction increases data interaction latency and reduces the operating speed of electronic products.
Summary of the Invention
[0005] The multi-core processor and method for allocating shared system resources provided in this application can reduce the power consumption of electronic devices while ensuring their operating speed. To achieve the above objectives, the embodiments of this application adopt the following technical solutions:
[0006] In a first aspect, embodiments of this application provide a multi-core processor, which includes: multiple processor cores, a controller, and system shared resources; multiple processor cores, each used to generate an indication signal and transmit the indication signal to the controller, wherein the indication signal is used to indicate the importance of the thread running in each processor core; and the controller, used to selectively allocate system shared resources to the multiple processor cores based on the indication signals of the multiple processor cores.
[0007] By providing a controller, the multi-core processor of this embodiment can dynamically allocate system shared resources to each processor core based on the importance of the threads running on it. For example, a high-priority processor core can be allowed to occupy more system shared resources, while a low-priority processor core can be allocated fewer system shared resources or be prohibited from occupying them. With this dynamic allocation, high-priority processor cores can execute tasks at lower voltages, and low-priority processor cores can also execute tasks at lower voltages, which reduces the operating power consumption of the multi-core processor. In addition, because power consumption is reduced by adjusting the system shared resources available to each processor core rather than, as in the prior art, by supplying different voltages and frequencies to different processor cores, the processor cores can operate at the same power supply voltage and / or the same clock frequency. This eliminates the need for an asynchronous communication mechanism, reducing the processor's operating latency and improving its efficiency.
[0008] Based on the first aspect, in one possible implementation, the system shared resources include at least one of the following: a shared cache, a predictor, an arbitration controller, or a shared execution unit. By setting multiple system shared resources, the controller can dynamically allocate more resources, thereby tilting more system shared resources toward processor cores of higher importance, further improving the operating efficiency of processor cores, and thus further reducing the minimum system voltage, which is beneficial for further reducing the power consumption of multi-core processors.
[0009] Based on the first aspect, in one possible implementation, each of the multiple processor cores is specifically used to: generate an indication signal based on at least one of the processor core's performance monitoring events and security context information. Performance monitoring events can be various events within the processor core, such as, but not limited to: software increment, instruction prefetching leading to instruction or cache refill, exception occurrence, data memory access, and data cache reclamation. The more performance monitoring events, the higher the processor core load and the greater the thread's importance. The security context is used to record whether a thread is switched to another task during execution. If the security context of a thread running on a processor core remains unchanged, it indicates that the thread is continuously executing the same task. The longer the security context remains unchanged, the longer the same task is executed, and the higher the thread's importance can be considered. Therefore, generating an indication signal based on at least one of performance monitoring events and security context makes the generated indication signal more accurate.
[0010] Based on the first aspect, in one possible implementation, each processor core includes a performance monitoring unit and an identification unit; wherein the performance monitoring unit is used to generate at least one of performance monitoring events and security context information; and the identification unit is used to generate an indication signal based on at least one of the performance monitoring events and security context information. By setting up a performance monitoring unit and an identification unit, and utilizing the existing performance monitoring unit in the processor core to generate at least one of the performance monitoring events and security context information, this application implementation only requires setting up an additional identification unit in the processor core, thereby allowing the processor core to generate indication signals at a relatively low cost.
[0011] Based on the first aspect, in one possible implementation, the identification unit includes a counter for counting performance monitoring events. Specifically, the identification unit generates an indication signal based on the counter's count within a preset period. By setting a counter within the identification unit, the number of performance monitoring events can be accurately counted; thus, the importance of a thread can be determined based on the counter's count within the preset period.
[0012] Based on the first aspect, in one possible implementation, the identification unit further includes a timer, which records the duration of the security context. Specifically, the identification unit generates an indication signal based on the timer's duration within a preset period. By setting a timer, the identification unit can accurately calculate the duration of the same security context; thus, the importance of a thread can be determined based on the timer's duration within the preset period.
[0013] Based on the first aspect, in one possible implementation, the identification unit is specifically used to: generate an indication signal based on the number of times the counter counts within a preset period and the duration of the timer within the preset period.
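The counter-and-timer scheme of paragraphs [0011] to [0013] can be illustrated with a minimal software sketch. The class name, the field names, the weights, and the weighted-sum rule are all assumptions made for illustration: the application does not say how the count and the duration are combined, and the identification unit it describes is a hardware block rather than software.

```python
from dataclasses import dataclass


@dataclass
class IdentificationUnit:
    """Toy model of an identification unit; every name here is hypothetical."""

    event_weight: float = 1.0     # assumed weight per counted performance monitoring event
    duration_weight: float = 1.0  # assumed weight per time unit of unchanged security context
    event_count: int = 0          # counter: events seen in the current period
    context_duration: int = 0     # timer: time units the security context has stayed unchanged

    def record_event(self) -> None:
        """Count one performance monitoring event."""
        self.event_count += 1

    def tick(self, context_changed: bool) -> None:
        """Advance the timer by one unit; restart it if the security context changed."""
        self.context_duration = 0 if context_changed else self.context_duration + 1

    def end_of_period(self) -> float:
        """Return the indication signal for this period, then reset counter and timer."""
        signal = (self.event_weight * self.event_count
                  + self.duration_weight * self.context_duration)
        self.event_count = 0
        self.context_duration = 0
        return signal
```

In this sketch a core that sees many events and keeps the same security context for the whole period reports a larger signal than a core that sees few events and switches tasks, which matches the qualitative rule stated in paragraph [0009].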
[0014] Based on the first aspect, in one possible implementation, the controller is specifically used to: allocate system shared resources to multiple processor cores in descending order of the importance of the running threads; wherein the amount of system cache resources allocated to the processor core with higher importance of the running threads is higher than the amount of system cache resources allocated to the processor core with lower importance of the running threads.
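One way to picture the descending-order allocation of paragraph [0014] is the following sketch, which splits a number of shared-cache ways among cores. The function name, the use of cache ways as the unit, and the proportional-share policy are assumptions; the application only requires that a core running a more important thread receives more than a core running a less important one.

```python
def allocate_ways(signals: dict[str, int], total_ways: int) -> dict[str, int]:
    """Split total_ways among cores, visiting them in descending order of importance.

    Assumed policy (not from the source): each core gets a share proportional
    to its indication signal, rounded down, and leftover ways are handed out
    one at a time starting from the most important core.
    """
    if not signals:
        return {}
    order = sorted(signals, key=signals.get, reverse=True)
    total = sum(signals.values())
    if total <= 0:
        # No core reports any importance: fall back to an even split.
        base, extra = divmod(total_ways, len(order))
        return {core: base + (1 if i < extra else 0) for i, core in enumerate(order)}
    alloc = {core: total_ways * signals[core] // total for core in order}
    leftover = total_ways - sum(alloc.values())
    for core in order[:leftover]:
        alloc[core] += 1
    return alloc
```

With hypothetical signals of 8, 4, 3, and 0 for cores C1 to C4 and 16 ways, this policy gives C1 the largest share and C4 none, which corresponds to a low-importance core being prohibited from occupying the resource.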
[0015] Based on the first aspect, in one possible implementation, the controller is specifically configured to: compare indication signals of multiple processor cores, select at least one processor core with the highest importance for the running thread based on the comparison result, and allocate some or all of the system shared resources to the selected at least one processor core. By comparing indication signals of multiple processor cores, the importance level differences among the multiple processor cores can be determined. Using these differences, at least one processor core with the highest importance can be selected, thereby allocating at least a portion of the system cache resources to that at least one processor core, making the allocation of system shared resources to the multiple processor cores more accurate.
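The compare-and-select behavior of paragraph [0015] can be sketched as follows. The function names and the even split among tied cores are assumptions; the application states only that the controller compares the indication signals, selects at least one core with the highest importance, and gives it some or all of the shared resources.

```python
def select_most_important(signals: dict[str, int]) -> list[str]:
    """Return every core whose indication signal equals the maximum (ties are kept)."""
    if not signals:
        return []
    top = max(signals.values())
    return [core for core, value in signals.items() if value == top]


def grant_ways(signals: dict[str, int], ways_to_grant: int) -> dict[str, int]:
    """Grant ways_to_grant shared-cache ways to the selected core(s) only.

    Assumed policy: tied cores split the grant evenly, with any remainder
    going to the cores listed first. All other cores receive nothing from
    this grant.
    """
    winners = select_most_important(signals)
    grant = {core: 0 for core in signals}
    if not winners:
        return grant
    base, extra = divmod(ways_to_grant, len(winners))
    for i, core in enumerate(winners):
        grant[core] = base + (1 if i < extra else 0)
    return grant
```

Passing the full number of ways models allocating all of the resource to the selected cores; passing a smaller number models allocating only part of it and leaving the rest to another policy.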
[0016] Based on the first aspect, in one possible implementation, the controller is further specifically used to: generate resource configuration information for multiple processor cores based on indication signals of multiple processor cores, and provide the resource configuration information to system shared resources; the system shared resources are used to configure resources for the processor cores based on the resource configuration information.
[0017] Based on the first aspect, in one possible implementation, the resource configuration information includes at least one of the following: storage space allocation information of the shared cache, cache page replacement priority information in the shared cache, scheduling priority information of the processor core, speculative strategy information of the processor core, usage permission information of the predictor, occupancy time information of the processor core for signal transmission paths, or usage priority information of the shared execution unit.
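As an illustration of what the resource configuration information of paragraph [0017] might look like when modeled in software, the sketch below defines one record per processor core. All field names and types are hypothetical; the application lists the kinds of information but does not define an encoding.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ResourceConfig:
    """Per-core resource configuration; every field name is hypothetical.

    A field left as None means the controller sends no information of that
    kind, reflecting the "at least one of the following" wording.
    """

    core_id: str
    cache_ways: Optional[int] = None            # storage space allocation in the shared cache
    replacement_priority: Optional[int] = None  # cache replacement priority in the shared cache
    scheduling_priority: Optional[int] = None   # scheduling priority of the processor core
    speculation_policy: Optional[str] = None    # speculative strategy of the processor core
    predictor_enabled: Optional[bool] = None    # permission to use the predictor
    path_time_slots: Optional[int] = None       # occupancy time on signal transmission paths
    exec_unit_priority: Optional[int] = None    # usage priority for the shared execution unit

    def populated_fields(self) -> list[str]:
        """Names of the configuration items this record actually carries."""
        return [name for name, value in vars(self).items()
                if name != "core_id" and value is not None]
```

A controller model could emit one such record per core and hand it to the shared cache, predictor, arbitration controller, or shared execution unit, each of which would read only the fields relevant to it.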
[0018] Based on the first aspect, in one possible implementation, multiple processor cores have at least one of the same power domain and clock domain. The power domain can be, for example, voltage, and the clock domain can be, for example, clock frequency. By configuring multiple processor cores to have at least one of the same power domain and clock domain, asynchronous communication mechanisms can be eliminated, thereby reducing the operating latency of the multi-core processor and improving its efficiency.
[0019] In a second aspect, embodiments of this application provide a method for allocating system cache resources, applied to a multi-core processor that includes multiple processor cores. The method includes: generating multiple indication signals, wherein each indication signal is used to indicate the importance of a thread running in a corresponding processor core; and selectively allocating system shared resources to the multiple processor cores based on the multiple indication signals.
[0020] Based on the second aspect, in one possible implementation, the system shared resources include at least one of the following: a shared cache, a predictor, an arbitration controller, or a shared execution unit.
[0021] Based on the second aspect, in one possible implementation, generating multiple indication signals includes: generating multiple indication signals based on at least one of performance monitoring events of multiple processor cores and security context information.
[0022] Based on the second aspect, in one possible implementation, multiple indication signals are generated based on at least one of the performance monitoring events and security context information of multiple processor cores, including: generating multiple indication signals based on the number of times counters set in each of the multiple processor cores count within a preset period; wherein the counters are used to count the performance monitoring events corresponding to each processor core.
[0023] Based on the second aspect, in one possible implementation, an indication signal is generated based on at least one of the performance monitoring events of the processor core and security context information, including: generating multiple indication signals based on the duration of timers set in multiple processor cores within a preset period; wherein the timers are used to record the duration of the security context corresponding to each processor core.
[0024] Based on the second aspect, in one possible implementation, system shared resources are selectively allocated to multiple processor cores based on multiple indication signals, including: allocating system shared resources to multiple processor cores in descending order of the importance of the running threads; wherein the amount of system cache resources allocated to processor cores with higher importance of the running threads is higher than the amount of system cache resources allocated to processor cores with lower importance of the running threads.
[0025] Based on the second aspect, in one possible implementation, the system shared resources are selectively allocated to multiple processor cores based on multiple indication signals, including: comparing the indication signals of the multiple processor cores, selecting at least one processor core with the highest importance of the running thread based on the comparison result, and allocating some or all of the system shared resources to at least one processor core.
[0026] Based on the second aspect, in one possible implementation, allocating some or all of the system shared resources to at least one processor core includes: generating resource configuration information for multiple processor cores based on multiple indication signals, and providing the resource configuration information to the system shared resources.
[0027] Based on the second aspect, in one possible implementation, the resource configuration information includes at least one of the following: storage space allocation information of the shared cache, cache page replacement priority information in the shared cache, scheduling priority information of the processor core, speculative strategy information of the processor core, usage permission information of the predictor, occupancy time information of the processor core for signal transmission paths, or usage priority information of the shared execution unit.
[0028] Based on the second aspect, in one possible implementation, multiple processor cores have at least one of the same power domain and clock domain.
[0029] In a third aspect, embodiments of this application also provide an electronic device, which includes a power management unit and a multi-core processor as described in the first aspect; wherein the power management unit supplies power to each processor core in the multi-core processor.
[0030] In a fourth aspect, embodiments of this application also provide an apparatus for allocating system cache resources, including a processor and a memory, wherein the memory stores instructions, and when the processor executes the instructions in the memory, the apparatus for allocating system cache resources implements the method described in the second aspect above.
[0031] In a fifth aspect, embodiments of this application also provide a chip, including: an input interface, an output interface, and a multi-core processor. Optionally, the chip further includes a memory. The multi-core processor is used to execute code in the memory, and when the multi-core processor executes the code, the chip implements the methods described in the second aspect, the third aspect, and / or any possible implementation thereof.
[0032] Optionally, the chip described above may also be an integrated circuit.
[0033] In a sixth aspect, embodiments of this application also provide a readable storage medium storing instructions which, when executed by a multi-core processor, cause the multi-core processor to perform the methods described in the second aspect and / or any possible implementation thereof.
[0034] In a seventh aspect, embodiments of this application also provide a computer program product, which includes a computer program that, when executed by a multi-core processor, causes the multi-core processor to implement the methods described in the second aspect and / or any possible implementation thereof.
[0035] It should be understood that the second to seventh aspects of this application are consistent with the technical solutions of the first aspect of this application, and the beneficial effects achieved by each aspect and the corresponding feasible implementation are similar, so they will not be described again.
Attached Figure Description
[0036] Figure 1 is a schematic diagram of a processor architecture in the prior art provided in the embodiments of this application;
[0037] Figure 2 is a schematic diagram of the hardware structure of an electronic device provided in an embodiment of this application;
[0038] Figure 3 is a schematic diagram of the interaction relationship between various components in the electronic device provided in the embodiment of this application;
[0039] Figures 4A-4C are schematic diagrams of application scenarios corresponding to the interaction relationship shown in Figure 3, provided by embodiments of this application.
[0040] Figure 5 is a schematic diagram of a hardware structure of a processor core provided in an embodiment of this application;
[0041] Figure 6A is a schematic diagram of one implementation of the processor core generating an indication signal according to an embodiment of this application;
[0042] Figure 6B is a schematic diagram of another implementation of the processor core generating the indication signal provided in the embodiments of this application;
[0043] Figure 6C is a schematic diagram of another implementation of the processor core generating an indication signal provided in the embodiments of this application;
[0044] Figure 7 is a flowchart of a method for allocating system cache resources according to an embodiment of this application;
[0045] Figure 8 is a schematic diagram of a device for allocating system cache resources provided in an embodiment of this application.
Detailed Implementation
[0046] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, and not all, of the embodiments of this application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of this application without creative effort fall within the protection scope of the embodiments of this application.
[0047] In this document, the term "and / or" merely describes a relationship between related objects and indicates that three relationships are possible. For example, A and / or B can represent three situations: A exists alone, A and B exist simultaneously, and B exists alone.
[0048] The terms "first" and "second," etc., in the specification and drawings of the embodiments of this application are used to distinguish different objects or to distinguish different treatments of the same object, rather than to describe a specific order of objects.
[0049] Furthermore, the terms "comprising" and "having," and any variations thereof, used in the description of the embodiments of this application are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the steps or units listed, but may optionally include other steps or units not listed, or may optionally include other steps or units inherent to these processes, methods, products, or devices.
[0050] It should be noted that in the description of the embodiments of this application, the words "exemplarily" or "for example" are used to indicate examples, illustrations, or explanations. Any embodiment or design scheme described as "exemplarily" or "for example" in the embodiments of this application should not be construed as being more preferred or advantageous than other embodiments or design schemes. Specifically, the use of the words "exemplarily" or "for example" is intended to present the relevant concepts in a specific manner.
[0051] In the description of the embodiments of this application, unless otherwise stated, "a plurality of" means two or more.
[0052] Please refer to Figure 1, which is a schematic diagram of a processor architecture in the prior art. As shown in Figure 1, a typical processor architecture includes a processor and an energy management unit (EMU). The processor integrates one or more processor cores; the figure schematically shows three processor cores. These three processor cores can be used to implement logical functions such as image signal processing, digital signal processing, or neural network processing. The energy management unit outputs voltage to the processor cores based on their operating performance (e.g., the processor core's power parameters). Furthermore, to reduce the power consumption of the processor architecture, the energy management unit dynamically adjusts the voltage and frequency of each processor core according to its operating conditions. That is, in the processor architecture shown in Figure 1, the operating voltage and frequency of the three processor cores may be different. When processor core 1 and processor core 2 interact with each other, since their operating frequencies may differ, the processor architecture shown in Figure 1 also requires an asynchronous communication mechanism to support data interaction between the two processor cores at different operating frequencies. Asynchronous data interaction typically increases the processor's latency and reduces its efficiency.
[0053] To reduce processor latency, based on the processor architecture shown in Figure 1, the industry has proposed using the same voltage to power all processor cores. That is, processor cores 1 through 3 all have the same voltage and clock frequency. When all processor cores operate at the same voltage and clock frequency, the system must maintain a minimum voltage to meet the operating requirements of all processor cores. If the minimum voltage of processor core 1 is high (indicating a high load), and the minimum voltage of the other processor cores is low, the other processor cores cannot reduce power consumption by lowering their voltage because the system supplies the same voltage to each processor core, resulting in wasted power.
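The power waste described in paragraph [0053] can be made concrete with a small calculation. The sketch below assumes the common first-order model in which dynamic power scales with the square of the supply voltage at a fixed clock frequency. The function names and the per-core minimum voltages are invented for illustration and do not come from the application.

```python
def shared_rail_voltage(min_voltages: dict[str, float]) -> float:
    """A shared supply must satisfy the most demanding core, so it is the maximum."""
    return max(min_voltages.values())


def excess_power_factor(min_voltages: dict[str, float]) -> dict[str, float]:
    """Ratio of dynamic power at the shared voltage to power at each core's own minimum.

    Assumes dynamic power is proportional to V**2 at a fixed clock frequency.
    """
    rail = shared_rail_voltage(min_voltages)
    return {core: (rail / v) ** 2 for core, v in min_voltages.items()}


# Hypothetical minimum voltages in volts: core 1 is heavily loaded, the others are not.
example = {"core1": 0.9, "core2": 0.6, "core3": 0.6}
factors = excess_power_factor(example)
```

Under these assumed numbers the shared supply sits at 0.9 V, and cores 2 and 3 each draw roughly 2.25 times the dynamic power they would need at their own minimum voltage. This is the waste that the shared-voltage approach leaves on the table.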
[0054] In summary, existing technologies still suffer from poor processor core performance. By providing a controller, the multi-core processor and the method for allocating system cache resources of this embodiment can dynamically allocate system shared resources to each processor core based on the importance of the threads running on it. For example, a high-priority processor core can be allowed to occupy more system shared resources, while a low-priority processor core can be allocated fewer system shared resources or be prohibited from occupying them. With this dynamic allocation, high-priority processor cores can execute tasks at lower voltages, and low-priority processor cores can also execute tasks at lower voltages, which reduces the operating power consumption of the multi-core processor. In addition, because power consumption is reduced by adjusting the system shared resources available to each processor core rather than, as in the prior art, by supplying different voltages and frequencies to different processor cores, the processor cores can operate in real time at the same power supply voltage and / or the same clock frequency. This eliminates the need for an asynchronous communication mechanism, reducing the processor's operating latency and improving its efficiency. The electronic device provided in the embodiments of this application is further described below through the examples shown in Figures 2 to 7.
[0055] Please refer to Figure 2, which is a schematic diagram of the hardware architecture of an electronic device 100 provided in an embodiment of this application. The electronic device 100 may be located within an electronic device. The electronic device may be a server device or located within a server, such as a server for performing cloud computing or a server for training artificial intelligence models (e.g., deep neural networks, decision trees, etc.). The electronic device can also be a terminal, including but not limited to: mobile phone, tablet computer, personal computer, handheld computer, mobile internet device (MID), camera, wearable device (e.g., smartwatch, smart bracelet, pedometer, etc.), audio equipment, audio and video player, set-top box, game console, printer, mouse, keyboard, in-vehicle equipment (e.g., equipment on vehicles such as cars, airplanes, ships, trains and high-speed trains), virtual reality (VR) device, augmented reality (AR) device, wireless terminal in industrial control, smart home device (e.g., refrigerator, television, air conditioner, electricity meter, etc.), smart robot, workshop equipment, wireless terminal in self-driving, wireless terminal in remote medical surgery, wireless terminal in smart grid, wireless terminal in transportation safety, wireless terminal in smart city, or wireless terminal in smart home, flying equipment (e.g., smart robot, hot air balloon, drone, airplane), etc. Figure 2 is merely an example of electronic device 100. Electronic device 100 can be any type of electronic component located within the electronic device or the electronic device itself. For example, electronic device 100 can be a chip, a chipset, or a circuit board with a chip or chipset mounted on it, etc. This embodiment is not limited in this respect. The aforementioned chip, chipset, or circuit board with a chip or chipset mounted on it can operate under suitable software drivers.
[0056] Electronic device 100 may include at least one processor. When electronic device 100 includes one processor, the processor may be a multi-core processor; when electronic device 100 includes multiple processors, at least one of the multiple processors may be a multi-core processor. The aforementioned multi-core processor may include, but is not limited to, a central processing unit (CPU), a graphics processing unit (GPU), an image signal processor (ISP), a digital signal processor (DSP), a network processing unit (NPU), an artificial intelligence (AI) processor, a modem, etc. The aforementioned multi-core processor may have multiple processor cores. Multiple processor cores located within the same processor, as well as multiple processor cores located on different processors, can interact with each other to collaboratively complete various tasks of electronic device 100. This application embodiment describes an electronic device 100 with one multi-core processor 10, and the multi-core processor 10 has four processor cores: processor core C1, processor core C2, processor core C3, and processor core C4, but this is not intended to limit the solution. Optionally, the aforementioned multiple processor cores can be integrated into one or more chips, which can be placed in a chipset. When multiple processor cores are integrated into the same chip, the chip is also called a system on a chip (SOC), as shown in Figure 2.
[0057] In this embodiment, the electronic device 100 may further include one or more other components, such as a memory 20, which may be located outside the aforementioned System-on-a-Chip (SOC). The memory 20 may, exemplarily, include components for storing instructions and data, such as volatile memory. Volatile memory may include, but is not limited to, dynamic random-access memory (DRAM) and Double Data Rate Synchronous Dynamic Random Access Memory (DDR). The memory 20 may also be referred to as the main memory of the electronic device 100. The memory 20 may store various operating system programs (e.g., general-purpose operating system programs and trusted operating system programs), application programs, instruction codes, and data required for operation. The multi-core processor 10 executes various functional applications and data processing of the electronic device 100 by loading programs and instructions and acquiring data. In addition to the aforementioned SOC, the electronic device 100 may also include other components, such as, but not limited to, various multimedia components such as voice components, camera components, and display components, which can be coupled to the system bus 11 via a media bus.
[0058] Referring again to Figure 2, the multi-core processor 10 provided in this embodiment will be described in more detail below. In addition to the processor cores described above, the multi-core processor 10 may also include a system bus 11 and a shared cache 12, both of which can be integrated into the aforementioned System-on-a-Chip (SoC). The multiple processor cores and the shared cache 12 can all be connected to the system bus 11, thereby allowing the multiple processor cores to communicate with the shared cache 12 via the system bus 11. When the SoC includes multi-level caches (e.g., L3 or L4 caches), the shared cache 12 can also be the last level cache (LLC). The shared cache 12 may include, but is not limited to, at least one of the following: an instruction cache or a data cache. Each processor core in the multi-core processor 10 performs various functions through multiple workflows, such as reading instructions from off-chip memory, decoding instructions, and executing instruction content. In the aforementioned workflows, the multi-core processor 10 needs to interact with the memory during the instruction reading and instruction execution stages to read instructions or data from the memory. In one possible implementation, the multi-core processor 10 may further include a predictor 13, which may be located within the System-on-a-Chip (SoC). The predictor 13 may include, for example, at least one of the following: a branch predictor, a value predictor, a direction predictor, an address prefetcher, a data prefetcher, or an instruction predictor. The predictor 13 interacts with the shared cache 12 to request the shared cache 12 to prefetch at least one of instructions and data from memory. The multi-core processor 10 may also include an arbitration controller 14 and a shared execution unit 15. The arbitration controller 14 controls the occupancy time of the signal transmission paths for each processor core. The shared execution unit 15 may include various arithmetic execution components, such as, but not limited to, adders or multipliers. It should be noted that the shared cache 12, predictor 13, arbitration controller 14, and shared execution unit 15 mentioned above can be shared by multiple processor cores in the multi-core processor 10, or can be used by a single dedicated processor core. In addition, the shared cache 12, predictor 13, arbitration controller 14, and shared execution unit 15 also have priority selection functions. For example, the shared cache 12 can allocate part or all of the cache space to high-priority processor cores based on the instructions of the controller 16.
[0059] The multi-core processor 10 may also include a controller 16, which may be a hardware circuit, including but not limited to comparators, decoders, registers, and various logic gates; in another possible implementation, the controller 16 may also be a programmable logic device. The controller 16 may be connected to the system bus 11, thereby allowing the controller 16 to interact with the processor cores and shared cache 12 in the multi-core processor 10 via the system bus 11. For example, the controller 16 may receive indication signals from one or more processor cores in the multi-core processor 10, indicating the importance of threads running in the processor cores. Based on these indication signals, the controller 16 may dynamically allocate system shared resources in the multi-core processor 10 to one or more processor cores. The system shared resources described in this embodiment may include at least one of hardware resources and software resources. In one possible implementation, the hardware resources may include, for example, at least one of the following: shared cache 12, predictor 13, arbitration controller 14, or shared execution unit 15; the software resources may include, for example, the replacement priority of cache lines in the shared cache or the scheduling priority of processor cores.
[0060] Based on the multi-core processor 10 shown in Figure 2, during the operation of the multi-core processor 10, the total power consumption P of the multi-core processor 10 typically consists of the power consumption Pc1 of processor core C1, the power consumption Pc2 of processor core C2, the power consumption Pc3 of processor core C3, the power consumption Pc4 of processor core C4, and the power consumption Psc12 of the shared cache 12. Furthermore, ignoring the idle state overhead of the processor cores, the approximate operating power consumption Pc of each processor core is the product of the processor core's operating current I, the processor core's voltage V, and the processor core's task execution time T. Here, the product of the operating current I and the task execution time T represents the total charge; the total charge consumed by the processor core to complete the same computational task remains constant. Therefore, with the total charge remaining constant, the lower the processor core's operating voltage V, the lower the processor core's power consumption. However, the performance of the processor core is related to the clock frequency f. To ensure that the processor core completes the task on time within the task execution time T, the clock frequency f of the processor core must not fall below a lower limit. The clock frequency f is in turn related to the power supply voltage: the power supply voltage must reach a lower limit voltage so that the clock frequency f meets the performance requirements for completing the task on time.
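The relationships stated in this paragraph can be written compactly as follows. The symbol Q for the total charge is introduced here only for illustration and does not appear in the original text.

```latex
\begin{align}
P   &= P_{c1} + P_{c2} + P_{c3} + P_{c4} + P_{sc12} \\
P_c &\approx I \cdot V \cdot T = Q \cdot V, \qquad Q = I \cdot T
\end{align}
```

With Q fixed for a given computational task, P_c falls in proportion to V, which is why the description aims to let each processor core complete its task at the lowest voltage that still satisfies the clock frequency lower limit.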
[0061] In existing technologies, taking DVFS technology as an example, to reduce the power consumption of the multi-core processor 10, it is common practice to independently power the four processor cores C1, C2, C3, and C4 based on their respective operating capabilities. Assume that the power supply voltage of processor core C1 is V1 and its clock frequency is f1; the power supply voltage of processor core C2 is V2 and its clock frequency is f2. Since voltages V1 and V2 are different, and clock frequencies f1 and f2 are also different, when processor cores C1 and C2 interact with each other, an asynchronous communication mechanism is needed in the multi-core processor 10 to support data interaction between them at different clock frequencies due to the difference in clock frequencies f1 and f2. Asynchronous data interaction typically increases the latency of the multi-core processor 10, reducing its efficiency. To mitigate the increased latency of the electronic device 100 caused by asynchronous data interaction as described above, the same power supply voltage and / or clock frequency can be set for each processor core in the multi-core processor 10. As can be seen from the above principles, when a certain processor core is under high load, it needs a higher supply voltage to ensure its performance. However, other processor cores may be under lower load. If a higher voltage is provided to the processor core with a lower load, it will result in wasted power consumption.
[0062] This application embodiment sets up resources that can be shared by multiple cores and prioritized, as well as a controller 16. This controller can dynamically allocate system shared resources to each processor core based on the importance of the threads running in each processor core. For example, for high-priority processor cores, more system shared resources can be allocated to them; for low-priority processor cores, fewer system shared resources can be allocated to them, or they may not be allowed to use system shared resources. Therefore, this application embodiment, through the dynamic allocation of system shared resources, allows high-priority processor cores to execute tasks at lower voltages, and low-priority processor cores can also execute tasks at lower voltages, thereby reducing the operating power consumption of the multi-core processor 10. Furthermore, since this application embodiment achieves power reduction by dynamically adjusting the system shared resources of the processor cores, compared to the prior art which achieves power reduction by providing different voltages and frequencies to different processor cores, this application embodiment can ensure that multiple processor cores can operate in real time under the same supply voltage and / or the same clock frequency, thus eliminating the need for asynchronous communication mechanisms, thereby reducing processor latency and improving processor efficiency. It should be noted that the processor core levels shown in the embodiments of this application can all be understood as the level of importance of the threads running in the processor core.
[0063] Based on the hardware architecture of the electronic device 100 shown in Figure 2, and in conjunction with the interaction relationships between the components shown in Figure 3, the resource allocation of the controller 16 to the processor cores will be described in more detail below. As shown in Figure 3, each of processor cores C1 to C4 generates an indication signal based on the importance of its running thread and sends that indication signal to the controller 16: processor core C1 generates and sends indication signal V1, processor core C2 generates and sends indication signal V2, processor core C3 generates and sends indication signal V3, and processor core C4 generates and sends indication signal V4. In one possible implementation, each of the above indication signals V1 to V4 can be a multi-bit indication signal, the value of which can be used to indicate the level of thread importance. This application does not specifically limit the number of bits in the indication signals that indicate the importance of threads; the number of bits is set according to the needs of the scenario. 
Taking a 4-bit indication signal as an example, assuming that indication signal V1 is "1000", indication signal V2 is "0100", indication signal V3 is "0010", and indication signal V4 is "0001", then the value corresponding to indication signal V1 is 8, the value corresponding to indication signal V2 is 4, the value corresponding to indication signal V3 is 2, and the value corresponding to indication signal V4 is 1; it can be considered that processor core C1 has the highest level and processor core C4 has the lowest level. In one possible implementation, the above-mentioned indication signals may be generated by the processor core based on at least one of performance event information and security context information. For example, if a processor core meets at least one of the following conditions: the processor core executes the same thread for a period of time (i.e., the context information remains unchanged for a period of time); the idle time of the same thread is less than a preset threshold during the aforementioned period; the same thread experiences resource bottlenecks (e.g., insufficient cache space); and the number of performance events exceeds a preset threshold, then the level corresponding to the thread running in the processor core can be determined as a high level. The more conditions met, the higher the level of importance, and the processor core can generate corresponding bit signals based on the met conditions. For example, when two of the above conditions are met, the signal "0100" is generated; when all four of the above conditions are met, the signal "1111" is generated. Furthermore, in one possible implementation, the indication signal sent by the processor core can also indicate the work task executed by the processor core. For example, the signal indicating the importance of a thread can also include more bits, where some bits are used to indicate the value of the thread's importance, and other bits are used to indicate the type of work being performed.
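The 4-bit example above can be sketched in a few lines of Python. This is only an illustration of how the bit patterns map to values and to an ordering of the cores; the function names are hypothetical, and the controller 16 described here is a hardware circuit or programmable logic device, not software.

```python
def signal_value(bits: str) -> int:
    """Interpret a multi-bit indication signal as an unsigned integer."""
    return int(bits, 2)


def rank_cores(signals: dict[str, str]) -> list[str]:
    """Return core names ordered from highest to lowest thread importance."""
    return sorted(signals, key=lambda core: signal_value(signals[core]), reverse=True)


# Indication signals V1 to V4 from the example in the text.
signals = {"C1": "1000", "C2": "0100", "C3": "0010", "C4": "0001"}
values = {core: signal_value(bits) for core, bits in signals.items()}
ranking = rank_cores(signals)
```

Here `values` maps C1 to 8, C2 to 4, C3 to 2, and C4 to 1, so C1 ranks first and C4 ranks last.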
[0064] The controller 16 can store a mapping relationship between thread importance values and resource allocation. Based on this mapping relationship, the controller 16 can allocate resources in various ways. In one possible implementation, the controller 16 can sort the multiple processor cores according to the importance of the running threads and allocate resources to each processor core in descending order of importance, where the higher the importance value of a processor core, the more resources it receives. In another possible implementation, after receiving an indication signal from each processor core, the controller 16 can select one or more processor cores with the highest priority and allocate some or all resources to the selected processor cores. In a specific implementation, the controller 16 can query the above mapping relationship, generate resource configuration information, and then send the resource configuration information to the corresponding resources to enable resource configuration. The resource configuration information may include the processor core identifier and the resource allocation strategy. These resources include, but are not limited to, the shared cache 12, predictor 13, arbitration controller 14, and shared execution unit 15 shown in Figure 3. The aforementioned resource configuration information includes, but is not limited to, at least one of the following: storage space allocation information of shared cache 12, cache page replacement priority information in shared cache 12, scheduling priority information of each processor core, speculative strategy information of processor cores, usage permission information of predictor 13, occupancy time information of each processor core for signal transmission paths, or usage priority information of shared execution unit 15.
[0065] In this embodiment, the controller 16 can select high-priority processor cores in various ways. In one possible implementation, after receiving indication signals from each processor core, the controller 16 selects processor cores whose thread importance value is higher than a preset threshold as high-priority processor cores. In another possible implementation, the controller 16 can compare the values corresponding to all received processor cores to determine the differences between the levels of each processor core, and then select one or more candidate processor cores based on the differences. The controller 16 can also collect system information of the electronic device 100, which indicates the current critical tasks of the electronic device 100. The controller 16 can further select processor cores from the candidate processor cores whose tasks are critical, and then designate the finally selected processor cores as high-priority processor cores.
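The two selection approaches in this paragraph can be sketched as follows. The text does not specify how the differences between levels are evaluated, so the criterion used here (a candidate trails the highest value by at most `max_gap`) is an assumption, as are the function names and the representation of critical tasks as a set of core names.

```python
def select_by_threshold(values: dict[str, int], threshold: int) -> list[str]:
    """First approach: keep cores whose importance value exceeds the threshold."""
    return [core for core, value in values.items() if value > threshold]


def select_by_gap(values: dict[str, int], max_gap: int, critical: set[str]) -> list[str]:
    """Second approach: shortlist by difference, then keep critical-task cores."""
    top = max(values.values())
    # Assumed criterion: a candidate trails the top value by at most max_gap.
    candidates = [core for core, value in values.items() if top - value <= max_gap]
    # System information is modeled as the set of cores running critical tasks.
    return [core for core in candidates if core in critical]


values = {"C1": 8, "C2": 4, "C3": 2, "C4": 1}
by_threshold = select_by_threshold(values, threshold=3)
by_gap = select_by_gap(values, max_gap=4, critical={"C2", "C3"})
```

With these inputs the threshold approach selects C1 and C2, while the second approach shortlists C1 and C2 and then keeps only C2, because C1 is not running a critical task in this hypothetical scenario.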
[0066] Assuming the controller 16 compares the indication signals corresponding to processor cores C1 to C4 and determines that processor core C1 has the highest priority, and that the priority of processor core C1 is much higher than that of other processor cores, then the controller 16 can allocate all four resources (shared cache 12, predictor 13, arbitration controller 14, and shared execution unit 15) to processor core C1. That is, the following configuration information can be generated: Configuration information 1: The controller 16 can allocate 3 / 4 of the cache space in shared cache 12 to processor core C1, with the remaining 1 / 4 cache space shared by the other processor cores, and increase the retention priority of processor core C1; Configuration information 2: The predictor 13 is exclusively used by processor core C1; Configuration information 3: The arbitration controller 14 is instructed to allocate 70% of the duty cycle in the signal transmission path to processor core C1, and the remaining 30% of the duty cycle in the signal transmission path is allocated to other processor cores; Configuration information 4: At least a portion of the arithmetic units in the shared execution unit 15 are preferentially allocated to processor core C1. Then, the controller 16 can send configuration information 1 to the shared cache 12, configuration information 2 to the predictor 13, configuration information 3 to the arbitration controller 14, and configuration information 4 to the shared execution unit 15. Thus, the shared cache 12, predictor 13, arbitration controller 14, and shared execution unit 15 can configure the processor core's resources based on the received configuration information. 
Configuration information 1 allows processor core C1 to have more reserved cache resources and makes it easier for instructions or data corresponding to processor core C1 to reside in the cache; configuration information 2 and configuration information 4 allow processor core C1 to achieve further acceleration; and configuration information 3 allows processor core C1 to have lower memory access latency.
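As a rough illustration, the four pieces of configuration information in this example can be modeled as records that each carry a processor core identifier and a resource allocation strategy. The record layout, field names, and resource keys below are hypothetical; only the numeric values (3/4 of the cache, 70% of the duty cycle) come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigInfo:
    target: str   # shared resource that receives the configuration
    core: str     # processor core identifier
    policy: dict  # resource allocation strategy (hypothetical field names)


def build_configs(core: str) -> list[ConfigInfo]:
    """Build configuration information 1 to 4 for the highest-priority core."""
    return [
        ConfigInfo("shared_cache_12", core,
                   {"cache_share": 0.75, "raise_retention_priority": True}),
        ConfigInfo("predictor_13", core, {"exclusive": True}),
        ConfigInfo("arbitration_controller_14", core, {"duty_cycle": 0.70}),
        ConfigInfo("shared_execution_unit_15", core, {"preferred": True}),
    ]


def dispatch(configs: list[ConfigInfo]) -> dict[str, ConfigInfo]:
    """Route each record to its target resource, as the controller would."""
    return {cfg.target: cfg for cfg in configs}


routed = dispatch(build_configs("C1"))
```

Each shared resource then sees only the record addressed to it, which mirrors the description of the controller sending configuration information 1 to 4 to four different resources.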
[0067] The following description takes the dynamic allocation of the shared cache 12 by the controller 16 as an example, and illustrates it through the application scenarios shown in Figures 4A to 4C. In one possible implementation, a mapping relationship between processor core levels and allocation ratios of shared cache 12 is established in advance. After receiving indication signals from each processor core, controller 16 can query this mapping relationship and allocate cache space in shared cache 12 to each processor core based on its level. In another possible implementation, controller 16 can compare the values corresponding to the indication signals received from all processor cores to determine the differences between them, and then allocate cache space to each processor core based on these differences. For example, half of the cache space in shared cache 12 can be allocated to processor core C1 for sole use, one-third of the cache space in shared cache 12 can be allocated to processor core C2 for sole use, and the remaining one-sixth of the cache space in shared cache 12 can be allocated to processor cores C3 and C4 for shared use.
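The one-half, one-third, and one-sixth split can be checked with exact fractions. The sketch below assumes a cache divided into 18 equal regions, which matches the region labels A1 to A9, B1 to B6, and D1 to D3 used in the Figure 4 scenario; the function name and the "C3+C4" key for the shared portion are illustrative only.

```python
from fractions import Fraction


def split_regions(total: int, shares: dict[str, Fraction]) -> dict[str, int]:
    """Convert fractional shares of the cache into whole region counts."""
    if sum(shares.values()) != 1:
        raise ValueError("shares must sum to 1")
    counts = {}
    for owner, share in shares.items():
        regions = share * total
        if regions.denominator != 1:
            raise ValueError(f"{owner}: share does not map to whole regions")
        counts[owner] = regions.numerator
    return counts


shares = {
    "C1": Fraction(1, 2),     # sole use
    "C2": Fraction(1, 3),     # sole use
    "C3+C4": Fraction(1, 6),  # shared by C3 and C4
}
counts = split_regions(18, shares)
```

The resulting counts are 9, 6, and 3 regions, which add up to the full 18 regions.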
[0068] In one possible implementation of this application embodiment, the controller 16 allocates the cache space of the shared cache to each processor core. For example, it may generate cache space configuration information corresponding to each processor core and write the cache space configuration information into the shared cache 12 (e.g., write it into the configuration register in the shared cache 12). The cache space configuration information may indicate the location area of the cache space allocated to the processor core in the shared cache 12. Thus, the shared cache 12 may cache at least one of the instructions and data of the processor core into the corresponding cache space based on the cache area corresponding to each processor core.
[0069] It should be noted that, as illustrated in Figure 4A, the interaction schematically shows the case where processor cores C1 through C4 all send indication signals to controller 16. In other possible implementations of this application, only processor cores whose thread importance exceeds a preset threshold may send indication signals to controller 16. For example, if the levels corresponding to processor cores C1 and C2 exceed the preset threshold, then only processor cores C1 and C2 send indication signals to controller 16, while processor cores C3 and C4 do not. Therefore, controller 16 can allocate cache space in shared cache 12 to processor cores C1 through C4 based on the indication signals sent by processor cores C1 and C2, or controller 16 can allocate cache space in shared cache 12 only to processor cores C1 and C2.
[0070] Following the interaction shown in Figure 4A, processor core C1 corresponds to cache regions A1 to A9 in shared cache 12, processor core C2 corresponds to cache regions B1 to B6 in shared cache 12, and processor cores C3 and C4 both correspond to cache regions D1 to D3 in shared cache 12, as shown in Figure 4B. Based on the scenario shown in Figure 4A, after the electronic device 100 runs for a preset number of clock cycles, processor cores C1 to C4 in the electronic device 100 can continue to send indication signals indicating the importance of threads to the controller 16. Assuming that the indication signal V5 sent by processor core C4 corresponds to a value of 15, and the indication signals sent by the other processor cores all correspond to a value of 1, the controller 16 can reclaim part of the cache space originally belonging to processor cores C1 to C3 for use by processor core C4. For example, processor core C1 exclusively uses cache area A1 in shared cache 12, processor cores C2 and C3 share cache areas B1 to B2 in shared cache 12, and processor core C4 exclusively uses cache areas A2 to A9, cache areas B3 to B6, and cache areas D1 to D3 in shared cache 12, as shown in Figure 4C.
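The two cache layouts described above can be written down as ownership tables and checked for consistency. This is a bookkeeping sketch only: the data structure (a dictionary keyed by the tuple of owning cores) and the helper names are assumptions, and it does not model how the controller 16 decides which regions to reclaim.

```python
def regions(prefix: str, first: int, last: int) -> list[str]:
    """Return region labels such as A1..A9."""
    return [f"{prefix}{i}" for i in range(first, last + 1)]


ALL_REGIONS = regions("A", 1, 9) + regions("B", 1, 6) + regions("D", 1, 3)

# Layout before reallocation (Figure 4B in the text).
layout_4b = {
    ("C1",): regions("A", 1, 9),
    ("C2",): regions("B", 1, 6),
    ("C3", "C4"): regions("D", 1, 3),
}

# Layout after C4 reports a value of 15 (Figure 4C in the text).
layout_4c = {
    ("C1",): regions("A", 1, 1),
    ("C2", "C3"): regions("B", 1, 2),
    ("C4",): regions("A", 2, 9) + regions("B", 3, 6) + regions("D", 1, 3),
}


def is_partition(layout: dict) -> bool:
    """Check that every cache region is assigned exactly once."""
    assigned = [region for group in layout.values() for region in group]
    return sorted(assigned) == sorted(ALL_REGIONS)


def sizes(layout: dict) -> dict:
    """Count the regions held by each owner group."""
    return {owners: len(group) for owners, group in layout.items()}
```

Both layouts cover all 18 regions exactly once; after reallocation, processor core C4 holds 15 of them, processor core C1 holds 1, and processor cores C2 and C3 share 2.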
[0071] As can be seen from Figures 4A to 4C, in this embodiment of the application, by setting up a controller 16, the controller 16 can dynamically allocate storage space in the shared cache 12 to each processor core based on the indication signal sent by the processor core indicating the importance of the thread. The higher the level of the processor core, the more cache space it occupies, thereby caching more instructions and data, which in turn allows the higher level processor core to work at a lower voltage, thereby reducing the power consumption of the electronic device 100.
[0072] Based on the architecture of the electronic device 100 shown in Figure 2, the interaction relationship shown in Figure 3, and the application scenarios shown in Figures 4B to 4C, in one possible implementation of this application embodiment, the processor core may be equipped with a performance monitoring unit and an identification unit. The indication signal indicating the importance of the thread may be determined by the collaborative work of the performance monitoring unit and the identification unit in the processor core. The following describes the structure of the processor core in more detail, taking processor core C1 as an example, in conjunction with Figure 5. As shown in Figure 5, processor core C1 may include an arithmetic and logic unit (ALU) 101, a load store unit (LSU) 102, a performance monitoring unit (PMU) 103, an identification unit 104, and a register group 105. It is understood that the processor core may include more or fewer components or modules, and this application does not specifically limit this. For example, the processor core may also include components such as a program counter and an instruction decoder. The aforementioned ALU 101, LSU 102, performance monitoring unit 103, and identification unit 104 may be configured as four independent modules.
[0073] Register group 105 includes, but is not limited to, data registers and instruction registers. It is understood that register group 105 may include more registers, the number of which is set based on the needs of the application scenario, and this embodiment does not impose a specific limitation. Register group 105 communicates with arithmetic logic unit 101 and LSU 102 respectively. Arithmetic logic unit 101 performs various logical operations by reading data from registers in register group 105, and stores the results of logical operations in register group 105.
[0074] LSU 102 is a dedicated unit in the multi-core processor 10 for executing load and store instructions. LSU 102 generates addresses for load or store operations and sends these addresses to the shared cache 12. Data is then loaded from memory 20 or the shared cache 12 into register group 105, or stored from register group 105 into memory 20. Loading data from memory 20 or the shared cache 12 refers to reading data from memory 20 or the shared cache 12. It should be noted that the data loaded by LSU 102 is all data required by the multi-core processor 10 to execute instructions.
[0075] The performance monitoring unit 103 can communicate with the aforementioned register group 105, ALU 101, LSU 102, and identification unit 104. The performance monitoring unit 103 may include components such as a performance counter and a performance register. The performance monitoring unit 103 receives various events from the register group 105, ALU 101, and LSU 102. The event types may include, but are not limited to: software increment, instruction prefetching leading to instruction or cache refill, exception occurrence, data memory access, and data cache reclamation. Each time the performance monitoring unit 103 receives an event, it provides the corresponding performance monitoring event information to the identification unit 104. This performance monitoring event information may include, for example, the event type. In one possible implementation, the performance monitoring unit 103 increments the performance counter by one each time it receives an event; that is, the value of the performance counter is the number of performance monitoring events received by the performance monitoring unit 103. The performance monitoring unit 103 can also periodically provide performance monitoring event information to the identification unit 104. When the performance monitoring unit 103 periodically provides performance monitoring event information to the identification unit 104, the performance monitoring event information may also include the value of the performance counter. For example, the performance monitoring unit 103 may periodically provide both the value of the performance counter and the event type to the identification unit 104. Furthermore, in one possible implementation, the performance monitoring unit 103 may also record the security context information of the thread running the ALU 101. The performance monitoring unit 103 may also provide the security context information to the identification unit 104.
[0076] The identification unit 104 can be a standalone hardware circuit, such as, but not limited to, a comparator, a decoder, and various logic gates. Furthermore, in other possible implementations, the identification unit 104 can also be a programmable logic device. The identification unit 104 generates an indication signal indicating the importance of a thread based on at least one of the performance event information and security context information received from the performance monitoring unit 103.
[0077] In one possible implementation, the identification unit 104 may be equipped with a counter, which can be used to record the number of performance events within a preset period. Thus, the identification unit 104 can generate an indication signal based on the performance event information. Specifically, this can be implemented in several ways. In a first possible implementation, the identification unit 104 increments the counter by one each time it receives performance monitoring event information. In a second possible implementation, as shown in Figure 6A, a mapping relationship between performance monitoring event types and levels may be preset in the identification unit 104. After receiving performance monitoring event information from the performance monitoring unit 103, the identification unit 104 can query the mapping relationship to determine the level corresponding to the performance monitoring event information. When the level is high (for example, performance monitoring event types are divided into five levels, with levels three, four, and five all belonging to high levels), the counter can be incremented by one. If the counter value exceeds a preset threshold within a preset period, it indicates a high number of high-level performance monitoring events. This means that the processor core (or the thread running on the processor core) enters an idle state less frequently during that period, or the idle time percentage is less than the preset threshold. It also indicates that the processor core (or the thread running on the processor core) currently has insufficient computing power and cannot complete the task within a given time. Conversely, if the counter value is below the preset threshold within a preset period, it indicates fewer high-level performance monitoring events. 
This means that the processor core (or the thread running on the processor core) is idle for a long time during that period, or the idle time percentage is higher than the preset threshold, indicating sufficient computing power. Therefore, the identification unit 104 can generate an indication signal based on the counter value within the preset period.
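The second counting approach (count only high-level events, then compare against a threshold) might look like the following sketch. The event-to-level table is entirely hypothetical, since the text only states that such a mapping exists and that levels three, four, and five are treated as high; the event names are borrowed loosely from the event types listed for the performance monitoring unit 103.

```python
# Per the text's example: of five levels, levels 3, 4 and 5 count as high.
HIGH_LEVELS = {3, 4, 5}

# Hypothetical event-type-to-level table; the real mapping is not given.
EVENT_LEVEL = {
    "software_increment": 1,
    "exception": 2,
    "cache_refill": 3,
    "data_memory_access": 4,
    "data_cache_reclamation": 5,
}


def count_high_level(events: list[str]) -> int:
    """Count the events in one preset period whose level is high."""
    return sum(1 for event in events if EVENT_LEVEL.get(event, 0) in HIGH_LEVELS)


def exceeds_threshold(events: list[str], threshold: int) -> bool:
    """True when the period saw more high-level events than the threshold."""
    return count_high_level(events) > threshold


period_events = [
    "cache_refill",
    "software_increment",
    "data_memory_access",
    "data_memory_access",
    "exception",
]
high_count = count_high_level(period_events)
busy = exceeds_threshold(period_events, threshold=2)
```

In this sample period three of the five events are high-level, so with a threshold of 2 the core would be treated as short of computing power.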
[0078] In one possible implementation, the identification unit 104 may also include a timer to record the duration of the aforementioned security context information. Thus, the identification unit 104 can generate an indication signal based on the security context information, as shown in Figure 6B. As shown in Figure 6B, after receiving the security context information from the performance monitoring unit 103, the identification unit 104 can determine whether the security context information has changed. If the security context information has not changed, it indicates that the task executed by the processor core (or the thread running on the processor core) has not changed, and the timer can continue to count to record the duration of the corresponding security context information. If the security context information has changed, it indicates that the task executed by the processor core (or the thread running on the processor core) has changed, and the timer can be interrupted. If the duration of the timer reaches a preset threshold, it indicates that the processor core (or the thread running on the processor core) has been continuously executing the same task within the timer's duration. To ensure that the tasks executed by the processor core can be continuously executed by the processor core, thereby maximizing the utilization of the history table and cache, when the duration of the security context information exceeds a preset threshold, the processor core (or the thread running on the processor core) can be considered to have a high level. Similarly, when the duration of the security context information is less than the preset threshold, it indicates that the processor core (or the thread running on the processor core) has executed multiple tasks within the timer's duration, and the processor core (or the thread running on the processor core) can be considered to have a low level. 
Therefore, the identification unit 104 can also generate an indication signal based on the timer's duration within a preset period.
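The timer behavior can be sketched as a small state machine that is advanced once per sampling step. Several details are assumptions: time is measured in abstract steps, the timer restarts from zero when the security context changes (the text only says it is interrupted), and a duration exactly equal to the threshold is treated as low.

```python
class ContextTimer:
    """Tracks how long the same security context has been observed."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.context = None
        self.duration = 0

    def tick(self, context: str) -> None:
        """Advance one sampling step with the currently observed context."""
        if context == self.context:
            # Same task as before: keep timing.
            self.duration += 1
        else:
            # Context changed: interrupt the timer and restart (assumption).
            self.context = context
            self.duration = 0

    def level(self) -> str:
        """Return "high" once the duration exceeds the preset threshold."""
        return "high" if self.duration > self.threshold else "low"


timer = ContextTimer(threshold=3)
for observed in ["ctx_a"] * 5:
    timer.tick(observed)
level_after_steady_run = timer.level()

timer.tick("ctx_b")
level_after_switch = timer.level()
```

After five steps in the same context the duration is 4, which exceeds the threshold of 3, so the level is high; a single context switch drops it back to low.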
[0079] In one possible implementation, the identification unit 104 can also combine the counter value within a preset period and the timer duration within the preset period to generate an indication signal. Thus, the identification unit 104 can generate an indication signal based on both performance event information and security context information, as shown in Figure 6C. For example, the identification unit 104 can pre-set a mapping relationship between the counter value within a preset period, the timer duration, and the thread importance. After receiving performance event information and security context information from the performance monitoring unit 103, the identification unit 104 can perform counting and timing operations using the processing methods described above for performance event information and for security context information, respectively; then, it queries the mapping relationship to determine the bit value corresponding to the counter count value and timer duration within the preset period that indicates the thread importance, and uses this bit value as the indication signal.
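A combined lookup of this kind might be sketched as below. The table contents are hypothetical: the text does not give the mapping, so the sketch simply reduces the counter value and timer duration to two threshold comparisons and reuses the earlier example in which two satisfied conditions yield "0100".

```python
# Hypothetical mapping from (counter above threshold, timer above threshold)
# to the bit value that indicates thread importance.
SIGNAL_MAP = {
    (False, False): "0001",
    (True, False): "0010",
    (False, True): "0010",
    (True, True): "0100",
}


def indication_signal(count: int, count_threshold: int,
                      duration: int, duration_threshold: int) -> str:
    """Look up the indication signal for one preset period."""
    key = (count > count_threshold, duration > duration_threshold)
    return SIGNAL_MAP[key]


both_met = indication_signal(count=5, count_threshold=2,
                             duration=10, duration_threshold=3)
none_met = indication_signal(count=1, count_threshold=2,
                             duration=1, duration_threshold=3)
```

A hardware implementation could use a finer-grained table indexed directly by counter and timer ranges; the two-flag key here is chosen only to keep the example short.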
[0080] Based on the same inventive concept, this application also provides a method for allocating system shared resources, which is applied to the multi-core processor 10 shown in Figure 2. Please refer to Figure 7, which shows a flow 700 of the method for allocating system shared resources provided in this application embodiment. The flow 700 of the method for allocating system shared resources can be executed by the multi-core processor 10, including the following steps: Step 701, generating a plurality of indication signals, wherein each indication signal is used to indicate the importance of the thread running in the corresponding processor core; Step 702, selectively allocating system shared resources to multiple processor cores based on the plurality of indication signals.
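Flow 700 can be outlined end to end as follows. This is a minimal sketch under stated assumptions: step 701 is reduced to decoding signals that are supplied as bit strings, and step 702 uses a proportional rule (each core's share equals its value divided by the sum of all values), which is only one way of giving more resources to more important cores and is not taken from the text.

```python
from fractions import Fraction


def step_701(core_signals: dict[str, str]) -> dict[str, int]:
    """Step 701: obtain one importance value per core from its signal."""
    return {core: int(bits, 2) for core, bits in core_signals.items()}


def step_702(values: dict[str, int]) -> dict[str, Fraction]:
    """Step 702: allocate shares in descending order of importance.

    Assumption: each share is proportional to the importance value.
    """
    total = sum(values.values())
    if total == 0:
        raise ValueError("at least one core must report a non-zero value")
    ordered = sorted(values, key=values.get, reverse=True)
    return {core: Fraction(values[core], total) for core in ordered}


shares = step_702(step_701(
    {"C1": "1000", "C2": "0100", "C3": "0010", "C4": "0001"}
))
```

With the 4-bit example signals, the shares come out as 8/15, 4/15, 2/15, and 1/15 for C1 to C4, so a core with a higher importance value always receives a larger share.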
[0081] In one possible implementation, the system shared resources include at least one of the following: a shared cache, a predictor, an arbitration controller, or a shared execution unit.
[0082] In one possible implementation, generating multiple indication signals includes generating multiple indication signals based on at least one of performance monitoring events and security context information of multiple processor cores.
[0083] In one possible implementation, multiple indication signals are generated based on at least one of performance monitoring events and security context information of multiple processor cores, including: generating multiple indication signals based on the number of counts of counters set in each of the multiple processor cores within a preset period; wherein the counters are used to count the performance monitoring events corresponding to each processor core.
[0084] In one possible implementation, multiple indication signals are generated based on at least one of performance monitoring events and security context information of multiple processor cores, including: generating multiple indication signals based on the duration of timers set in each of the multiple processor cores within a preset period; wherein the timers are used to record the duration of the security context corresponding to each processor core.
[0085] In one possible implementation, based on multiple indication signals, system shared resources are selectively allocated to multiple processor cores, including: allocating system shared resources to multiple processor cores in descending order of the importance of the running threads; wherein the amount of system shared resources allocated to processor cores with higher importance of the running threads is greater than the amount of system shared resources allocated to processor cores with lower importance of the running threads.
[0086] In one possible implementation, selectively allocating the system shared resources to the multiple processor cores based on the multiple indication signals includes: comparing the indication signals of the multiple processor cores, selecting, based on the comparison result, at least one processor core whose running thread has the highest importance, and allocating some or all of the system shared resources to the at least one processor core.
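The comparison-and-selection variant can be sketched as follows. The names select_most_important and allocate_to_selected, the granted_units parameter, and the even split among tied cores are hypothetical choices made for this example.

```python
from typing import Dict, List, Optional


def select_most_important(signals: List[int]) -> List[int]:
    """Compare all indication signals and return the core(s) with the highest one."""
    top = max(signals)
    return [core for core, signal in enumerate(signals) if signal == top]


def allocate_to_selected(
    signals: List[int], total_units: int, granted_units: Optional[int] = None
) -> Dict[int, int]:
    """Give some (granted_units) or all (default) shared units to the selected cores.

    Tied cores split the grant evenly (hypothetical policy); cores that are not
    selected do not appear in the result and receive nothing from this grant.
    """
    if granted_units is None:
        granted_units = total_units
    granted_units = min(granted_units, total_units)
    selected = select_most_important(signals)
    per_core = granted_units // len(selected)
    return {core: per_core for core in selected}


all_units = allocate_to_selected([2, 7, 7, 1], total_units=16)
some_units = allocate_to_selected([2, 7, 3], total_units=16, granted_units=12)
```

In the second call, the 4 units that are not granted remain available for the other cores under whatever policy the controller applies to them.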
[0087] In one possible implementation, allocating some or all of the system's shared resources to at least one processor core includes: generating resource configuration information for multiple processor cores based on multiple indication signals, and providing the resource configuration information to the system's shared resources.
[0088] In one possible implementation, the resource configuration information includes at least one of the following: storage space allocation information of the shared cache, cache page replacement priority information in the shared cache, scheduling priority information of the processor core, speculative strategy information of the processor core, usage permission information of the predictor, occupancy time information of the processor core for signal transmission paths, or usage priority information of the shared execution unit.
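As an illustrative data-structure sketch, the resource configuration information for one processor core could be represented as a record with one optional field per item listed above. All field names, types, and example values are assumptions introduced here; this application does not define an encoding.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class ResourceConfig:
    """Resource configuration information for one processor core (sketch).

    Every field except core_id is optional because the configuration
    includes "at least one of" the listed items.
    """
    core_id: int
    cache_ways: Optional[int] = None            # storage space allocation in the shared cache
    replacement_priority: Optional[int] = None  # cache page replacement priority
    scheduling_priority: Optional[int] = None   # scheduling priority of the core
    speculation_policy: Optional[str] = None    # speculative strategy of the core
    predictor_enabled: Optional[bool] = None    # usage permission for the predictor
    path_occupancy_slots: Optional[int] = None  # occupancy time on signal transmission paths
    exec_unit_priority: Optional[int] = None    # usage priority of the shared execution unit

    def populated_fields(self) -> Dict[str, Any]:
        """Return only the items that this configuration actually sets."""
        return {
            name: value
            for name, value in asdict(self).items()
            if name != "core_id" and value is not None
        }


config = ResourceConfig(core_id=0, cache_ways=12, predictor_enabled=True)
```

A controller following this sketch would build one such record per core from the indication signals and hand it to the shared resource, which applies only the populated fields.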
[0089] In one possible implementation, multiple processor cores have at least one of the same power domain and clock domain.
[0090] It is understood that, in order to achieve the above-mentioned functions, a multi-core processor includes hardware and / or software modules that perform the respective functions. Based on the steps of the examples described in conjunction with the embodiments disclosed herein, this application can be implemented in hardware or a combination of hardware and computer software. Whether a function is implemented in hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application in conjunction with the embodiments, but such implementation should not be considered beyond the scope of this application.
[0091] This embodiment can divide a multi-core processor into functional modules according to the above method example. For example, different functional modules can be divided for each function, or two or more functions can be integrated into one processing module. The integrated module can be implemented in hardware. It should be noted that the module division in this embodiment is illustrative and only represents one logical functional division. In actual implementation, there may be other division methods.
[0092] With each functional module corresponding to its respective function, Figure 8 shows a possible schematic diagram of the device 800 for allocating system cache resources involved in the above embodiments. The previously mentioned device can be further extended. For example, the device 800 for allocating system cache resources corresponding to Figure 8 can be a software device running on the multi-core processor 10, or the device 800 can be a combination of software and hardware embedded in the multi-core processor 10. As shown in Figure 8, the device 800 for allocating system cache resources may include: a generation module 801, used to generate multiple indication signals, wherein each indication signal is used to indicate the importance of a thread running in a corresponding processor core; and an allocation module 802, used to selectively allocate shared system resources to multiple processor cores based on the multiple indication signals.
[0093] In one possible implementation, the system shared resources include at least one of the following: a shared cache, a predictor, an arbitration controller, or a shared execution unit.
[0094] In one possible implementation, the generation module 801 is further configured to generate multiple indication signals based on at least one of performance monitoring events and security context information of multiple processor cores.
[0095] In one possible implementation, the generation module 801 is further configured to generate multiple indication signals based on the count values, within a preset period, of counters set in the multiple processor cores; wherein the counters are used to count the performance monitoring events corresponding to each processor core.
[0096] In one possible implementation, the generation module 801 is further configured to generate multiple indication signals based on the timing duration of timers set in multiple processor cores within a preset period; wherein the timers are used to record the duration of the security context corresponding to each processor core.
[0097] In one possible implementation, the allocation module 802 is further configured to allocate system shared resources to multiple processor cores in descending order of the importance of the running threads; wherein the amount of system cache resources allocated to a processor core whose running thread is of higher importance is greater than the amount of system cache resources allocated to a processor core whose running thread is of lower importance.
[0098] In one possible implementation, the allocation module 802 is further configured to: compare indication signals of multiple processor cores, select at least one processor core with the highest importance of the running thread based on the comparison result, and allocate some or all of the system shared resources to at least one processor core.
[0099] In one possible implementation, the allocation module 802 is further configured to: generate resource configuration information for multiple processor cores based on multiple indication signals, and provide the resource configuration information to system shared resources.
[0100] In one possible implementation, the resource configuration information includes at least one of the following: storage space allocation information of the shared cache, cache page replacement priority information in the shared cache, scheduling priority information of the processor core, speculative strategy information of the processor core, usage permission information of the predictor, occupancy time information of the processor core for signal transmission paths, or usage priority information of the shared execution unit.
[0101] The apparatus 800 for allocating system cache resources provided in this embodiment is used to execute the method for allocating system cache resources executed by the multi-core processor 10, and can achieve the same effect as the above-described implementation method or apparatus. Specifically, each module corresponding to FIG. 8 above can be implemented by software, hardware, or a combination of both. For example, each module can be implemented in software form to drive the multi-core processor 10 to work. Alternatively, each module can include a corresponding processor and corresponding driver software, that is, be implemented by a combination of software and hardware.
[0102] Exemplarily, the multi-core processor 10 may further include at least one processor and a memory. The at least one processor can invoke all or part of the computer program stored in the memory to control and manage the operation of the multi-core processor 10; for example, it can be used to support the multi-core processor 10 in executing the steps performed by the various modules described above. The memory can be used to store the program code and data that the multi-core processor 10 executes and uses, and the memory includes, but is not limited to, a cache, registers, or at least a portion of the storage space of the memory 20 described above. The multi-core processor 10 can implement or execute the various exemplary logic modules described in conjunction with the disclosure of this application, and may be a combination of one or more microprocessors implementing computing functions. Furthermore, the multi-core processor 10 may also include other programmable logic devices, transistor logic devices, or discrete hardware components.
[0103] The multi-core processor described in this application can be implemented on integrated circuits (ICs), analog ICs, radio frequency integrated circuits, mixed-signal ICs, application-specific integrated circuits (ASICs), printed circuit boards (PCBs), electronic devices, etc. This multi-core processor can also be manufactured using various IC process technologies, such as complementary metal oxide semiconductors (CMOS), n-type metal-oxide-semiconductor (NMOS), p-type metal oxide semiconductors (PMOS), bipolar junction transistors (BJTs), bipolar CMOS (BiCMOS), silicon germanium (SiGe), gallium arsenide (GaAs), etc.
[0104] This application also provides a readable storage medium in which instructions are stored. When a multi-core processor runs the instructions, the multi-core processor executes the above-described related method steps to implement the method for allocating system cache resources in the above embodiments.
[0105] This application also provides a computer program product, which includes a computer program; when the computer program is executed by a multi-core processor, it causes the multi-core processor to perform the above-mentioned related steps to implement the method for allocating system cache resources in the above embodiments.
[0106] It should be understood that in the various embodiments of this application, the order of the above-mentioned processes does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this application.
[0107] Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.
[0108] Those skilled in the art will understand that, for the sake of convenience and brevity, the specific working processes of the systems, devices, and units described above can be referred to the corresponding processes in the foregoing method embodiments, and will not be repeated here.
[0109] In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of the units described above is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed between the units may be through some interfaces; the indirect coupling or communication connection between the apparatuses or units may be electrical, mechanical, or other forms.
[0110] The units described above as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.
[0111] In addition, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit.
[0112] If the aforementioned functions are implemented as software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or a portion of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0113] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of this application, and are not intended to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some or all of the technical features therein. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the scope of the technical solutions of the embodiments of this application.
Claims
1. A multi-core processor, characterized in that it comprises: a plurality of processor cores, a controller, and system shared resources; the plurality of processor cores are configured to respectively generate indication signals and transmit the indication signals to the controller, wherein each indication signal is used to indicate the importance of the thread running in the corresponding processor core; and the controller is configured to selectively allocate the system shared resources to the plurality of processor cores based on the indication signals from the plurality of processor cores.
2. The multi-core processor according to claim 1, characterized in that, The system's shared resources include at least one of the following: a shared cache, a predictor, an arbitration controller, or a shared execution unit.
3. The multi-core processor according to claim 1 or 2, characterized in that, Each of the plurality of processor cores is specifically used to: generate the indication signal based on at least one of the processor core's performance monitoring events and security context information.
4. The multi-core processor according to claim 3, characterized in that each processor core includes a performance monitoring unit and an identification unit, wherein the performance monitoring unit is configured to generate at least one of the performance monitoring event and the security context information; and the identification unit is used to generate the indication signal based on at least one of the performance monitoring event and the security context information.
5. The multi-core processor according to claim 4, characterized in that, The identification unit includes a counter, which is used to count the performance monitoring events; The identification unit is specifically used to generate the indication signal based on the number of times the counter counts within a preset period.
6. The multi-core processor according to claim 4 or 5, characterized in that, The identification unit further includes a timer, which is used to record the duration of the security context; The identification unit is specifically used to generate the indication signal based on the timing duration of the timer within a preset period.
7. The multi-core processor according to any one of claims 1 to 6, characterized in that the controller is specifically configured to: allocate the system shared resources to the plurality of processor cores in descending order of the importance of the running threads; wherein the amount of system cache resources allocated to a processor core whose running thread is of higher importance is greater than the amount of system cache resources allocated to a processor core whose running thread is of lower importance.
8. The multi-core processor according to any one of claims 1 to 6, characterized in that the controller is specifically configured to: compare the indication signals of the plurality of processor cores, select, based on the comparison result, at least one processor core whose running thread has the highest importance, and allocate some or all of the system shared resources to the at least one processor core.
9. The multi-core processor according to claim 7 or 8, characterized in that the controller is further specifically configured to: generate resource configuration information for the plurality of processor cores based on the indication signals of the plurality of processor cores, and provide the resource configuration information to the system shared resources; and the system shared resources are used to configure resources for the processor cores based on the resource configuration information.
10. The multi-core processor according to claim 9, wherein the resource configuration information includes at least one of the following: storage space allocation information of the shared cache, cache page replacement priority information in the shared cache, scheduling priority information of the processor core, speculative strategy information of the processor core, usage permission information of the predictor, occupancy time information of the signal transmission path of the processor core, or usage priority information of the shared execution unit.
11. The multi-core processor according to any one of claims 1 to 10, characterized in that, The plurality of processor cores have at least one of the same power domain and clock domain.
12. An electronic device, characterized in that, The electronic device includes a power management unit and a multi-core processor as described in any one of claims 1 to 11; wherein the power management unit supplies power to each processor core of the multi-core processor.
13. A method for allocating system cache resources, applied to a multi-core processor, the multi-core processor comprising a plurality of processor cores, characterized in that the method comprises: generating a plurality of indication signals, wherein each indication signal indicates the importance of a thread running in the corresponding processor core; and selectively allocating system shared resources to the plurality of processor cores based on the plurality of indication signals.
14. The method according to claim 13, characterized in that, The system's shared resources include at least one of the following: a shared cache, a predictor, an arbitration controller, or a shared execution unit.
15. The method according to claim 13 or 14, characterized in that generating the plurality of indication signals includes: generating the plurality of indication signals based on at least one of performance monitoring events and security context information of the plurality of processor cores.
16. The method according to claim 15, characterized in that generating the plurality of indication signals based on at least one of the performance monitoring events and the security context information of the plurality of processor cores includes: generating the plurality of indication signals based on the count values, within a preset period, of counters set in the plurality of processor cores; wherein the counters are used to count the performance monitoring events corresponding to each processor core.
17. The method according to claim 15 or 16, characterized in that generating the plurality of indication signals based on at least one of the performance monitoring events and the security context information of the plurality of processor cores includes: generating the plurality of indication signals based on the durations timed, within a preset period, by timers set in the plurality of processor cores; wherein the timers are used to record the duration of the security context corresponding to each processor core.
18. The method according to any one of claims 13 to 17, characterized in that selectively allocating the system shared resources to the plurality of processor cores based on the plurality of indication signals includes: allocating the system shared resources to the plurality of processor cores in descending order of the importance of the running threads; wherein the amount of system cache resources allocated to a processor core whose running thread is of higher importance is greater than the amount of system cache resources allocated to a processor core whose running thread is of lower importance.
19. The method according to any one of claims 13 to 17, characterized in that selectively allocating the system shared resources to the plurality of processor cores based on the plurality of indication signals includes: comparing the indication signals of the plurality of processor cores, selecting, based on the comparison result, at least one processor core whose running thread has the highest importance, and allocating some or all of the system shared resources to the at least one processor core.
20. The method according to claim 18 or 19, characterized in that allocating some or all of the system shared resources to the at least one processor core includes: generating resource configuration information for the plurality of processor cores based on the plurality of indication signals, and providing the resource configuration information to the system shared resources.
21. The method according to claim 20, wherein the resource configuration information includes at least one of the following: storage space allocation information of the shared cache, cache page replacement priority information in the shared cache, scheduling priority information of the processor core, speculative strategy information of the processor core, usage permission information of the predictor, occupancy time information of the processor core for signal transmission paths, or usage priority information of the shared execution unit.
22. The method according to any one of claims 13 to 21, characterized in that, The plurality of processor cores have at least one of the same power domain and clock domain.
23. An apparatus for allocating system cache resources, characterized in that, The device includes a processor and a memory, the memory storing instructions that, when executed by the processor, cause the device to perform the method as described in any one of claims 13 to 22.
24. A readable storage medium, characterized in that, The readable storage medium stores instructions that, when executed by a multi-core processor, cause the multi-core processor to perform the method as described in any one of claims 13 to 22.
25. A computer program product, characterized in that, The computer program product includes a computer program that, when executed by a multi-core processor, causes the multi-core processor to perform the method as described in any one of claims 13 to 22.