A method and related apparatus for hyper-threading scheduling
By determining, from each hyperthread's identifier, the hyperthread group it belongs to and scheduling different groups onto different physical cores, the method addresses inaccurate hyperthread scheduling and improves the efficiency and resource utilization of hyperthread scheduling.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- HUAWEI TECH CO LTD
- Filing Date
- 2024-12-30
- Publication Date
- 2026-06-30
AI Technical Summary
Existing hyper-threading scheduling methods suffer from insufficient resource utilization, low scheduling efficiency, and inability to achieve real-time scheduling due to the inability to accurately group threads.
The hyperthread group to which each hyperthread belongs is determined from its identifier, and hyperthreads in the same group are scheduled to run on the same physical core. Load balancing across logical and physical cores, together with measured differences in execution efficiency when hyperthreads share a physical core, further improves scheduling accuracy and efficiency.
It improves the accuracy and efficiency of hyper-threading scheduling, reduces the overhead of real-time grouped computing, and improves processor resource utilization and performance.

Figure CN122309049A_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of computer technology, and in particular to a hyper-threading scheduling method and related equipment.
Background Technology
[0002] With the popularization and development of internet technology, the computer field continues to pursue higher computing efficiency and performance. In traditional single-core, single-threaded processors, when a thread pauses execution, the processor's resources are idle, resulting in resource waste.
[0003] To use computing resources more efficiently, simultaneous multithreading (SMT) technology, known in Intel processors as Hyper-Threading, emerged. In SMT, a processor's physical core is divided into multiple logical cores, each of which can execute an independent thread, also called a hyper-thread. When the hyper-thread on one logical core stalls, the processor can switch to executing the hyper-thread on another logical core. In other words, SMT allows a single physical core to process instructions from multiple hyper-threads simultaneously, thereby improving processor utilization.
[0004] In the related art, scheduling multiple hyperthreads requires first grouping them according to the type of resource each hyperthread shares, and then scheduling them according to the grouping result.
[0005] However, hyper-threads share many types of resources, so grouping them by shared-resource type yields inaccurate, low-precision grouping results. In addition, such grouping cannot support real-time scheduling, which lowers the efficiency of hyper-thread scheduling.
Summary of the Invention
[0006] This application provides a hyper-threading scheduling method and related device, which can ensure the accuracy of hyper-threading grouping, reduce real-time hyper-threading scheduling overhead, and thus improve the efficiency of hyper-threading scheduling.
[0007] Firstly, a hyper-threading scheduling method is provided. The method is applied to a computing device that stores multiple hyper-threading groups, including a first hyper-threading group and a second hyper-threading group. A first hyper-thread indicated by the first hyper-threading group performs better when it runs on the same physical core as another hyper-thread indicated by the first hyper-threading group than when it runs on the same physical core as a hyper-thread indicated by the second hyper-threading group. The method includes: obtaining multiple hyper-threads to be scheduled and their identifiers; determining, based on the identifiers, that M of the hyper-threads belong to the first hyper-threading group and N belong to the second hyper-threading group, where M and N are each positive integers greater than 1; and scheduling each of the M hyper-threads to run on at least one first physical core and each of the N hyper-threads to run on at least one second physical core, where the first and second physical cores are different physical cores.
[0008] As shown above, the computing device stores multiple hyperthreading groups, each containing multiple hyperthreads. During scheduling, the device places hyperthreads that belong to the same hyperthreading group on the same physical core. Because a first hyperthread of the first hyperthreading group performs better when sharing a physical core with another hyperthread of the first group than when sharing it with a hyperthread of the second group, co-locating hyperthreads of the same group helps preserve hyperthread performance. Furthermore, because the groups are stored on the computing device in advance, no real-time grouping computation is needed before scheduling, which improves the efficiency of hyperthread scheduling.
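One way to read the grouping condition above is as a pairwise check over measured co-run performance. The following is a minimal Python sketch under that reading; the `perf` table, its scores, and the function name are hypothetical and not part of the application, and requiring every same-group pairing to beat every cross-group pairing is one possible interpretation of the condition.

```python
def satisfies_grouping_property(perf, group_a, group_b):
    """Check that every hyperthread in group_a performs better next to a
    same-group peer than next to any hyperthread from group_b.

    perf[(x, y)] is a hypothetical score for hyperthread x's performance
    while it shares a physical core with hyperthread y (e.g., x's IPC).
    """
    for x in group_a:
        same = [perf[(x, y)] for y in group_a if y != x]
        cross = [perf[(x, y)] for y in group_b]
        # The worst same-group pairing must still beat the best cross-group pairing.
        if same and cross and min(same) <= max(cross):
            return False
    return True


# Hypothetical co-run scores for hyperthreads X1, X2 (one group) and X4 (another).
perf = {
    ("X1", "X2"): 1.8, ("X1", "X4"): 1.2,
    ("X2", "X1"): 1.7, ("X2", "X4"): 1.1,
}
print(satisfies_grouping_property(perf, ["X1", "X2"], ["X4"]))  # True
```

Calling the function again with the two groups swapped checks the symmetric condition that the embodiment states for the second group.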
[0009] In one possible implementation, the first physical core includes a first logical core and a second logical core. Scheduling each of the M hyperthreads to run on at least one first physical core includes: scheduling the second hyperthread of the M hyperthreads to the first logical core and scheduling the third hyperthread of the M hyperthreads to the second logical core; obtaining the utilization rate of the first logical core and the utilization rate of the second logical core; if the utilization rate of the first logical core is higher than the utilization rate of the second logical core, scheduling the fourth hyperthread of the M hyperthreads to the second logical core.
[0010] As can be seen from the above, because a first hyperthread of the first hyperthreading group performs better when sharing a physical core with another hyperthread of the first group than when sharing it with a hyperthread of the second group, scheduling the fourth hyperthread of the M hyperthreads to the second logical core does not degrade the performance of the first physical core. At the same time, it balances the utilization of the logical cores on the first physical core, achieving load balancing across those logical cores.
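The placement rule in [0009] and [0010] can be sketched as a greedy loop over one physical core's two logical cores. This is a minimal Python sketch, assuming each hyperthread comes with a hypothetical load estimate (a fraction of one logical core) and that a logical core's utilization is approximated by the sum of the loads placed on it; the application itself reads utilization from the logical cores at run time, and the function name is hypothetical.

```python
def balance_logical_cores(loads):
    """Place hyperthreads of one group on the two logical cores of a physical core.

    loads: dict of hyperthread id -> hypothetical load estimate, in
    scheduling order. The first two hyperthreads go to logical cores 0 and 1;
    each later one goes to whichever logical core is currently less utilized.
    """
    placement = ([], [])
    util = [0.0, 0.0]  # approximate utilization of logical cores 0 and 1
    for i, (tid, load) in enumerate(loads.items()):
        if i < 2:
            target = i
        else:
            # If logical core 0 is busier, use logical core 1 (ties go to core 0).
            target = 1 if util[0] > util[1] else 0
        placement[target].append(tid)
        util[target] += load
    return placement


print(balance_logical_cores({"X1": 0.6, "X2": 0.2, "X3": 0.3}))
# (['X1'], ['X2', 'X3'])
```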
[0011] In one possible implementation, the computing device includes a first physical core group and a second physical core group, where the first physical core group includes the first physical core and the second physical core group includes the second physical core. The method further includes: obtaining the average utilization rate of the first physical core group and the average utilization rate of the second physical core group, where the average utilization rate of a physical core group is the sum of the utilization rates of its physical cores divided by the number of physical cores in the group; and, if the average utilization rate of the first physical core group is higher than that of the second physical core group, allocating idle second physical cores in the second physical core group to the first physical core group.
[0012] As shown above, reassigning physical cores between physical core groups according to the groups' average utilization rates achieves load balancing between physical core groups, which further improves hyper-threading performance.
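The inter-group rebalancing in [0011] can be sketched as follows. This is a minimal Python sketch, assuming per-core utilization values in [0.0, 1.0], that "idle" means a utilization of exactly 0.0, and that one idle core is moved per call; the class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CoreGroup:
    name: str
    # physical core id -> utilization in [0.0, 1.0]
    cores: dict = field(default_factory=dict)

    def average_utilization(self):
        # Sum of per-core utilization divided by the number of cores.
        return sum(self.cores.values()) / len(self.cores) if self.cores else 0.0


def rebalance(first, second):
    """Move one idle core from `second` to `first` if `first` is busier on average.

    Returns the id of the moved core, or None if nothing was moved.
    """
    if first.average_utilization() <= second.average_utilization():
        return None
    # Assumption: a core counts as idle when its utilization is exactly 0.0.
    idle = next((c for c, u in second.cores.items() if u == 0.0), None)
    if idle is not None:
        first.cores[idle] = second.cores.pop(idle)
    return idle


first = CoreGroup("first", {0: 0.9, 1: 0.8})
second = CoreGroup("second", {2: 0.3, 3: 0.0})
print(rebalance(first, second))  # 3
```

Moving one core per call keeps each adjustment small; a caller could repeat the check until the averages converge or no idle core remains.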
[0013] In one possible implementation, the execution efficiency of each of multiple test groups is obtained. The execution efficiency is the ratio of a first quantity to a second quantity: the first quantity is the sum of the instructions per cycle (IPC) of each hyperthread under test in the test group when those hyperthreads run simultaneously on a third physical core; the second quantity is the sum of the IPC of each hyperthread under test in the test group when each runs alone on the third physical core. The multiple test groups contain multiple hyperthreads under test whose identifiers are the same as the identifiers of the multiple hyperthreads to be scheduled. A clustering operation is then performed on the execution efficiencies of the test groups to determine the multiple hyperthreading groups, where each hyperthreading group includes at least one test group.
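The execution efficiency defined in [0013] can be written compactly. In the following notation, which is introduced here for illustration only, $G$ is a test group, $\mathrm{IPC}^{\mathrm{par}}_{t}$ is hyperthread $t$'s instructions per cycle while all members of $G$ run together on the third physical core, and $\mathrm{IPC}^{\mathrm{solo}}_{t}$ is its IPC when it runs alone on that core:

$$
\eta_G = \frac{\sum_{t \in G} \mathrm{IPC}^{\mathrm{par}}_{t}}{\sum_{t \in G} \mathrm{IPC}^{\mathrm{solo}}_{t}}
$$

Read this way, $\eta_G = 1$ means the group's summed per-thread IPC is unchanged by co-running, and smaller values indicate more interference between the co-located hyperthreads.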
[0014] As shown above, hyperthreads can be grouped by clustering the execution efficiencies of multiple test groups. This ensures that a first hyperthread of the first hyperthreading group performs better when sharing a physical core with another hyperthread of the first group than when sharing it with a hyperthread of the second group. Furthermore, because the execution efficiency of each test group is obtained by actually running the test groups, the grouping is more accurate, which improves the precision of hyperthread grouping.
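The application does not name a clustering algorithm. As one possible choice, the sketch below clusters test groups by their execution efficiency with one-dimensional k-means (Lloyd's algorithm); the choice of k-means, the evenly spread initialization, the test group names, and the efficiency values are all assumptions made for illustration.

```python
def kmeans_1d(values, k, iterations=100):
    """Cluster scalar values into k clusters; returns one label per value."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    ordered = sorted(values)
    # Deterministic start: centroids spread evenly across the sorted values.
    centroids = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    labels = None
    for _ in range(iterations):
        # Assign each value to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels


def cluster_test_groups(efficiency, k):
    """Group test-group names whose execution efficiencies fall in the same cluster."""
    names = list(efficiency)
    labels = kmeans_1d([efficiency[n] for n in names], k)
    clusters = {}
    for name, label in zip(names, labels):
        clusters.setdefault(label, []).append(name)
    return list(clusters.values())


print(cluster_test_groups({"T1": 0.92, "T2": 0.90, "T3": 0.61, "T4": 0.58}, k=2))
# [['T1', 'T2'], ['T3', 'T4']]
```

How k is chosen, and how each cluster of test groups becomes a stored hyperthreading group, are left open by the text.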
[0015] In one possible implementation, while the multiple hyperthreads under test in a target test group run simultaneously on the third physical core, the parallel IPC of each of them is obtained to determine a first target quantity; while each of those hyperthreads runs alone on the third physical core, its IPC is obtained to determine a second target quantity; and the execution efficiency of the target test group is determined as the ratio of the first target quantity to the second target quantity. The target test group is any one of the multiple test groups.
[0016] As can be seen from the above, determining the execution efficiency of each target test group through testing can ensure the accuracy of the execution efficiency of the target test group and further improve the accuracy of hyper-threading grouping.
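The per-group computation in [0015] reduces to a ratio of two sums. This is a minimal Python sketch, assuming the IPC values have already been measured (for example with hardware performance counters) and are passed in as dictionaries keyed by hyperthread identifier; the function name is hypothetical.

```python
def execution_efficiency(parallel_ipc, solo_ipc):
    """Execution efficiency of one test group.

    parallel_ipc: hyperthread id -> IPC measured while all members of the
        test group run simultaneously on the same physical core.
    solo_ipc: hyperthread id -> IPC measured while that hyperthread runs
        alone on the same physical core.
    """
    if set(parallel_ipc) != set(solo_ipc):
        raise ValueError("both measurements must cover the same hyperthreads")
    first_quantity = sum(parallel_ipc.values())
    second_quantity = sum(solo_ipc.values())
    if second_quantity <= 0:
        raise ValueError("solo IPC sum must be positive")
    return first_quantity / second_quantity


print(execution_efficiency({"X1": 1.5, "X2": 0.5}, {"X1": 2.0, "X2": 2.0}))  # 0.5
```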
[0017] In one possible implementation, the third physical core is a physical core of the same type as the first and second physical cores.
[0018] In one possible implementation, the physical core groups are determined based on the number of hyperthreads in each of the multiple hyperthreading groups: a physical core group that runs a hyperthreading group containing more hyperthreads includes more physical cores than a physical core group that runs a hyperthreading group containing fewer hyperthreads.
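The application leaves the exact allocation rule open. One possible rule consistent with [0018] is proportional allocation with largest-remainder rounding, sketched below in Python; the guarantee of at least one core per group, the function name, and the example group names and sizes are assumptions.

```python
def allocate_cores(group_sizes, total_cores):
    """Split physical cores among hyperthreading groups in proportion to size.

    group_sizes: group id -> number of hyperthreads in that group.
    Each group gets at least one core; the remaining cores are shared out in
    proportion to group size, with leftovers going to the largest remainders.
    """
    if total_cores < len(group_sizes):
        raise ValueError("need at least one physical core per group")
    total_threads = sum(group_sizes.values())
    if total_threads == 0:
        raise ValueError("groups must contain hyperthreads")
    spare = total_cores - len(group_sizes)
    quotas = {g: spare * n / total_threads for g, n in group_sizes.items()}
    alloc = {g: 1 + int(q) for g, q in quotas.items()}
    leftover = total_cores - sum(alloc.values())
    # Hand the remaining cores to the groups with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda g: quotas[g] - int(quotas[g]), reverse=True)
    for g in by_remainder[:leftover]:
        alloc[g] += 1
    return alloc


print(allocate_cores({"G1": 6, "G2": 3, "G3": 3}, total_cores=8))
# {'G1': 4, 'G2': 2, 'G3': 2}
```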
[0019] In one possible implementation, the number of first physical cores in the first physical core group is the same as the number of second physical cores in the second physical core group. The first physical core group is used to run the hyperthreads in the first hyperthreading group, and the second physical core group is used to run the hyperthreads in the second hyperthreading group.
[0020] Secondly, a hyper-threading scheduling device is provided. Embodiments of this application can divide the hyper-threading scheduling device into functional modules according to the hyper-threading scheduling method provided in the first aspect. For example, each function may correspond to its own functional module, or two or more functions may be integrated into a single processing module. For instance, the hyper-threading scheduling device can be divided by function into a data acquisition module, a judgment module, and a scheduling module. For the possible technical solutions and beneficial effects of these functional modules, refer to the technical solutions of the first aspect and its possible implementations; details are not repeated here.
[0021] Thirdly, embodiments of this application provide a computing device including a processor and a memory for storing processor-executable instructions; the processor is configured to execute instructions such that the computing device performs the hyper-threading scheduling method described in the first aspect.
[0022] Fourthly, embodiments of this application provide a computer-readable storage medium storing at least one computer program, which is loaded and executed by a processor to implement the hyper-threading scheduling method as described in the first aspect above.
[0023] Fifthly, embodiments of this application provide a computer program product or computer program that includes computer instructions stored in a computer-readable storage medium. A processor of a computing device reads the computer instructions from the computer-readable storage medium and executes the computer instructions, causing the computing device to perform the hyper-threading scheduling method provided in the various optional implementations of the first aspect described above.
[0024] For a detailed description of the second to fifth aspects and their various implementations in the embodiments of this application, please refer to the detailed description in the first aspect and its various implementations; and for a detailed description of the beneficial effects of the second to fifth aspects and their various implementations, please refer to the beneficial effect analysis in the various implementations of the first aspect, which will not be repeated here.
[0025] These and other aspects of the embodiments of this application will become more apparent in the following description.
Attached Figure Description
[0026] Figure 1 is a flowchart of a hyper-threading scheduling method in the related art;
[0027] Figure 2 is an operational schematic of a computing device 100 supporting Hyper-Threading technology according to an embodiment of this application;
[0028] Figure 3 is a flowchart of a hyper-threading scheduling method according to an embodiment of this application;
[0029] Figure 4 is a schematic diagram of a hyper-threading scheduling process for load balancing according to an embodiment of this application;
[0030] Figure 5 is a flowchart of a hyper-threading testing method according to an embodiment of this application;
[0031] Figure 6 is a diagram of hyper-thread grouping after a clustering operation according to an embodiment of this application;
[0032] Figure 7 is a schematic structural diagram of a hyper-threading scheduling device 400 according to an embodiment of this application;
[0033] Figure 8 is a schematic hardware diagram of a computing device 100 according to an embodiment of this application;
[0034] Figure 9 is a schematic diagram of a computing device cluster according to an embodiment of this application;
[0035] Figure 10 is a diagram of a possible network connection implementation according to an embodiment of this application.
Detailed Implementation
[0036] The technical solutions of the embodiments of this application will be described below with reference to the accompanying drawings. In the description of this application, unless otherwise stated, "/" indicates an "or" relationship between the objects before and after it. For example, A/B can represent A or B. "And/or" in this application merely describes an association between related objects and indicates that three relationships can exist. For example, A and/or B can represent: A alone, both A and B, or B alone, where A and B can be singular or plural. Furthermore, in the description of this application, unless otherwise stated, "multiple" refers to two or more. "At least one of the following" or similar expressions refer to any combination of these items, including any combination of single or plural items. For example, at least one of a, b, or c can represent: a, b, c, ab, ac, bc, or abc, where a, b, and c can be single or multiple. In addition, to clearly describe the technical solutions of the embodiments of this application, the terms "first" and "second" are used to distinguish identical or similar items whose functions and effects are basically the same.
[0037] Those skilled in the art will understand that the terms "first," "second," etc., do not limit the quantity or order of execution, and that "first," "second," etc., are not necessarily different. Furthermore, in some embodiments of this application, words such as "exemplary" or "for example" are used to indicate that something is being described as an example, illustration, or description. Any embodiment or design scheme described as "exemplary" or "for example" in the embodiments of this application should not be construed as being more preferred or advantageous than other embodiments or design schemes. Specifically, the use of words such as "exemplary" or "for example" is intended to present the relevant concepts in a concrete manner for ease of understanding.
[0038] Furthermore, the device architecture and business scenarios described in the embodiments of this application are for the purpose of more clearly illustrating the technical solutions of the embodiments of this application, and do not constitute a limitation on the technical solutions provided in the embodiments of this application. As those skilled in the art will know, with the evolution of device architecture and the emergence of new business scenarios, the technical solutions provided in the embodiments of this application are also applicable to similar technical problems.
[0039] First, the application scenarios of the embodiments of this application will be introduced by way of example.
[0040] Hyper-threading technology arose from the computing field's pursuit of higher computational efficiency and performance. Hyper-threading divides a processor's physical core into multiple logical cores. These logical cores share the physical core's hardware resources, such as the arithmetic logic unit (ALU), floating-point unit (FPU), cache, and bus interface. This sharing increases processor throughput without duplicating those execution resources. Each logical core can execute an independent hyper-thread and has its own execution context, which includes registers and execution state. When a hyper-thread stalls while waiting for data or other resources, the processor can quickly switch to the execution context of another hyper-thread so that the physical core keeps executing. This avoids leaving physical core resources idle, thereby improving processor performance.
[0041] In the related art, because different tasks use central processing unit (CPU) resources to different degrees, tasks of different types are scheduled onto multiple hyperthreads on the same physical core to maximize resource utilization. Figure 1 is a flowchart of a hyper-threading scheduling method in the related art. As shown in Figure 1, in step S101, the multiple hyperthreads running on a first processor core are grouped according to the identifiers of the resources they use, yielding at least one hyperthread group. Specifically, the identifier of the resource used by a hyperthread can be I/O-intensive or compute-intensive. For an I/O-intensive task, the CPU spends most of its time waiting for I/O read/write operations: the computation itself finishes quickly, but the I/O operations must be waited on. For a compute-intensive task, the I/O read/write operations finish quickly, and most of the time is spent waiting on CPU computation. In step S102, the task running on at least one hyperthread is changed based on the running state parameters of one or more of the hyperthread groups.
[0042] The method described above pairs equal numbers of I/O-intensive and compute-intensive tasks: each hyperthread group includes one I/O-intensive task and one compute-intensive task, so the physical core can process the compute-intensive task while the I/O-intensive task waits for I/O. Tasks running on at least one hyperthread are then rescheduled based on the running state parameters of one or more hyperthread groups to balance the resources of the physical cores.
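For contrast with the method of this application, the related-art pairing in [0041] and [0042] can be sketched as below. This is a minimal Python sketch assuming each task is already labeled "io" or "compute"; the labels and the function name are hypothetical, and the sketch omits the later rescheduling in step S102.

```python
def pair_io_and_compute(tasks):
    """Pair one I/O-intensive task with one compute-intensive task per group.

    tasks: list of (task_id, kind) tuples, kind being "io" or "compute".
    Returns (pairs, unpaired); each pair is meant to share one physical core.
    """
    io_tasks = [t for t, kind in tasks if kind == "io"]
    cpu_tasks = [t for t, kind in tasks if kind == "compute"]
    pairs = list(zip(io_tasks, cpu_tasks))  # zip stops at the shorter list
    n = len(pairs)
    unpaired = io_tasks[n:] + cpu_tasks[n:]
    return pairs, unpaired


print(pair_io_and_compute([("A", "io"), ("B", "compute"), ("C", "io"), ("D", "compute"), ("E", "io")]))
# ([('A', 'B'), ('C', 'D')], ['E'])
```

The only input is a coarse two-valued label per task, which illustrates why the grouping is described as imprecise when hyper-threads share many kinds of resources.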
[0043] However, hyper-threads share many types of resources, so grouping them by shared-resource type yields inaccurate, low-precision grouping results. In addition, such grouping cannot support real-time scheduling, which lowers the efficiency of hyper-thread scheduling.
[0044] In view of this, embodiments of this application provide a hyper-thread scheduling method. The method is applied to a computing device that stores multiple hyper-thread groups, including a first hyper-thread group and a second hyper-thread group. A first hyper-thread indicated by the first hyper-thread group performs better when it runs on the same physical core as another hyper-thread indicated by the first hyper-thread group than when it runs on the same physical core as a hyper-thread indicated by the second hyper-thread group. The method includes: obtaining multiple hyper-threads to be scheduled and their identifiers; determining, based on the identifiers, that M of the hyper-threads belong to the first hyper-thread group and N belong to the second hyper-thread group, where M and N are each positive integers greater than 1; and scheduling each of the M hyper-threads to run on at least one first physical core and each of the N hyper-threads to run on at least one second physical core, where the first and second physical cores are different physical cores. Because of this performance relationship, scheduling hyperthreads of the same group onto the same physical core helps preserve their performance. Furthermore, because the hyperthread groups are stored on the computing device in advance, no real-time grouping computation is needed before scheduling, which reduces overhead and improves the efficiency of hyperthread scheduling.
[0045] Secondly, the system architecture of the embodiments of this application will be described by way of example.
[0046] Figure 2 is an operational schematic of a computing device 100 supporting Hyper-Threading technology according to an embodiment of this application. As shown in Figure 2, the computing device 100 can be a standard general-purpose server, such as a blade server, high-density server, rack server, or high-performance server. The computing device 100 includes multiple physical cores 110.
[0047] The physical core 110 may include a CPU, and each physical core 110 supports Hyper-threading technology. The physical core 110 is divided into two logical cores. The logical core may include a logical CPU.
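A scheduler such as the one described here needs to know which logical cores share a physical core. On Linux, one common source is the file /sys/devices/system/cpu/cpuN/topology/thread_siblings_list, which lists the logical CPUs sharing cpuN's physical core in a compact form such as "0,4" or "0-1". The application does not say where the topology comes from; the minimal Python sketch below only parses strings in that format, and the function names are hypothetical.

```python
def parse_cpu_list(text):
    """Parse a Linux CPU list such as "0,4" or "0-1,8-9" into sorted ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def physical_cores(sibling_lists):
    """Group logical CPUs by physical core.

    sibling_lists: logical CPU id -> contents of its thread_siblings_list.
    Returns one sorted tuple of logical CPU ids per physical core.
    """
    return sorted({tuple(parse_cpu_list(s)) for s in sibling_lists.values()})


# Example: four logical CPUs on two physical cores with two logical cores each.
print(physical_cores({0: "0,4\n", 4: "0,4\n", 1: "1,5\n", 5: "1,5\n"}))
# [(0, 4), (1, 5)]
```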
[0048] The physical cores 110 run an operating system (OS). The OS provides the execution environment that supports running a first application on each physical core 110. The first application includes multiple hyperthreads, which the OS can schedule onto different logical cores for execution.
[0049] Optionally, the computing device 100 further includes a scheduler 120, which is used to obtain multiple hyperthreads to be scheduled and the identifiers of the multiple hyperthreads; based on the identifiers of the multiple hyperthreads, determine that M hyperthreads belong to a first hyperthread group and N hyperthreads belong to a second hyperthread group; where M is a positive integer greater than 1 and N is a positive integer greater than 1; schedule each of the M hyperthreads to run on at least one first physical core, and schedule each of the N hyperthreads to run on at least one second physical core; the first physical core and the second physical core are different physical cores.
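The three operations of the scheduler 120 can be sketched as a single function. This is a minimal Python sketch, assuming the stored group membership is available as a thread-identifier-to-group mapping and that each group has already been given its own disjoint list of physical cores; round-robin placement within a group's cores is an assumption, and all names are hypothetical.

```python
from collections import defaultdict


def schedule(thread_ids, group_of, cores_of_group):
    """Map each hyperthread to a physical core reserved for its group.

    thread_ids: identifiers of the hyperthreads to schedule.
    group_of: thread identifier -> hyperthread group (stored in advance).
    cores_of_group: group -> list of physical core ids; lists must be disjoint.
    Returns physical core id -> list of thread identifiers.
    """
    all_cores = [c for cores in cores_of_group.values() for c in cores]
    if len(all_cores) != len(set(all_cores)):
        raise ValueError("each physical core must belong to exactly one group")
    # Determine group membership from the identifiers (no run-time grouping).
    members = defaultdict(list)
    for tid in thread_ids:
        members[group_of[tid]].append(tid)
    placement = defaultdict(list)
    for group, tids in members.items():
        cores = cores_of_group[group]
        for i, tid in enumerate(tids):
            # Round-robin over the group's own physical cores.
            placement[cores[i % len(cores)]].append(tid)
    return dict(placement)


group_of = {"X1": "Z001", "X2": "Z001", "X4": "Z002", "X5": "Z002"}
print(schedule(["X1", "X4", "X2", "X5"], group_of, {"Z001": [0], "Z002": [1]}))
# {0: ['X1', 'X2'], 1: ['X4', 'X5']}
```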
[0050] It should be noted that the system architecture and application scenarios described in the embodiments of this application are for the purpose of more clearly illustrating the technical solutions of the embodiments of this application, and do not constitute a limitation on the technical solutions provided in the embodiments of this application. For example, the scheduler 120 can also be integrated into the task scheduling unit of the first application. As those skilled in the art will know, with the evolution of system architecture and the emergence of new business scenarios, the technical solutions provided in the embodiments of this application are also applicable to similar technical problems.
[0051] For ease of understanding, the hyper-threading scheduling method provided in this application is described below with reference to the accompanying drawings. The method is applicable to the computing device 100 shown in Figure 2.
[0052] Figure 3 is a flowchart of a hyper-threading scheduling method according to an embodiment of this application. As shown in Figure 3, the hyper-threading scheduling method includes the following steps:
[0053] S301, The computing device obtains multiple hyperthreads to be scheduled and the identifiers of the multiple hyperthreads.
[0054] In this embodiment, the computing device stores multiple hyperthread groups, including a first hyperthread group and a second hyperthread group, which are any two of the multiple hyperthread groups. A first hyperthread indicated by the first hyperthread group performs better when it runs on the same physical core as another hyperthread indicated by the first hyperthread group than when it runs on the same physical core as a hyperthread indicated by the second hyperthread group. Similarly, a second hyperthread indicated by the second hyperthread group performs better when it runs on the same physical core as another hyperthread indicated by the second hyperthread group than when it runs on the same physical core as a hyperthread indicated by the first hyperthread group. Here, the second hyperthread is any one of the hyperthreads indicated by the second hyperthread group.
[0055] The reason for defining a first hyperthreading group and a second hyperthreading group in this application embodiment is to distinguish between different hyperthreading groups among multiple hyperthreading groups. A first hyperthread is any hyperthread within the first hyperthreading group, and a second hyperthread is any hyperthread within the second hyperthreading group. The reason for defining a first hyperthread and a second hyperthread in this application embodiment is to distinguish between hyperthreads in different hyperthreading groups (i.e., the first hyperthreading group and the second hyperthreading group).
[0056] Optionally, the identifier for a hyperthread includes a thread number, and the computing device can obtain the individual thread numbers of the multiple hyperthreads to be scheduled. There is a mapping relationship between the thread numbers corresponding to the multiple hyperthreads and the identifiers of the multiple hyperthread groups.
[0057] S302, the computing device determines, based on the identifiers of multiple hyperthreads, that M hyperthreads belong to the first hyperthread group and N hyperthreads belong to the second hyperthread group.
[0058] Where M is a positive integer greater than 1, and N is a positive integer greater than 1.
[0059] For example, the computing device determines the hyperthreading group to which each hyperthread belongs based on the thread numbers of the multiple hyperthreads and the mapping between those thread numbers and the identifiers of the multiple hyperthreading groups, and thereby determines that M hyperthreads belong to the first hyperthreading group and N hyperthreads belong to the second hyperthreading group.
[0060] For example, a computing device stores multiple hyper-thread groups, designated as the first hyper-thread group, the second hyper-thread group, and the third hyper-thread group. The identifier for the first hyper-thread group is Z001, the identifier for the second hyper-thread group is Z002, and the identifier for the third hyper-thread group is Z003. Each hyper-thread group contains multiple hyper-threads, and their respective thread numbers are mapped to the identifiers of the hyper-thread groups. Specifically, the first hyper-thread group Z001 includes hyper-threads X1, X2, and X3, so thread numbers X1, X2, and X3 are mapped to Z001. The second hyper-thread group Z002 includes hyper-threads X4, X5, and X6, so thread numbers X4, X5, and X6 are mapped to Z002. If the obtained thread numbers of multiple hyper-threads are X1, X2, and X6, it can be determined that hyper-threads X1 and X2 belong to the first hyper-thread group, and hyper-thread X6 belongs to the second hyper-thread group.
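The lookup in this example can be reproduced with a simple reverse mapping. This is a minimal Python sketch of the example only; Z003 is omitted because its member thread numbers are not given, and the function names are hypothetical.

```python
# Stored groups from the example: group identifier -> member thread numbers.
GROUP_MEMBERS = {
    "Z001": ["X1", "X2", "X3"],
    "Z002": ["X4", "X5", "X6"],
}


def build_thread_to_group(group_members):
    """Invert the stored groups into a thread number -> group identifier map."""
    return {tid: gid for gid, tids in group_members.items() for tid in tids}


def classify(thread_ids, thread_to_group):
    """Bucket the hyperthreads to be scheduled by their stored group.

    Raises KeyError for a thread number that is not in any stored group.
    """
    result = {}
    for tid in thread_ids:
        result.setdefault(thread_to_group[tid], []).append(tid)
    return result


print(classify(["X1", "X2", "X6"], build_thread_to_group(GROUP_MEMBERS)))
# {'Z001': ['X1', 'X2'], 'Z002': ['X6']}
```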
[0061] S303, the computing device schedules each of the M hyperthreads to run on at least one first physical core and schedules each of the N hyperthreads to run on at least one second physical core.
[0062] The first physical core and the second physical core are different physical cores.
[0063] In this embodiment, the computing device determines multiple physical core groups based on the number of hyperthreads included in each of the multiple hyperthreading groups. These multiple physical core groups include a first physical core group and a second physical core group. The first physical core group includes multiple first physical cores, and the second physical core group includes multiple second physical cores. The first and second physical core groups are defined in this embodiment to distinguish physical cores within different physical core groups (i.e., the first and second physical core groups). The first and second physical core groups are any two physical core groups among the multiple physical core groups.
[0064] In one possible implementation, the computing device includes a first physical core group and a second physical core group, where the first physical core group includes the first physical core and the second physical core group includes the second physical core. The computing device obtains the average utilization rate of each of the two physical core groups; if the average utilization rate of the first physical core group is higher than that of the second physical core group, the idle second physical cores in the second physical core group are allocated to the first physical core group.
[0065] The average utilization rate is the ratio of the sum of the utilization rates of all physical cores in the physical core group to the number of physical cores in the group.
[0066] For example, Figure 4 is a schematic diagram of a hyper-threading scheduling process for load balancing according to an embodiment of this application. As shown in Figure 4, the number of physical core groups in the computing device 100 is determined by the number of hyper-threading groups stored in the computing device 100. Taking a first hyper-threading group and a second hyper-threading group as an example, the computing device 100 includes a first physical core group and a second physical core group. The first physical core group includes multiple first physical cores, the second physical core group includes multiple second physical cores, and the number of first physical cores equals the number of second physical cores. The first physical core group runs the hyper-threads in the first hyper-threading group, and the second physical core group runs the hyper-threads in the second hyper-threading group.
[0067] Optionally, the computing device 100 determines the physical core groups in the computing device 100 based on the number of hyper-threading groups among the multiple hyper-threading groups. Specifically, physical core groups that execute hyper-threading groups with a large number of hyper-threads are allocated a larger number of physical cores; physical core groups that execute hyper-threading groups with a small number of hyper-threads are allocated a smaller number of physical cores. The specific allocation method of physical core groups and the method for determining the number of physical cores in each physical core group are not limited.
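Paragraph [0067] leaves the allocation method open. One possible policy, assumed here purely for illustration, is largest-remainder proportional allocation: each physical core group receives a share of the available physical cores proportional to the number of hyper-threads in the hyper-threading group it executes. The function name `allocate_cores` and the group labels are hypothetical.

```python
def allocate_cores(group_sizes, total_cores):
    """Split total_cores among hyper-threading groups in proportion to their
    hyper-thread counts (largest-remainder method; an assumed policy)."""
    total_threads = sum(group_sizes.values())
    # Exact proportional share for each group, usually fractional.
    quotas = {g: total_cores * n / total_threads for g, n in group_sizes.items()}
    # Start from the whole-number part of each share.
    alloc = {g: int(q) for g, q in quotas.items()}
    leftover = total_cores - sum(alloc.values())
    # Hand the remaining cores to the groups with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda g: quotas[g] - alloc[g], reverse=True)
    for g in by_remainder[:leftover]:
        alloc[g] += 1
    return alloc


if __name__ == "__main__":
    # Hypothetical: three groups with 4, 3 and 2 hyper-threads, 6 physical cores.
    print(allocate_cores({"G1": 4, "G2": 3, "G3": 2}, 6))
    # {'G1': 3, 'G2': 2, 'G3': 1}
```

Under this rule a very small group can receive zero cores; a practical policy would probably also guarantee every group at least one physical core.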
[0068] During hyper-thread scheduling, the average utilization rate of the first physical core group and the average utilization rate of the second physical core group are obtained. The average utilization rate indicates the ratio of the sum of the utilization rates of all physical cores in the physical core group to the number of physical cores. If the average utilization rate of the first physical core group is higher than that of the second physical core group, the idle second physical cores in the second physical core group are allocated to the first physical core group. Optionally, if the average utilization rate of the first physical core group is higher than that of the second physical core group, and there are no idle second physical cores in the second physical core group, the second hyper-threads on the second physical cores with low utilization rates in the second physical core group are scheduled to other second physical cores, thereby scheduling the idle second physical cores to the first physical core group. This load balancing between physical core groups further improves the efficiency of hyper-thread scheduling.
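A minimal sketch of the inter-group rebalancing described in [0068], under stated assumptions: each physical core group is modeled as a dict mapping a core id to its utilization in percent, a core counts as idle at 0%, and "migrating" the hyper-threads of a low-utilization core simply adds its load to the least-utilized remaining core of the same group (a simplification; real load does not add linearly and may exceed capacity). The names `rebalance`, `average_utilization`, and the core ids are hypothetical.

```python
def average_utilization(group):
    """Sum of per-core utilization divided by the number of cores in the group."""
    return sum(group.values()) / len(group)


def rebalance(first, second):
    """Move one physical core from `second` to `first` when `first` is busier.

    Each group is a dict {core_id: utilization_percent}. Returns the id of the
    moved core, or None if no move is made.
    """
    # Only rebalance toward the busier group, and never empty the donor group.
    if len(second) < 2 or average_utilization(first) <= average_utilization(second):
        return None
    idle = [core for core, util in second.items() if util == 0]
    if idle:
        core = idle[0]
    else:
        # No idle core: free the least-utilized core by migrating its
        # hyper-threads to the least-utilized remaining core (assumed model:
        # load adds linearly).
        core = min(second, key=second.get)
        target = min((c for c in second if c != core), key=second.get)
        second[target] += second[core]
    del second[core]
    first[core] = 0  # the freed core joins the first group with no load yet
    return core


if __name__ == "__main__":
    first = {"P0": 90, "P1": 80}   # average 85%
    second = {"P2": 30, "P3": 10}  # average 20%, no idle core
    print(rebalance(first, second), first, second)
```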
[0069] In one possible implementation, the first physical core includes a first logical core and a second logical core. A second hyperthread among the M hyperthreads is scheduled to the first logical core, and a third hyperthread among the M hyperthreads is scheduled to the second logical core. The utilization rates of the first and second logical cores are then obtained; if the utilization rate of the first logical core is higher than that of the second logical core, a fourth hyperthread among the M hyperthreads is scheduled to the second logical core.
[0070] The second, third, and fourth hyperthreads are different hyperthreads; the ordinals only distinguish them and can denote any different hyperthreads among the M hyperthreads.
[0071] For example, because hyperthreads of the first hyperthreading group perform better when they share a physical core with one another than when a hyperthread of the first hyperthreading group shares a physical core with a hyperthread of the second hyperthreading group, the computing device schedules the M hyperthreads belonging to the first hyperthreading group to at least one first physical core in the first physical core group. This ensures that all hyperthreads running on the same physical core belong to the same hyperthreading group. Before the fourth hyperthread among the M hyperthreads is scheduled, the utilization rates of the first and second logical cores are compared, and the fourth hyperthread is scheduled to the logical core with the lower utilization rate, thereby balancing the load among the logical cores of the first physical core.
[0072] For example, the first physical core includes a first logical core and a second logical core. Among the hyperthreads to be scheduled, hyperthreads X0, X1, X2, X8, and X9 belong to the first hyperthreading group. Hyperthreads X0 and X1 are scheduled to the first logical core, and hyperthreads X2 and X8 are scheduled to the second logical core. Before hyperthread X9 is scheduled, the utilization rates of the first and second logical cores are obtained. If these are 85% and 63% respectively, hyperthread X9 is scheduled to the second logical core, which has the lower utilization rate.
[0073] Hyperthreads of the first hyperthreading group perform better when sharing a physical core with one another than when sharing it with hyperthreads of the second hyperthreading group; that is, any number of hyperthreads from the first hyperthreading group running together on the same physical core perform better than hyperthreads from the first hyperthreading group sharing a physical core with hyperthreads from other groups. Therefore, scheduling another hyperthread of the same hyperthreading group to the second logical core does not degrade the performance of the first physical core, while it also balances utilization across the logical cores of the first physical core, achieving load balancing among logical cores.
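The X9 example can be sketched as follows, assuming per-logical-core utilization is available as a percentage and that the physical core is reserved for one hyperthreading group. The function `place_hyperthread`, the group label `G1`, and the logical core ids `LC1`/`LC2` are hypothetical names introduced for illustration.

```python
def place_hyperthread(thread, group_of, core_group, logical_cores, utilization):
    """Place `thread` on the least-utilized logical core of one physical core.

    group_of:      hyperthread id -> hyperthreading group id
    core_group:    the group this physical core is reserved for
    logical_cores: logical core id -> list of hyperthreads already placed
    utilization:   logical core id -> current utilization in percent
    """
    # Only hyperthreads of the core's own group may share this physical core.
    if group_of[thread] != core_group:
        raise ValueError(f"{thread} is not in group {core_group}")
    target = min(utilization, key=utilization.get)
    logical_cores[target].append(thread)
    return target


if __name__ == "__main__":
    group_of = {t: "G1" for t in ["X0", "X1", "X2", "X8", "X9"]}
    logical_cores = {"LC1": ["X0", "X1"], "LC2": ["X2", "X8"]}
    utilization = {"LC1": 85, "LC2": 63}
    print(place_hyperthread("X9", group_of, "G1", logical_cores, utilization))  # LC2
```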
[0074] This application also provides a hyper-threading testing method for determining multiple hyper-threading groups. This method is executed by a computing device, and is described exemplarily below with reference to the accompanying drawings.
[0075] Figure 5 is a flowchart of a hyper-threading testing method provided in an embodiment of this application. As shown in Figure 5, the hyper-threading testing method includes the following steps:
[0076] S501. The computing device obtains the execution efficiency of each of multiple test groups.
[0077] The test groups include multiple hyperthreads to be tested, and the identifiers of the multiple hyperthreads to be tested are the same as those of the multiple hyperthreads.
[0078] In this embodiment, the hyperthreads in the multiple test groups and the hyperthreads in the multiple hyperthread groups stored on the computing device are hyperthreads from the same application.
[0079] In one possible implementation, before acquiring the multiple hyperthreads to be scheduled and their identifiers, the computing device determines multiple test groups based on the stored multiple hyperthreads.
[0080] The multiple test groups include multiple hyperthreads to be tested, and each test group includes at least two hyperthreads.
[0081] For example, the first application is computer software designed to perform specific tasks, typically to provide network services. The first application includes multiple hyperthreads; a hyperthread is the smallest unit of execution that an operating system (OS) can run, and these hyperthreads belong to the processes of the first application at run time. Different hyperthreads implement different subtasks of the first application and cooperate so that the first application can provide its functions. For example, the first application could be a word processing application for writing and editing documents: one hyperthread receives the user's keyboard input, and another displays the text on the screen. All hyperthreads of the first application are grouped by enumerating combinations to obtain multiple test groups, each containing the same number of hyperthreads. The number of hyperthreads in each test group is typically related to the number of logical cores into which the third physical core is divided.
[0082] For example, consider a test group comprising two different hyperthreads to be tested. If the first application includes five hyperthreads: T1, T2, T3, T4, and T5, and the third physical core includes two logical cores, the five hyperthreads in the first application are combined so that each test group includes two different hyperthreads to be tested. That is, two different hyperthreads are selected from the five, resulting in multiple test groups. The combination formula is as follows:
[0083] $$C(5,2) = \frac{5!}{2!\,(5-2)!} = 10$$
[0084] Therefore, the five hyperthreads included in the first application can be combined into 10 test groups, namely hyperthread T1 and hyperthread T2, hyperthread T1 and hyperthread T3, hyperthread T1 and hyperthread T4, hyperthread T1 and hyperthread T5, hyperthread T2 and hyperthread T3, hyperthread T2 and hyperthread T4, hyperthread T2 and hyperthread T5, hyperthread T3 and hyperthread T4, hyperthread T3 and hyperthread T5, and hyperthread T4 and hyperthread T5.
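The enumeration of test groups is exactly the set of k-element combinations of the application's hyperthreads, which Python's standard `itertools.combinations` produces in the same order as the list above. The helper name `build_test_groups` is hypothetical, and `threads_per_group` is assumed to equal the number of logical cores of the third physical core.

```python
from itertools import combinations
from math import comb


def build_test_groups(hyperthreads, threads_per_group):
    """Enumerate every unordered combination of `threads_per_group` hyperthreads.

    threads_per_group is assumed to equal the number of logical cores of the
    physical core used for testing (2 in the example above).
    """
    return list(combinations(hyperthreads, threads_per_group))


if __name__ == "__main__":
    groups = build_test_groups(["T1", "T2", "T3", "T4", "T5"], 2)
    print(len(groups), comb(5, 2))  # 10 10
    print(groups[:3])  # [('T1', 'T2'), ('T1', 'T3'), ('T1', 'T4')]
```

The same helper gives the C(9,2) = 36 test groups used in the nine-hyperthread clustering example.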
[0085] In one possible implementation, while the hyperthreads to be tested in a target test group run simultaneously on the third physical core, the parallel instructions per cycle (IPC) of each of them is obtained to determine a first target number; while the hyperthreads to be tested in the target test group each run individually on the third physical core, the IPC of each of them is obtained to determine a second target number. Based on the first target number and the second target number, the execution efficiency of the target test group is determined as the ratio of the first target number to the second target number.
[0086] The target test group is any one of the multiple test groups.
[0087] In this embodiment of the application, the computing device can perform performance tests on each test group while the first application is offline, and obtain the execution efficiency of multiple test groups.
[0088] For example, the hyperthreads to be tested in the target test group are run simultaneously on the third physical core, and the parallel IPC of each of them is recorded; the first quantity is the sum of these parallel IPC values. The hyperthreads to be tested in the target test group are then each run individually on the third physical core, and the second quantity is the sum of their individual IPC values. The execution efficiency of the target test group is the ratio of the first quantity to the second quantity.
[0089] Instructions per cycle (IPC) refers to the number of instructions executed by the CPU per clock cycle. For example, if a hyperthread or process executes a total of 1000 instructions during its execution, and the CPU consumes 200 clock cycles to execute these instructions, then the IPC of that hyperthread or process is 1000 / 200 = 5.
[0090] For example, suppose the first application includes five hyperthreads: T1, T2, T3, T4, and T5. Based on the number of logical cores into which the third physical core is divided, the five hyperthreads are divided into 10 test groups, each containing two different hyperthreads to be tested: T1 and T2, T1 and T3, T1 and T4, T1 and T5, T2 and T3, T2 and T4, T2 and T5, T3 and T4, T3 and T5, and T4 and T5.
[0091] If the target test group consists of hyperthreads T1 and T2, the computing device runs T1 and T2 simultaneously on the third physical core so that the utilization rate of the third physical core is 100%, and records a parallel IPC of 15 for T1 and 5 for T2. The computing device then runs T1 alone on the third physical core and records an IPC of 20, and runs T2 alone on the third physical core and records an IPC of 10. From these values, the execution efficiency of the target test group is [(15 + 5) / (20 + 10)] × 100% = 66.67%.
[0092] As can be seen from the above, running performance tests on each test group of the first application offline and determining the hyper-threading groups in advance reduces the computational overhead of hyper-threading scheduling, enables real-time hyper-threading scheduling, and thus improves scheduling efficiency.
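The IPC definition in [0089] and the efficiency ratio in [0085] and [0091] reduce to two small functions. This is a sketch only: in practice the instruction and cycle counts would come from hardware performance counters, which are not modeled here, and the function names are hypothetical.

```python
def ipc(instructions, cycles):
    """Instructions per cycle, e.g. 1000 instructions over 200 cycles -> 5.0."""
    return instructions / cycles


def execution_efficiency(parallel_ipc, solo_ipc):
    """Execution efficiency of one test group.

    parallel_ipc: IPC of each hyperthread while all run together on one core
    solo_ipc:     IPC of the same hyperthreads, each run alone on that core
    """
    if len(parallel_ipc) != len(solo_ipc):
        raise ValueError("need one solo IPC per co-run IPC")
    return sum(parallel_ipc) / sum(solo_ipc)


if __name__ == "__main__":
    print(ipc(1000, 200))  # 5.0
    eff = execution_efficiency([15, 5], [20, 10])  # T1 and T2 from [0091]
    print(f"{eff:.2%}")  # 66.67%
```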
[0093] S502. The computing device performs a clustering operation on the execution efficiency of each test group to determine multiple hyper-threading groups; wherein each hyper-threading group includes at least one test group.
[0094] For example, the purpose of clustering is to group the hyperthreads in the multiple test groups on the computing device into multiple hyperthread groups. If the hyperthread groups include a first hyperthread group and a second hyperthread group, then hyperthreads of the first hyperthread group perform better when sharing a physical core with other hyperthreads of the first hyperthread group than when sharing a physical core with hyperthreads of the second hyperthread group. Clustering thus achieves high affinity within a hyperthread group and low affinity between groups. The clustering operation can use K-means clustering and/or spectral clustering. Clustering can be performed on test groups composed of all hyperthreads in the application, or only on test groups composed of frequently running hyperthreads; this is not limited.
[0095] For example, take the K-means clustering algorithm. Suppose the computing device includes nine hyperthreads, T1, T2, ..., T9, and, based on the number of logical cores into which the third physical core is divided, each test group includes two different hyperthreads. Combining the nine hyperthreads two at a time yields C(9,2) = 36 test groups, each including two different hyperthreads to be tested.
[0096] Based on the execution efficiency formula above, the execution efficiency of each test group is obtained, and clustering is then performed on these efficiencies. Figure 6 is a grouping diagram after a hyper-threading clustering operation provided in an embodiment of this application. As shown in Figure 6, for hyperthreads T1 to T9, the distance between any two points decreases as their execution efficiency increases, with L = 1 - execution efficiency; the higher the execution efficiency, the smaller the distance. K-means clustering can therefore be performed on the pairwise distances between hyperthreads. Following the K-means algorithm, a centroid is determined for each hyperthread group, the distance from each hyperthread point to each centroid is calculated, and each hyperthread point is assigned to the group whose centroid is closest. The centroid of each group is then recomputed, and these steps are repeated until the groups no longer change between iterations. The final grouping result is shown in Figure 6: the first hyper-threading group includes hyperthreads T1, T4, T7, and T9; the second hyper-threading group includes hyperthreads T2, T5, and T8; and the third hyper-threading group includes hyperthreads T3 and T6. Execution efficiency among the hyperthreads within each group is maximized, so any number of hyperthreads within a group can run on the same physical core.
[0097] Optionally, if a spectral clustering algorithm is used, the weight of the edge connecting two hyperthread points is their execution efficiency: the higher the execution efficiency, the higher the weight, and the more likely the two hyperthreads are assigned to the same group. In this case, the execution efficiencies of the test groups form the adjacency matrix for spectral clustering, and spectral clustering is performed on this adjacency matrix.
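Because only pairwise distances L = 1 - execution efficiency are available (there are no point coordinates to average into a centroid), the sketch below swaps in k-medoids, the medoid variant of K-means, where each cluster's "centroid" is the member with the smallest total distance to the other members. It uses a deterministic farthest-point seeding for simplicity. The efficiency values in the demo are hypothetical (0.9 within the Figure 6 groups, 0.5 across them), chosen only so that the result can be compared with Figure 6; they are not measured data, and all function names are illustrative.

```python
from itertools import combinations


def efficiency_to_distance(names, efficiency):
    """Build a symmetric distance matrix with L = 1 - execution efficiency."""
    index = {name: i for i, name in enumerate(names)}
    dist = [[0.0] * len(names) for _ in names]
    for (a, b), eff in efficiency.items():
        dist[index[a]][index[b]] = dist[index[b]][index[a]] = 1.0 - eff
    return dist


def k_medoids(dist, k, max_iter=100):
    """Cluster points given only pairwise distances (medoid variant of k-means)."""
    n = len(dist)
    # Farthest-point seeding: start at point 0, then repeatedly add the point
    # farthest from all medoids chosen so far.
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(dist[i][m] for m in medoids)))
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest medoid.
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
        # Update step: the new medoid minimizes total distance within its cluster.
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                new_medoids.append(medoids[c])
                continue
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break  # clusters are stable
        medoids = new_medoids
    return labels


def group_hyperthreads(names, efficiency, k):
    labels = k_medoids(efficiency_to_distance(names, efficiency), k)
    return [[names[i] for i in range(len(names)) if labels[i] == c] for c in range(k)]


def demo_efficiency(names, groups, inside=0.9, across=0.5):
    """Hypothetical efficiencies: high within `groups`, low between them."""
    def same(a, b):
        return any(a in g and b in g for g in groups)
    return {(a, b): inside if same(a, b) else across for a, b in combinations(names, 2)}


if __name__ == "__main__":
    names = [f"T{i}" for i in range(1, 10)]
    figure6 = [{"T1", "T4", "T7", "T9"}, {"T2", "T5", "T8"}, {"T3", "T6"}]
    print(group_hyperthreads(names, demo_efficiency(names, figure6), 3))
```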
[0098] It should be noted that the clustering operations corresponding to the K-means clustering algorithm and spectral clustering algorithm mentioned above are only examples and do not constitute specific limitations.
[0099] In summary, this application provides a hyper-threading scheduling method that ensures accurate hyper-thread grouping, reduces real-time hyper-threading scheduling overhead, and thus improves the efficiency of hyper-threading scheduling. The method is applied to a computing device that stores multiple hyper-thread groups, including a first hyper-thread group and a second hyper-thread group, where hyper-threads of the first hyper-thread group perform better when sharing a physical core with one another than when sharing a physical core with hyper-threads of the second hyper-thread group. The method includes: obtaining multiple hyper-threads to be scheduled and their identifiers; determining, based on the identifiers, that M hyper-threads belong to the first hyper-thread group and N hyper-threads belong to the second hyper-thread group, where M and N are positive integers greater than 1; and scheduling each of the M hyper-threads to run on at least one first physical core and each of the N hyper-threads to run on at least one second physical core, where the first physical core and the second physical core are different physical cores. Because hyper-threads of the same group perform better together on one physical core than when mixed with hyper-threads of another group, scheduling hyper-threads of the same group onto the same physical core guarantees the performance of hyper-thread execution.
Furthermore, since the computing device stores multiple hyperthread groups, there is no need for the computing device to perform real-time grouping calculations before scheduling hyperthreads, which reduces overhead and thus improves the efficiency of hyperthread scheduling.
[0100] The foregoing mainly describes the solutions of the embodiments of this application from a methodological perspective. It is understood that, in order to implement the functions in the above-described hyper-threading scheduling method, the hyper-threading scheduling device includes at least one of the hardware structures and software modules corresponding to each function. Those skilled in the art should readily recognize that, in conjunction with the units and algorithm steps of the various examples described in the embodiments disclosed herein, the embodiments of this application can be implemented in hardware or a combination of hardware and computer software. Whether a function is executed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of the embodiments of this application.
[0101] In the embodiments of this application, the hyper-threading scheduling device may be divided into functional units based on the foregoing method examples. For example, each function can be assigned its own functional unit, or two or more functions can be integrated into one processing unit. The integrated unit can be implemented in hardware or as a software functional unit. It should be noted that the unit division in the embodiments of this application is illustrative and represents only one logical division of functions; other divisions may be used in actual implementation.
[0102] For example, Figure 7 is a schematic structural diagram of a hyper-threading scheduling device 400 provided in an embodiment of this application. As shown in Figure 7, the hyper-threading scheduling device 400 can be applied in the computing device 100 and includes:
[0103] The acquisition module 401 is used to acquire multiple hyperthreads to be scheduled and the identifiers of the multiple hyperthreads;
[0104] The judgment module 402 is used to determine, based on the identifiers of multiple hyperthreads, that M hyperthreads belong to the first hyperthread group and N hyperthreads belong to the second hyperthread group; where M is a positive integer greater than 1 and N is a positive integer greater than 1.
[0105] The scheduling module 403 is used to schedule each of the M hyperthreads to run on at least one first physical core, and to schedule each of the N hyperthreads to run on at least one second physical core; the first physical core and the second physical core are different physical cores.
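The division of work among modules 401, 402, and 403 can be illustrated with a short sketch. It assumes the offline test has already produced a stored mapping from hyperthread identifier to group, that each group has been given its own list of physical cores, and that hyperthreads are spread round-robin over their group's cores; the round-robin placement, the functions `judge` and `schedule`, and all ids are hypothetical and not taken from the application.

```python
from collections import defaultdict


def judge(thread_ids, stored_groups):
    """Judgment step: bucket hyperthread ids by their stored (offline) group."""
    buckets = defaultdict(list)
    for tid in thread_ids:
        buckets[stored_groups[tid]].append(tid)
    return dict(buckets)


def schedule(buckets, cores_by_group):
    """Scheduling step: place each group's hyperthreads only on the physical
    cores reserved for that group, round-robin (an assumed placement policy)."""
    placement = defaultdict(list)
    for group, tids in buckets.items():
        cores = cores_by_group[group]
        for i, tid in enumerate(tids):
            placement[cores[i % len(cores)]].append(tid)
    return dict(placement)


if __name__ == "__main__":
    # Hypothetical stored grouping (hyperthread id -> group id) from the offline test.
    stored = {"T1": 1, "T4": 1, "T7": 1, "T9": 1, "T2": 2, "T5": 2, "T8": 2}
    # Acquisition step, stubbed: the ids of the hyperthreads to be scheduled.
    pending = ["T1", "T2", "T4", "T5", "T7", "T8", "T9"]
    buckets = judge(pending, stored)  # M = 4 in group 1, N = 3 in group 2
    print(schedule(buckets, {1: ["P0", "P1"], 2: ["P2"]}))
```

Because grouping is a dictionary lookup on stored results, no grouping computation happens at scheduling time, which is the overhead reduction the method relies on.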
[0106] In one possible implementation, the scheduling module 403 is further configured to schedule the second hyperthread among the M hyperthreads to the first logical core, and schedule the third hyperthread among the M hyperthreads to the second logical core; the scheduling module is further configured to obtain the utilization rate of the first logical core and the utilization rate of the second logical core; the scheduling module is further configured to, if the utilization rate of the first logical core is higher than the utilization rate of the second logical core, schedule the fourth hyperthread among the M hyperthreads to the second logical core.
[0107] In one possible implementation, the scheduling module 403 is further configured to obtain the average utilization rate of the first physical core group and the average utilization rate of the second physical core group, where the average utilization rate indicates the ratio of the sum of the utilization rates of the physical cores in a physical core group to the number of physical cores in that group; and, if the average utilization rate of the first physical core group is higher than that of the second physical core group, to allocate idle second physical cores in the second physical core group to the first physical core group.
[0108] In one possible implementation, the hyper-threading scheduling device 400 further includes a testing module configured to obtain the execution efficiency of each of multiple test groups. The execution efficiency indicates the ratio of a first quantity to a second quantity: the first quantity is the sum of the parallel instructions per cycle (IPC) of the hyper-threads under test in the test group when they run simultaneously on a third physical core, and the second quantity is the sum of the IPC of each of those hyper-threads when it runs individually on the third physical core. The multiple test groups include multiple hyper-threads under test, whose identifiers are the same as the identifiers of the multiple hyper-threads;
[0109] The testing module is also used to perform clustering operations on the execution efficiency of each test group to identify multiple hyper-threading groups; wherein each hyper-threading group includes at least one test group.
[0110] In one possible implementation, the testing module is further configured to: obtain the parallel IPC of each hyper-thread under test in a target test group while they run simultaneously on the third physical core, and determine a first target number; obtain the IPC of each hyper-thread under test in the target test group while it runs individually on the third physical core, and determine a second target number; and determine the execution efficiency of the target test group as the ratio of the first target number to the second target number. The target test group is any one of the multiple test groups.
[0111] As can be seen from the above, the hyper-threading scheduling device 400 can be applied to the computing device 100 shown in Figure 2 to implement the hyper-threading scheduling method shown in Figure 3.
[0112] The acquisition module 401, judgment module 402, and scheduling module 403 can all be implemented in software or hardware. For example, the implementation of the acquisition module 401 will be described below. Similarly, the implementation of the judgment module 402 and scheduling module 403 can refer to the implementation of the acquisition module 401.
[0113] As an example of a software functional unit, the acquisition module 401 may include code running on a computing instance. The computing instance may include at least one of a physical host (computing device), a virtual machine, or a container, and there may be one or more such computing instances. For example, the acquisition module 401 may include code running on multiple hosts / virtual machines / containers. It should be noted that the multiple hosts / virtual machines / containers used to run the code may be distributed in the same region or in different regions. Further, they may be distributed in the same availability zone (AZ) or in different AZs, each AZ including one or more geographically proximate data centers. Typically, a region may include multiple AZs.
[0114] Similarly, multiple hosts / virtual machines / containers used to run this code can be distributed within the same Virtual Private Cloud (VPC) or across multiple VPCs. Typically, a VPC is set up within a region. Communication between two VPCs within the same region, as well as between VPCs in different regions, requires a communication gateway to be set up within each VPC to enable interconnection between VPCs.
[0115] As an example of a hardware functional unit, the acquisition module 401 may include at least one computing device, such as a server. Alternatively, the acquisition module 401 may be a device implemented using an application-specific integrated circuit (ASIC) or a programmable logic device (PLD). The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
[0116] The multiple computing devices included in the acquisition module 401 can be distributed in the same region or in different regions. Similarly, the multiple computing devices included in the acquisition module 401 can be distributed in the same Availability Zone (AZ) or in different AZs. Likewise, the multiple computing devices included in the acquisition module 401 can be distributed in the same Virtual Private Cloud (VPC) or in multiple VPCs. These multiple computing devices can be any combination of computing devices such as servers, ASICs, PLDs, CPLDs, FPGAs, and GALs.
[0117] It should be noted that, in other embodiments, the acquisition module 401 can be used to execute any step in the hyper-threading scheduling method, and the judgment module 402 and the scheduling module 403 can be used to execute any step in the hyper-threading scheduling method. The steps implemented by the acquisition module 401, the judgment module 402 and the scheduling module 403 can be specified as needed. By implementing different steps in the hyper-threading scheduling method through the acquisition module 401, the judgment module 402 and the scheduling module 403 respectively, all functions of the hyper-threading scheduling device 400 can be realized.
[0118] Figure 8 is a schematic hardware diagram of a computing device 100 provided in an embodiment of this application. As shown in Figure 8, the computing device 100 includes a bus 901, a processor 902, a memory 903, and a communication interface 904. The processor 902, the memory 903, and the communication interface 904 communicate with each other via the bus 901. It should be understood that this application does not limit the number of processors and memories in the computing device 100.
[0119] The bus 901 can be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. Buses can be categorized as address buses, data buses, control buses, and so on. For ease of representation, the bus 901 is shown as a single line in Figure 8, but this does not mean that there is only one bus or one type of bus. The bus 901 may include a path for transmitting information between components of the computing device 100 (e.g., processor 902, memory 903, communication interface 904).
[0120] Processor 902 may include any one or more processors such as a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP), or a digital signal processor (DSP).
[0121] Memory 903 may include volatile memory, such as random access memory (RAM). Memory 903 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
[0122] The memory 903 stores executable program code, which the processor 902 executes to implement the functions of the acquisition module, the judgment module, and the scheduling module, thereby implementing the hyper-threading scheduling method. In other words, the memory 903 stores instructions for performing hyper-threading scheduling.
[0123] Alternatively, the memory 903 stores executable code, which the processor 902 executes to implement the functions of the aforementioned hyper-threading scheduling device 400, thereby implementing the hyper-threading scheduling method. That is, the memory 903 stores instructions for executing the hyper-threading scheduling method.
[0124] The communication interface 904 uses transceiver modules such as, but not limited to, network interface cards and transceivers to enable communication between the computing device 100 and other devices or communication networks.
[0125] For example, with reference to Figure 7, some or all of the functions of the acquisition module 401, judgment module 402, and scheduling module 403 of the hyper-threading scheduling device 400 can be implemented by the computing device 100 in Figure 8.
[0126] This application also provides a computing device cluster. The computing device cluster includes at least one computing device, such as a central server, an edge server, or a local server in a local data center. In some embodiments, the computing device may also be a terminal device such as a desktop computer, a laptop computer, or a smartphone.
[0127] Figure 9 is a schematic diagram of a computing device cluster provided in an embodiment of this application. As shown in Figure 9, the computing device cluster includes at least one computing device 100. The memory 903 of one or more computing devices 100 in the computing device cluster may store the same instructions for executing the hyper-threading scheduling method.
[0128] In some possible implementations, the memory 903 of one or more computing devices 100 in the computing device cluster may also store partial instructions for implementing the hyper-threading scheduling method. In other words, a combination of one or more computing devices 100 can jointly execute instructions for the hyper-threading scheduling method.
[0129] It should be noted that the memory 903 in different computing devices 100 within the computing device cluster can store different instructions, each used to implement a portion of the functions of the hyper-threading scheduling device 400. That is, the instructions stored in the memory 903 of different computing devices 100 can implement the functions of one or more of the acquisition module 401, the judgment module 402, and the scheduling module 403.
[0130] In some possible implementations, one or more computing devices in a computing device cluster can be connected via a network, such as a wide area network (WAN) or a local area network (LAN). Figure 10 illustrates a possible network connection according to an embodiment of this application. As shown in Figure 10, computing devices 100A and 100B are connected via a network; specifically, each connects to the network through its own communication interface. In this implementation, the memory 903 in computing device 100A stores instructions for executing the functions of the acquisition module 401 and the judgment module 402, while the memory 903 in computing device 100B stores instructions for executing the functions of the scheduling module 403.
[0131] The cluster connection shown in Figure 10 may be used when, considering the requirements of the hyper-threading scheduling method provided in this application (e.g., storing a large amount of data), the functions implemented by the scheduling module 403 are delegated to the computing device 100B.
[0132] It should be understood that the functions of the computing device 100A shown in Figure 10 can also be performed by multiple computing devices 100. Similarly, the functions of the computing device 100B can also be performed by multiple computing devices 100.
[0133] This application also provides another computing device cluster. For the connection relationships between the computing devices in this cluster, refer to the connection methods of the computing device clusters shown in Figure 9 and Figure 10. The difference is that the memory 903 of one or more computing devices 100 in this cluster can store the same instructions for executing the hyper-threading scheduling method.
[0134] In some possible implementations, the memory 903 of one or more computing devices 100 in the computing device cluster may also store partial instructions for executing the hyper-threading scheduling method. In other words, a combination of one or more computing devices 100 can jointly execute the instructions for executing the hyper-threading scheduling method.
[0135] It should be noted that the memory 903 in different computing devices 100 within the computing device cluster can store different instructions, each for executing some of the functions of the hyper-threading scheduling device 400. That is, the instructions stored in the memory 903 of different computing devices 100 can together implement the functions of the hyper-threading scheduling device 400.
[0136] This application also provides a computer-readable storage medium storing instructions that, when executed on a computer, cause the computer to perform the hyper-threading scheduling method according to any of the implementations and feasible implementation schemes described above.
[0137] This application also provides a computer program product containing instructions that, when run on a computer, cause the computer to perform the hyper-threading scheduling method according to any of the implementations and feasible implementation schemes described above.
[0138] Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.
[0139] Those skilled in the art will understand that, for convenience and brevity, for the specific working processes of the systems, devices, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
[0140] This application also provides a chip system, including a processor coupled to a memory, where the memory is used to store programs or instructions; when the programs or instructions are executed by the processor, the chip system implements the method in any of the above method embodiments.
[0141] Optionally, the chip system may contain one or more processors. These processors can be implemented in hardware or software. When implemented in hardware, the processor can be a logic circuit, an integrated circuit, etc. When implemented in software, the processor can be a general-purpose processor, implemented by reading software code stored in memory.
[0142] Optionally, the chip system may contain one or more memories. The memory may be integrated with the processor or disposed separately from it; this application embodiment does not limit this. For example, the memory may be a non-transitory memory, such as a read-only memory (ROM), which may be integrated with the processor on the same chip or disposed separately on different chips. This application embodiment does not specifically limit the type of memory or the arrangement of the memory and processor.
[0143] For example, the chip system can be a field programmable gate array (FPGA), an application-specific integrated circuit (ASIC), a system-on-chip (SoC), a central processing unit (CPU), a network processor (NP), a digital signal processor (DSP), a micro controller unit (MCU), a programmable logic device (PLD), or other integrated chips.
[0144] The electronic devices, computer storage media, and computer program products provided in this application are all used to execute the corresponding methods provided above. Therefore, for the beneficial effects they can achieve, reference may be made to the beneficial effects of the corresponding methods provided above; details are not repeated here.
[0145] Through the above description of the embodiments, those skilled in the art can clearly understand that, for the sake of convenience and brevity, only the division of the above functional modules is used as an example. In actual applications, the above functions can be assigned to different functional modules as needed, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above.
[0146] In the several embodiments provided in this application, it should be understood that the disclosed apparatus and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of modules or units is only a logical functional division, and in actual implementation there may be other division methods. For example, multiple units or components may be combined or integrated into another apparatus, or some features may be ignored or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through some interfaces; the indirect coupling or communication connection between apparatuses or units may be electrical, mechanical, or in other forms.
[0147] The units described as separate components may or may not be physically separate. A component shown as a unit can be one or more physical units, located in one place or distributed in multiple different locations. Some or all of the units can be selected to achieve the purpose of this embodiment, depending on actual needs.
[0148] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.
[0149] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a readable storage medium. Based on this understanding, the technical solutions of the embodiments of this application, in essence, or the contributing parts, or all or part of the technical solutions, can be embodied in the form of a software product. This software product is stored in a storage medium and includes several instructions to cause a device (which may be a microcontroller, chip, etc.) or processor to execute all or part of the steps of the methods of the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0150] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions within the technical scope disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.
Claims
1. A hyper-threading scheduling method, characterized in that the method is applied to a computing device that stores multiple hyper-threading groups, including a first hyper-threading group and a second hyper-threading group, wherein the performance when the first hyperthreads indicated by the first hyper-threading group run on the same physical core is greater than the performance when hyperthreads indicated by both the first and the second hyper-threading groups run together on the same physical core; the method includes: obtaining multiple hyperthreads to be scheduled and the identifiers of the multiple hyperthreads; determining, based on the identifiers of the multiple hyperthreads, that M hyperthreads belong to the first hyper-threading group and N hyperthreads belong to the second hyper-threading group, where M and N are each positive integers greater than 1; and scheduling each of the M hyperthreads to run on at least one first physical core, and scheduling each of the N hyperthreads to run on at least one second physical core, where the first physical core and the second physical core are different physical cores.
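The grouping-then-placement logic of claim 1 can be illustrated with a minimal Python sketch. It assumes a stored table `group_of` mapping hyperthread identifiers to hyper-threading groups, a hypothetical reservation `cores_of_group` of disjoint physical cores per group, and 2-way SMT (two logical cores per physical core). None of these names or data structures comes from the application; they only show how identifier lookup keeps each group on its own physical cores.

```python
from collections import defaultdict


def schedule_by_group(threads, group_of, cores_of_group):
    """Return a placement {hyperthread_id: (physical_core, logical_slot)}.

    Hypothetical inputs, not from the application:
      threads        -- identifiers of the hyperthreads to be scheduled
      group_of       -- stored mapping: identifier -> hyper-threading group
      cores_of_group -- mapping: group -> physical core IDs reserved for it
    """
    # Different groups must never share a physical core.
    all_cores = [c for cores in cores_of_group.values() for c in cores]
    if len(all_cores) != len(set(all_cores)):
        raise ValueError("hyper-threading groups must use disjoint physical cores")

    # Determine the group of each hyperthread from its identifier.
    members = defaultdict(list)
    for tid in threads:
        members[group_of[tid]].append(tid)

    # Place each group's members only on that group's physical cores,
    # filling logical slots 0 and 1 of a core before moving to the next
    # (assumes 2-way SMT; extra threads wrap around and time-share).
    placement = {}
    for group, tids in members.items():
        cores = cores_of_group[group]
        for i, tid in enumerate(tids):
            placement[tid] = (cores[(i // 2) % len(cores)], i % 2)
    return placement


if __name__ == "__main__":
    group_of = {"t1": "G1", "t2": "G1", "t3": "G2", "t4": "G2"}
    print(schedule_by_group(["t1", "t2", "t3", "t4"], group_of, {"G1": [0], "G2": [1]}))
    # {'t1': (0, 0), 't2': (0, 1), 't3': (1, 0), 't4': (1, 1)}
```

Here the two G1 hyperthreads share physical core 0 and the two G2 hyperthreads share physical core 1, so hyperthreads from different groups never compete for the same core's execution resources.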
2. The method according to claim 1, characterized in that the first physical core includes a first logical core and a second logical core, and the scheduling of each of the M hyperthreads to run on at least one first physical core includes: scheduling a second hyperthread among the M hyperthreads to the first logical core, and scheduling a third hyperthread among the M hyperthreads to the second logical core; obtaining the utilization rate of the first logical core and the utilization rate of the second logical core; and, if the utilization rate of the first logical core is higher than that of the second logical core, scheduling a fourth hyperthread among the M hyperthreads to the second logical core.
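The logical-core balancing of claim 2 can be sketched as follows. This is an illustration under stated assumptions: `LogicalCore`, `schedule_on_physical_core`, and the `utilization` callable are hypothetical names, utilization is assumed to be a value in [0.0, 1.0] sampled before each placement, and the fallback to the first logical core when utilizations are equal or lower is an assumption, since the claim only specifies the case where the first logical core is busier.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalCore:
    name: str
    run_queue: list = field(default_factory=list)


def schedule_on_physical_core(threads, first, second, utilization):
    """Spread hyperthreads of one group over the two logical cores of a physical core.

    threads     -- hyperthreads of the group, in order; threads[0] and threads[1]
                   play the roles of the claim's "second" and "third" hyperthreads
    utilization -- hypothetical callable returning a logical core's current
                   utilization in [0.0, 1.0] (e.g. sampled from the OS)
    """
    # Seed one hyperthread on each logical core.
    first.run_queue.append(threads[0])
    second.run_queue.append(threads[1])
    for tid in threads[2:]:
        # Re-sample both logical cores before each further placement.
        if utilization(first) > utilization(second):
            second.run_queue.append(tid)  # the rule stated in claim 2
        else:
            first.run_queue.append(tid)   # assumption: the claim leaves this case open
    return first, second
```

For example, with a fake utilization of `0.4 * len(core.run_queue)`, four hyperthreads end up two per logical core, because each new placement goes to whichever logical core is not currently the busier one.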
3. The method according to claim 1 or 2, characterized in that the computing device includes a first physical core group and a second physical core group, wherein the first physical core group includes the first physical core and the second physical core group includes the second physical core; the method further includes: obtaining the average occupancy rate of the first physical core group and the average occupancy rate of the second physical core group, where the average occupancy rate indicates the ratio of the sum of the occupancy rates of the physical cores in a physical core group to the number of physical cores in that group; and, if the average occupancy rate of the first physical core group is higher than that of the second physical core group, allocating idle second physical cores in the second physical core group to the first physical core group.
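The core-group rebalancing of claim 3 reduces to comparing two averages, each computed as the sum of per-core occupancy divided by the number of cores in the group. The sketch below assumes each group is a dict from a physical core ID to its occupancy in [0.0, 1.0]; the idle threshold of 0.05 and the rule of always leaving at least one core in the second group are illustrative assumptions, not values from the application.

```python
def average_occupancy(group):
    """Sum of per-core occupancy divided by the number of cores (claim 3)."""
    return sum(group.values()) / len(group)


def rebalance(first_group, second_group, idle_threshold=0.05):
    """Move idle cores from the second group to the first when the first is busier.

    Each group maps physical core ID -> occupancy in [0.0, 1.0]. The idle
    threshold and the rule of always leaving one core in the second group are
    illustrative assumptions. Returns the list of moved core IDs.
    """
    if average_occupancy(first_group) <= average_occupancy(second_group):
        return []
    moved = []
    # sorted() copies the items, so popping from second_group below is safe.
    for core, occ in sorted(second_group.items()):
        if len(second_group) <= 1:
            break  # assumption: never leave a group without cores
        if occ <= idle_threshold:
            first_group[core] = second_group.pop(core)
            moved.append(core)
    return moved
```

With a first group averaging 0.85 and a second group containing cores at 0.0, 0.5, and 0.02, the two near-idle cores move to the first group and the busy core stays behind.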
4. The method according to any one of claims 1-3, characterized in that, before obtaining the multiple hyperthreads to be scheduled and their identifiers, the method further includes: obtaining the execution efficiency of each of multiple test groups, where the execution efficiency indicates the ratio of a first quantity to a second quantity; the first quantity is the sum of the parallel instructions per cycle (IPC) of each of the multiple hyperthreads under test in the test group when they run simultaneously on a third physical core; the second quantity is the sum of the instructions per cycle (IPC) of each of the multiple hyperthreads under test in the test group when each runs alone on the third physical core; the multiple test groups include multiple hyperthreads to be tested, and the identifiers of the multiple hyperthreads to be tested are the same as the identifiers of the multiple hyperthreads; and performing clustering on the execution efficiency of each test group to determine the multiple hyper-threading groups, wherein each of the multiple hyper-threading groups includes at least one of the test groups.
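Claim 4 does not name a clustering algorithm. The sketch below swaps in plain one-dimensional k-means (Lloyd's algorithm) over precomputed execution-efficiency values, purely as an illustration; the number of clusters `k`, the deterministic initialization, and the tuple-of-identifiers representation of a test group are all assumptions. Each resulting cluster is a candidate hyper-threading group containing at least one test group.

```python
from collections import defaultdict


def kmeans_1d(values, k, iters=100):
    """Plain Lloyd's k-means on scalar values; returns (labels, centroids)."""
    xs = sorted(values)
    # Deterministic start: k evenly spaced order statistics of the data.
    centroids = [xs[round(i * (len(xs) - 1) / max(k - 1, 1))] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each value to its nearest centroid.
        labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
        # Recompute each centroid as the mean of its members (keep it if empty).
        new = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new.append(sum(members) / len(members) if members else centroids[j])
        if new == centroids:
            break
        centroids = new
    return labels, centroids


def cluster_test_groups(efficiency, k):
    """efficiency: {test group (tuple of hyperthread IDs): execution efficiency}.

    Returns {cluster label: [test groups]}; each cluster is a candidate
    hyper-threading group containing at least one test group.
    """
    groups = list(efficiency)
    labels, _ = kmeans_1d([efficiency[g] for g in groups], k)
    clusters = defaultdict(list)
    for g, lab in zip(groups, labels):
        clusters[lab].append(g)
    return dict(clusters)
```

With efficiencies of 0.95 and 0.93 for two test groups and 0.60 and 0.62 for two others, `k=2` separates the well-co-running pairs from the poorly co-running ones.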
5. The method according to claim 4, characterized in that obtaining the execution efficiency of each of the multiple test groups includes: while the multiple hyperthreads to be tested in a target test group run simultaneously on the third physical core, obtaining the parallel IPC of each of them to determine a first target quantity; while the multiple hyperthreads to be tested in the target test group each run alone on the third physical core, obtaining the IPC of each of them to determine a second target quantity; and determining, based on the first target quantity and the second target quantity, the execution efficiency of the target test group as the ratio of the first target quantity to the second target quantity; where the target test group is any one of the multiple test groups.
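The execution-efficiency ratio of claims 4 and 5 is simple arithmetic once per-hyperthread IPC values are available. In the sketch below, IPC is computed as retired instructions divided by elapsed cycles; on Linux these two counts could, for example, be collected with `perf stat -e instructions,cycles`, but the measurement harness itself is outside this sketch. The function and parameter names are hypothetical.

```python
def ipc(instructions, cycles):
    """Instructions per cycle from two hardware-counter readings."""
    return instructions / cycles


def execution_efficiency(parallel_ipc, solo_ipc):
    """Execution efficiency of one test group (claims 4 and 5).

    parallel_ipc -- {hyperthread ID: IPC while all group members co-run on the same physical core}
    solo_ipc     -- {hyperthread ID: IPC while that hyperthread runs alone on that core}
    """
    if set(parallel_ipc) != set(solo_ipc):
        raise ValueError("both measurements must cover the same hyperthreads")
    first_target = sum(parallel_ipc.values())   # first target quantity
    second_target = sum(solo_ipc.values())      # second target quantity
    return first_target / second_target


if __name__ == "__main__":
    solo = {"a": ipc(4_000_000, 2_000_000), "b": ipc(4_000_000, 2_000_000)}      # 2.0 each
    parallel = {"a": ipc(3_000_000, 2_000_000), "b": ipc(3_000_000, 2_000_000)}  # 1.5 each
    print(execution_efficiency(parallel, solo))  # 0.75
```

A value of 1.0 would mean each hyperthread keeps its stand-alone IPC while sharing the core; lower values indicate more contention between the members of the test group.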
6. A hyper-threading scheduling device, characterized in that the device includes: an acquisition module, configured to acquire multiple hyperthreads to be scheduled and the identifiers of the multiple hyperthreads; a determination module, configured to determine, based on the identifiers of the multiple hyperthreads, that M hyperthreads belong to a first hyper-threading group and N hyperthreads belong to a second hyper-threading group, where M and N are each positive integers greater than 1; and a scheduling module, configured to schedule each of the M hyperthreads to run on at least one first physical core and to schedule each of the N hyperthreads to run on at least one second physical core, where the first physical core and the second physical core are different physical cores.
7. The hyper-threading scheduling device according to claim 6, characterized in that the first physical core includes a first logical core and a second logical core, and, in scheduling each of the M hyperthreads to run on at least one first physical core, the scheduling module is further configured to: schedule a second hyperthread among the M hyperthreads to the first logical core and schedule a third hyperthread among the M hyperthreads to the second logical core; obtain the utilization rate of the first logical core and the utilization rate of the second logical core; and, if the utilization rate of the first logical core is higher than that of the second logical core, schedule a fourth hyperthread among the M hyperthreads to the second logical core.
8. The hyper-threading scheduling device according to claim 6 or 7, characterized in that the computing device includes a first physical core group and a second physical core group, wherein the first physical core group includes the first physical core and the second physical core group includes the second physical core; the scheduling module is further configured to obtain the average occupancy rate of the first physical core group and the average occupancy rate of the second physical core group, where the average occupancy rate indicates the ratio of the sum of the occupancy rates of the physical cores in a physical core group to the number of physical cores in that group; and the scheduling module is further configured to, if the average occupancy rate of the first physical core group is higher than that of the second physical core group, allocate idle second physical cores in the second physical core group to the first physical core group.
9. The hyper-threading scheduling device according to any one of claims 6-8, characterized in that the hyper-threading scheduling device further includes a testing module; the testing module is configured to obtain the execution efficiency of each of multiple test groups, where the execution efficiency indicates the ratio of a first quantity to a second quantity; the first quantity is the sum of the parallel instructions per cycle (IPC) of each of the multiple hyperthreads under test in the test group when they run simultaneously on a third physical core; the second quantity is the sum of the instructions per cycle (IPC) of each of the multiple hyperthreads under test in the test group when each runs alone on the third physical core; the multiple test groups include multiple hyperthreads to be tested, and the identifiers of the multiple hyperthreads to be tested are the same as the identifiers of the multiple hyperthreads; and the testing module is further configured to perform clustering on the execution efficiency of each test group to determine the multiple hyper-threading groups, wherein each of the multiple hyper-threading groups includes at least one of the test groups.
10. The hyper-threading scheduling device according to claim 9, characterized in that the testing module is further configured to obtain, while the multiple hyperthreads to be tested in a target test group run simultaneously on the third physical core, the parallel IPC of each of them to determine a first target quantity; the testing module is further configured to obtain, while the multiple hyperthreads to be tested in the target test group each run alone on the third physical core, the IPC of each of them to determine a second target quantity; and the testing module is further configured to determine, based on the first target quantity and the second target quantity, the execution efficiency of the target test group as the ratio of the first target quantity to the second target quantity; where the target test group is any one of the multiple test groups.
11. A computing device, characterized in that the computing device includes a processor, a memory, and a computer program/instructions stored in the memory; the processor executes the computer program/instructions to enable the computing device to implement the hyper-threading scheduling method according to any one of claims 1-5.
12. A computer program product, characterized in that the computer program product includes instructions that, when executed by a computing device, cause the computing device to perform the hyper-threading scheduling method according to any one of claims 1-5.
13. A computer-readable storage medium, characterized in that the computer-readable storage medium includes computer program instructions that, when executed by a computing device, enable the computing device to perform the hyper-threading scheduling method according to any one of claims 1-5.