Load balancing method and device, cluster and many-core processor

A many-core processor and load balancing technology, applied in the computer field, that addresses the large thread migration delay and limited system performance of existing approaches. It reduces the average waiting time of threads, improves parallelism, and improves system performance.

Inactive; Publication Date: 2016-04-27
HANGZHOU HUAWEI DIGITAL TECH
6 Cites, 30 Cited by

AI Technical Summary

Problems solved by technology

Because thread migration has to be monitored and scheduled by the operating system, its delay is relatively large, and system performance needs to be improved.



Examples


Specific Embodiment 1

[0061] Taking the application scenario shown in Figure 3 as an example, Specific Embodiment 1 of the present invention is as follows:

[0062] In this embodiment of the present invention, the load of a processor core is calculated from its incoming (transferred-in) threads, its local threads, and its interrupt threads.

[0063] The first step is to obtain the load of each processor core.

[0064] The router can obtain the stack information of each processor core through the thread stack information collector, determine from it the number of incoming threads waiting to be executed in each processor core, and thus obtain each processor core's load of pending incoming threads.

[0065] For example, in the scenario shown in Figure 3, the router may learn that processor core 1 has 5 incoming threads, processor core 2 has 1 incoming thread, processor core n has 2 incoming threads, and so on.
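The load tally described in paragraphs [0062] to [0065] can be illustrated with a minimal sketch. The incoming-thread counts are the ones given in the example above; the `CoreLoad` type, the `local` and `interrupt` fields, and the unweighted sum are assumptions made for illustration only, since the excerpt does not state how the three thread types are combined.

```python
from dataclasses import dataclass


@dataclass
class CoreLoad:
    """Pending work on one processor core (hypothetical structure)."""
    incoming: int       # incoming threads waiting to run (from thread stack information)
    local: int = 0      # local threads waiting to run (placeholder value)
    interrupt: int = 0  # interrupt threads waiting to run (placeholder value)

    def total(self) -> int:
        # Assumption: the load is a plain, unweighted sum of the three counts.
        return self.incoming + self.local + self.interrupt


# Incoming-thread counts from the Figure 3 example in paragraph [0065].
loads = {
    "core 1": CoreLoad(incoming=5),
    "core 2": CoreLoad(incoming=1),
    "core n": CoreLoad(incoming=2),
}

totals = {name: load.total() for name, load in loads.items()}
print(totals)
```

Leaving `interrupt` at zero gives the two-component load used when only incoming and local threads are considered.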

[0066] The router...

Specific Embodiment 2

[0085] Taking the application scenario shown in Figure 3 as an example, Specific Embodiment 2 of the present invention is as follows:

[0086] In this embodiment of the present invention, the load of a processor core is calculated from only its incoming (transferred-in) threads and its local threads.

[0087] The first step is to obtain the load of each processor core.

[0088] The router can obtain the stack information of each processor core through the thread stack information collector, and determine from it the number of incoming threads waiting to be executed in each processor core.

[0089] For example, in the scenario shown in Figure 3, the router may learn that processor core 1 has 5 incoming threads, processor core 2 has 1 incoming thread, processor core n has 2 incoming threads, and so on.

[0090] In addition, the router obtains the number of local threads to be executed in the processor core according to the Load / Store queue of t...

Specific Embodiment 3

[0113] Taking the application scenario shown in Figure 3 as an example, Specific Embodiment 3 of the present invention is as follows:

[0114] In this embodiment of the present invention, the load of a processor core is calculated from only its incoming (transferred-in) threads and its local threads.

[0115] The first step is to obtain the load of each processor core.

[0116] The router can obtain the stack information of each processor core through the thread stack information collector, and determine from it the number of incoming threads waiting to be executed in each processor core.

[0117] For example, in the scenario shown in Figure 3, the router may learn that processor core 1 has 5 incoming threads, processor core 2 has 1 incoming thread, processor core n has 2 incoming threads, and so on.

[0118] In addition, the router obtains the number of local threads to be executed in the processor core according to the Load / Store queue of t...


Abstract

The embodiments of the invention provide a load balancing method and device, and a cluster. The method is applied to a cluster in a many-core processor and comprises the following steps: acquiring the load of each of the multiple processor cores of the cluster, where the load of each processor core is determined according to at least one to-be-executed thread of that processor core; determining a first processor core and a second processor core according to the loads of the multiple processor cores of the cluster, where the first processor core is the core from which threads are to be migrated out and the second processor core is the core into which threads are to be migrated; and migrating one or more to-be-executed threads from the first processor core to the second processor core.
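The three steps in the abstract (acquire loads, pick the two cores, migrate threads) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the first and second cores are chosen, so the sketch assumes the most loaded core is the source and the least loaded core is the destination, and that half of the load gap is moved. The function name `balance_once`, the `threshold` parameter, and the use of queue length as the load are all hypothetical.

```python
def balance_once(queues, threshold=1):
    """Run one balancing pass over a cluster.

    queues maps a core id to the list of threads waiting on that core.
    Returns (source, destination, moved_threads), or None when the cores
    are already balanced within `threshold`. The queues are modified in place.
    """
    # Step 1: acquire the load of each core (assumption: number of waiting threads).
    loads = {core: len(pending) for core, pending in queues.items()}

    # Step 2: choose the core to migrate out of and the core to migrate into
    # (assumption: the most loaded and the least loaded core, respectively).
    source = max(loads, key=loads.get)
    destination = min(loads, key=loads.get)
    gap = loads[source] - loads[destination]
    if gap <= threshold:
        return None

    # Step 3: migrate waiting threads (assumption: move half of the gap).
    moved = [queues[source].pop() for _ in range(gap // 2)]
    queues[destination].extend(moved)
    return source, destination, moved


# Waiting-thread counts of 5, 1 and 2, with made-up thread names.
queues = {
    1: ["t1", "t2", "t3", "t4", "t5"],
    2: ["t6"],
    3: ["t7", "t8"],
}
result = balance_once(queues)
print(result, {core: len(pending) for core, pending in queues.items()})
```

Moving half of the gap keeps a single pass from overshooting and turning the destination into the new most loaded core; a real design could equally move one thread at a time.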

Description

Technical field

[0001] Embodiments of the present invention relate to the computer field, and more specifically, to a load balancing method, a device, a cluster, and a many-core processor.

Background

[0002] In a traditional on-chip multiprocessor (On-Chip Multiple Processor System, CMPs) system, when the required data is not in local storage, the local thread accesses a remote node and brings the data back to the local node through the on-chip network, while maintaining data consistency. The power consumption of a CMPs system is mainly attributable to data exchange between nodes and to the communication overhead (traffic) generated by maintaining data consistency. To reduce the power consumption of the CMPs system and improve its performance, when the data required for thread execution is not local, and the thread needs to access that data continuously or frequently, the thread can be migrated to the core...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F15/76, G06F9/50
Inventor: 李景超
Owner: HANGZHOU HUAWEI DIGITAL TECH