
Mass computing node resource monitoring and management method for high-performance computer

A computing node and resource monitoring technology, applied to computing, resource allocation, and program control design. It addresses problems such as degraded system performance, increased control node load, and growth in the width of the communication tree, and achieves the effects of improved system performance and reduced load.

Active Publication Date: 2020-11-27
NAT UNIV OF DEFENSE TECH


Problems solved by technology

However, this puts the system in a contradictory state. First, if the two upper limits are left unchanged as the node count grows, the number of concurrent threads is still guaranteed not to exceed the limit, but a large number of message sending requests enter the waiting state and cannot be processed in time, which seriously degrades system performance.
Second, if the two upper limits are raised, the sending requests can be processed in time, but the load on the control node increases accordingly.
[0009] 2. As the node count increases, the message-forwarding load on the computing nodes also increases.
[0011] In summary, as the node count grows, if messages are sent through a tree structure, the width of the communication tree cannot be increased substantially; otherwise it imposes a heavier load on both the control node and the computing nodes and reduces system performance.
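
To make the trade-off above concrete, here is a minimal Go sketch (illustrative only, not from the patent; the cap value, node count, and timing are assumed) of a fixed upper limit on concurrent send threads modeled as a counting semaphore: once the cap is reached, further message sending requests block in a waiting state, and raising the cap simply admits more concurrent work onto the control node.

```go
// Illustrative sketch: a fixed cap on concurrent send threads, modeled with a
// buffered channel used as a counting semaphore. All names and values are assumptions.
package main

import (
	"fmt"
	"sync"
	"time"
)

const maxConcurrentSends = 8 // fixed upper limit on simultaneous send threads

// sendRequest stands in for sending one monitoring message and waiting for its reply.
func sendRequest(node int) {
	_ = node
	time.Sleep(10 * time.Millisecond)
}

func main() {
	nodes := 1000 // as the node count grows, most requests must wait below
	sem := make(chan struct{}, maxConcurrentSends)
	var wg sync.WaitGroup

	start := time.Now()
	for n := 0; n < nodes; n++ {
		wg.Add(1)
		sem <- struct{}{} // blocks once the cap is reached: the "waiting state"
		go func(node int) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot when this send finishes
			sendRequest(node)
		}(n)
	}
	wg.Wait()
	fmt.Printf("monitored %d nodes with cap %d in %v\n",
		nodes, maxConcurrentSends, time.Since(start))
}
```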



Embodiment Construction

[0055] First, this embodiment describes an implementation in which the entire process of handling message sending requests is completed independently by the control node; in large-scale node scenarios, this places various loads on the control node. The overall process is shown in figure 1. As shown in figure 1, the specific workflow is as follows:

[0056] In the first step, the control thread (agent_init thread) continuously takes a message sending request off the request chain, provided the total number of related threads does not exceed the thread upper limit, and creates a worker thread (agent thread) to process the request;

[0057] In the second step, the worker thread performs data preparation. Data preparation mainly determines whether the message is sent through a star structure or a tree structure; if a tree structure is used, the target nodes are also grouped (a code sketch of these first two steps follows the step list below);

[0058] The third step is for...
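
A minimal Go sketch of the first two steps above, as a reconstruction rather than the patent's own code: a dispatcher standing in for the agent_init thread pops requests off a queue while a thread cap holds, and each worker's data-preparation step chooses star or tree delivery and groups the target nodes. The thread cap, star/tree threshold, group size, and all type and function names are assumptions.

```go
// Sketch of the dispatch loop (step 1) and data preparation (step 2); names and
// constants are illustrative assumptions, not the patent's values.
package main

import (
	"fmt"
	"sync"
)

type SendRequest struct {
	Payload string
	Targets []int // compute-node IDs the message must reach
}

const (
	threadCap     = 4  // upper limit on concurrent agent (worker) threads
	starThreshold = 16 // assumed cutoff: few targets -> star, many -> tree
	groupSize     = 8  // assumed fan-out per group when a tree is used
)

// prepare decides the delivery structure and, for tree delivery, groups the targets.
func prepare(req SendRequest) (structure string, groups [][]int) {
	if len(req.Targets) <= starThreshold {
		return "star", [][]int{req.Targets}
	}
	for i := 0; i < len(req.Targets); i += groupSize {
		end := i + groupSize
		if end > len(req.Targets) {
			end = len(req.Targets)
		}
		groups = append(groups, req.Targets[i:end])
	}
	return "tree", groups
}

// agent plays the role of one worker (agent) thread handling a single request.
func agent(req SendRequest, wg *sync.WaitGroup, slots chan struct{}) {
	defer wg.Done()
	defer func() { <-slots }() // release a thread slot when the worker exits
	structure, groups := prepare(req)
	fmt.Printf("request %q: %s delivery, %d group(s)\n", req.Payload, structure, len(groups))
	// ... sending to each group and waiting for replies is omitted here ...
}

func main() {
	queue := make(chan SendRequest, 64)     // the chain of pending send requests
	slots := make(chan struct{}, threadCap) // counting semaphore for the thread cap
	done := make(chan struct{})
	var wg sync.WaitGroup

	// Dispatcher in the role of the agent_init thread: take a request off the
	// queue and spawn a worker, but only while the thread cap is not exceeded.
	go func() {
		for req := range queue {
			slots <- struct{}{} // blocks while threadCap workers are active
			wg.Add(1)
			go agent(req, &wg, slots)
		}
		close(done)
	}()

	targets := make([]int, 40)
	for i := range targets {
		targets[i] = i
	}
	queue <- SendRequest{Payload: "status-poll", Targets: targets}
	queue <- SendRequest{Payload: "heartbeat", Targets: targets[:10]}
	close(queue)

	<-done    // all requests have been dispatched
	wg.Wait() // all workers have finished
}
```

Using a counting semaphore for the worker slots mirrors the condition in the first step that the total number of related threads must not exceed the thread upper limit.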


Abstract

The invention discloses a massive computing node resource monitoring and management method for a high-performance computer, which comprises the following steps: the control node issues a message sending request to be delivered through an intermediate node; the control node takes out the message sending request and generates a working thread to process it; the working thread selects a normal intermediate node; the working thread forwards the message sending request to the selected intermediate node, then waits for the message returned by the intermediate node, and proceeds to the next step once that message is received; the working thread processes the returned message, updates the states of the intermediate node and the computing nodes, and then terminates. According to the invention, a layer of intermediate nodes is added between the control node and the massive computing nodes, sharing the control node's load in monitoring and managing massive computing node resources while also reducing the related load on the computing nodes.
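
The following Go sketch reconstructs the working-thread flow summarized in the abstract, under stated assumptions (the types, the health model, and the in-memory forward stub are hypothetical, not the patent's implementation): select a normal intermediate node, forward the request, wait for the returned message, then update the recorded states of the intermediate node and the computing nodes.

```go
// Hedged reconstruction of the worker-thread flow: all types and helpers are assumed.
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

type NodeState struct {
	Healthy  bool
	LastSeen time.Time
}

type Reply struct {
	Intermediate int
	NodeStates   map[int]NodeState // states reported for the compute nodes
}

type Cluster struct {
	mu            sync.Mutex
	intermediates map[int]NodeState
	computes      map[int]NodeState
}

// selectIntermediate returns any intermediate node currently marked healthy.
func (c *Cluster) selectIntermediate() (int, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for id, st := range c.intermediates {
		if st.Healthy {
			return id, nil
		}
	}
	return 0, errors.New("no healthy intermediate node")
}

// forward stands in for sending the request to the intermediate node and
// waiting for the aggregated reply covering its group of compute nodes.
func forward(intermediate int, targets []int) Reply {
	states := make(map[int]NodeState, len(targets))
	for _, t := range targets {
		states[t] = NodeState{Healthy: true, LastSeen: time.Now()}
	}
	return Reply{Intermediate: intermediate, NodeStates: states}
}

// worker is one working thread handling a single message sending request.
func (c *Cluster) worker(targets []int) error {
	id, err := c.selectIntermediate()
	if err != nil {
		return err
	}
	reply := forward(id, targets) // forward, then wait for the returned message

	c.mu.Lock()
	defer c.mu.Unlock()
	c.intermediates[id] = NodeState{Healthy: true, LastSeen: time.Now()}
	for node, st := range reply.NodeStates {
		c.computes[node] = st // update the compute-node management structure
	}
	return nil
}

func main() {
	c := &Cluster{
		intermediates: map[int]NodeState{1: {Healthy: true}},
		computes:      map[int]NodeState{},
	}
	if err := c.worker([]int{100, 101, 102}); err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Printf("tracking %d compute nodes via intermediate 1\n", len(c.computes))
}
```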

Description

Technical Field

[0001] The invention relates to resource management technology for massive computing nodes in high-performance computers, and in particular to a massive computing node resource monitoring and management method oriented to high-performance computers.

Background Technique

[0002] Currently, massive computing node resources in high-performance computers are managed in a mode where a single control node controls a large number of computing nodes. While the system runs, the control node must monitor and record the real-time status of each computing node for task assignment and other work. This is mainly realized by having the control node continuously generate requests to send messages to the computing nodes (message sending requests), obtain each computing node's current status from its return message, and modify the data structure on the control node used to manage the computing nodes. The common f...
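
For orientation, a minimal Go sketch of the baseline mode described in this background, with every name and value assumed: a single control node polls each computing node directly and records the returned status in the data structure it uses to manage the nodes.

```go
// Illustrative sketch of the single-control-node baseline; not the patent's code.
package main

import (
	"fmt"
	"time"
)

type Status struct {
	Alive    bool
	LoadAvg  float64
	LastSeen time.Time
}

// poll stands in for one message sending request and its return message.
func poll(node int) Status {
	return Status{Alive: true, LoadAvg: 0.42, LastSeen: time.Now()}
}

func main() {
	const nodes = 5
	table := make(map[int]Status, nodes) // the control node's management structure

	// The control node repeats this pass continuously; one pass is shown here.
	for n := 0; n < nodes; n++ {
		table[n] = poll(n) // send, wait for the reply, record the state
	}
	fmt.Printf("recorded status for %d compute nodes\n", len(table))
}
```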


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F9/50; G06F9/54
CPC: G06F9/505; G06F9/542; G06F9/546; G06F2209/508
Inventor: 戴屹钦, 卢凯, 董勇, 王睿伯, 张伟, 张文喆, 邬会军, 李佳鑫, 谢旻, 周恩强, 迟万庆, 陈娟
Owner: NAT UNIV OF DEFENSE TECH