GPU cluster monitoring system and method for issuing monitoring alarm

A GPU cluster monitoring technology, applied in the field of information technology, that addresses the problems of a limited real-time monitoring interface and of monitoring only the CPU.

Inactive Publication Date: 2014-05-07
CHINA PETROLEUM & CHEM CORP +1

AI Technical Summary

Problems solved by technology

[0002] GPUs are increasingly widely used in the field of geophysics, and large-scale GPU clusters have emerged. However, no corresponding real-time monitoring system has been developed for large-scale GPU cluster equipment. Existing systems can only monitor the state of traditional computer hardware such as CPU, memory, and storage.
Moreover, the current real-time monitoring interface is limited: it can only reflect the health status of nodes and the utilization rate of CPU and GPU.

Embodiment Construction

[0018] The present invention is described in further detail below with reference to the accompanying drawings:

[0019] In a GPU cluster used for high-performance computing, real-time monitoring of the GPUs has always been the chief concern of operation and maintenance personnel. Building on the original structure of Ganglia, the present invention implements a system suited to GPU monitoring and designs the monitoring information needed to monitor GPUs in real time. Here, a "GPU" generally refers to a computing node that contains a GPU card. An ordinary monitoring system can monitor only conventional information such as CPU and memory in real time; it cannot monitor the GPU card. The present invention therefore develops a system that monitors the utilization rate of the GPU card in real time.
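As a rough illustration of what sampling GPU card utilization on a computing node might look like, the sketch below parses the CSV output of the `nvidia-smi` command-line tool. This is an assumption made for illustration only: the text above does not say which tool or driver interface the invention uses, and the helper names `parse_gpu_utilization` and `sample_gpu_utilization` are hypothetical.

```python
import subprocess

# Assumption: NVIDIA cards with the nvidia-smi CLI installed on each node.
# The patent text does not name the tool used to read GPU utilization.
QUERY_CMD = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu",
    "--format=csv,noheader,nounits",
]


def parse_gpu_utilization(csv_text):
    """Parse 'index, utilization' CSV lines into {gpu_index: percent or None}."""
    result = {}
    for line in csv_text.strip().splitlines():
        index_str, util_str = (field.strip() for field in line.split(",", 1))
        try:
            result[int(index_str)] = int(util_str)
        except ValueError:
            # Some cards report a non-numeric placeholder instead of a number.
            result[int(index_str)] = None
    return result


def sample_gpu_utilization():
    """Run nvidia-smi once and return per-card utilization (hypothetical helper)."""
    completed = subprocess.run(
        QUERY_CMD, capture_output=True, text=True, check=True
    )
    return parse_gpu_utilization(completed.stdout)
```

Keeping the parser separate from the subprocess call means it can be exercised on a machine without a GPU, for example `parse_gpu_utilization("0, 35\n1, 98\n")`. In a Ganglia-based deployment such a value would typically be reported through a metric module, but how the invention wires this in is not described in the visible text.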

[0020] As shown in Figure 1, the GPU cluster monitoring system of the present invention is applied in a GPU cluster, and collects and transmits data by deployi...

Abstract

The invention provides a GPU cluster monitoring system and a method for issuing monitoring alarms, and belongs to the field of information technology. The GPU cluster monitoring system comprises data acquiring modules, an analyzing module, and a showing module. Each computing node in a GPU cluster is provided with a data acquiring module, which acquires data information of that computing node; the data information is the utilization rate of a GPU card. The analyzing module is arranged on an agent node; it collects the data information acquired by the data acquiring modules, performs statistical analysis on it, and generates a simplified data sheet. The showing module is arranged on an information issuing server; it receives the simplified data sheet generated by the analyzing module, establishes a web platform, and presents the simplified data sheet in graphical, visualized form so that operation and maintenance personnel can monitor the GPU cluster in real time.
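The analyzing step described in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the abstract does not specify which statistics are computed or what the simplified data sheet contains, so the per-node mean and maximum, the `GpuSample` type, and the function names are all hypothetical. A plain-text table stands in for the web-based graphical display.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class GpuSample:
    """One reading from a data acquiring module (hypothetical structure)."""
    node: str
    gpu_index: int
    utilization: float  # percent


def build_simplified_sheet(samples):
    """Aggregate raw samples into one summary row per computing node.

    Assumes one sample per GPU card per reporting interval, and that mean
    and max are the statistics of interest; the source names neither.
    """
    by_node = {}
    for sample in samples:
        by_node.setdefault(sample.node, []).append(sample.utilization)
    return [
        {
            "node": node,
            "gpu_cards": len(values),
            "mean_util": round(mean(values), 1),
            "max_util": max(values),
        }
        for node, values in sorted(by_node.items())
    ]


def render_sheet(sheet):
    """Render the sheet as plain text (stand-in for the web display)."""
    lines = [f"{'node':<10}{'cards':>6}{'mean%':>8}{'max%':>8}"]
    for row in sheet:
        lines.append(
            f"{row['node']:<10}{row['gpu_cards']:>6}"
            f"{row['mean_util']:>8.1f}{row['max_util']:>8.1f}"
        )
    return "\n".join(lines)
```

For example, passing samples for two cards on a node called `node01` and one card on `node02` to `build_simplified_sheet` yields two rows, which `render_sheet` turns into a small table. The node names are illustrative only.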

Description

Technical field

[0001] The invention belongs to the field of information technology, and in particular relates to a GPU cluster monitoring system and a method for issuing monitoring alarms.

Background

[0002] GPUs are increasingly widely used in the field of geophysics, and large-scale GPU clusters have emerged. However, no corresponding real-time monitoring system has been developed for large-scale GPU cluster equipment. Existing systems can only monitor the state of traditional computer hardware such as CPU, memory, and storage. Moreover, the current real-time monitoring interface is limited: it can only reflect the health status of nodes and the utilization rate of CPU and GPU.

Contents of the invention

[0003] The purpose of the present invention is to solve the problems in the above-mentioned prior art, provide a GPU cluster monitoring system and a method for issuing monitoring alarms, and provide a real-time monitoring system for the special requirements of GPU equipm...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): H04L29/08, H04L12/24, H04L12/26
Inventor: 葛鑫, 王胜春, 李进
Owner: CHINA PETROLEUM & CHEM CORP