Fault prediction method for fault log of high-performance computing system

A fault prediction technology for high-performance computing, applied in the field of data processing. It addresses problems such as anomalies that are hard to track down and observed symptoms that do not reflect the actual root cause (for example, a hardware machine check exception), with the aim of improving system reliability, the accuracy of fault analysis, and the efficiency of operation and maintenance.

Pending Publication Date: 2021-02-02
广州科泽云天智能科技有限公司

AI Technical Summary

Problems solved by technology

While some failures, such as kernel panics, are obvious and easy to detect, most anomalies are hard to track down.
It is unclear which component will fail and how the failure will affect the system.
Abnormal symptoms observed in the system may or may not reflect the exact root cause. For example, a kernel panic may be caused by a Lustre file system error or by a hardware machine check exception.



Examples


Detailed Description of the Embodiments

[0048] The present invention will be described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0049] As shown in Figures 1 to 3, the present invention provides a fault prediction method for fault logs of high-performance computing systems, comprising the following steps:

[0050] Step S1: obtaining the fault log data of the high-performance computing system, and analyzing the fault log data to obtain a fault time series in a form suitable for an LSTM model;

[0051] Step S2: using the K-means algorithm to cluster the fault types contained in the fault log data, wherein the fault types include software faults, hardware faults, human-caused faults, and faults of unknown cause;

[0052] Step S3: building an FD-LSTM model based on the fault time series;

[0053] Step S4: based on the FD-LSTM model, separately predicting the location of ...
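
As an illustration of Step S1 only, the sketch below turns raw log records into a fixed-interval fault-count series and then into (window, next value) pairs of the kind a sequence model such as an LSTM is typically trained on. The log line format, the one-hour bucket size, the window width, and the names `SAMPLE_LOG`, `fault_counts`, and `sliding_windows` are all assumptions made for this example; the source does not specify how the fault time series is actually built.

```python
from collections import Counter
from datetime import datetime

# Hypothetical log format (an assumption): "<ISO timestamp> <node id> <message>"
SAMPLE_LOG = [
    "2020-01-01T00:05:00 node017 kernel panic",
    "2020-01-01T00:40:00 node017 machine check exception",
    "2020-01-01T02:10:00 node203 Lustre I/O error",
    "2020-01-01T02:15:00 node203 Lustre I/O error",
    "2020-01-01T03:59:00 node017 kernel panic",
]


def fault_counts(lines, bucket_seconds=3600):
    """Count fault records per fixed-width time bucket, filling gaps with 0.

    Buckets are measured from the earliest record, not aligned to clock hours.
    """
    stamps = [datetime.fromisoformat(line.split(" ", 2)[0]) for line in lines]
    start = min(stamps)
    buckets = Counter(
        int((t - start).total_seconds() // bucket_seconds) for t in stamps
    )
    return [buckets.get(i, 0) for i in range(max(buckets) + 1)]


def sliding_windows(series, width):
    """Pair each run of `width` consecutive values with the value that follows it."""
    return [
        (series[i:i + width], series[i + width])
        for i in range(len(series) - width)
    ]


series = fault_counts(SAMPLE_LOG)
windows = sliding_windows(series, width=2)
print(series)   # fault counts per one-hour bucket
print(windows)  # (input window, target) pairs
```

In practice the series would probably be built per node or per fault type rather than over the whole log, since Step S4 predicts per fault-type cluster; that refinement is left out here.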



Abstract

The invention discloses a fault prediction method for fault logs of a high-performance computing system. The method comprises the following steps: obtaining the fault log data of the high-performance computing system and analyzing it to obtain a fault time series suitable for an LSTM model; clustering the fault types contained in the fault log data with the K-means algorithm; building an FD-LSTM model based on the fault time series; and, based on the FD-LSTM model, predicting the node location where a fault will occur and the fault lead time for the clustering result of each fault type, then statistically analyzing the prediction results according to the system architecture. By predicting faults per fault class, the invention aims to improve the accuracy of fault analysis for high-performance computing systems, make operation and maintenance more efficient, and improve system reliability.
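
The clustering step can be pictured with a minimal, standard-library implementation of plain K-means (Lloyd's algorithm). This is a sketch under stated assumptions, not the patented procedure: the two-dimensional feature vectors, the choice of four clusters (mirroring the four fault types named in the description), the fixed initial centroids, and the function name `kmeans` are all hypothetical, and the source does not say which features are extracted from the log records.

```python
import math


def kmeans(points, centroids, iterations=20):
    """Plain Lloyd's algorithm.

    Repeatedly assigns each point to its nearest centroid (Euclidean distance)
    and then moves each centroid to the mean of its assigned points.
    """
    centroids = [list(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical feature vectors for eight log records (assumed, not from the source).
points = [(0, 0), (0, 1), (10, 0), (10, 1), (0, 10), (1, 10), (10, 10), (10, 11)]
# Fixed initial centroids keep the example deterministic.
initial = [points[0], points[2], points[4], points[6]]

labels, centroids = kmeans(points, initial)
print(labels)
print(centroids)
```

With real log data the initial centroids would normally be chosen at random or with a seeding scheme, and the cluster indices would still have to be mapped to fault types by inspection.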

Description

Technical field

[0001] The invention relates to the technical field of data processing, and in particular to a fault prediction method for fault logs of high-performance computing systems.

Background

[0002] To pursue higher simulation accuracy and obtain more computational detail, scientists increasingly rely on high-performance computers to process unprecedentedly large data sets and complex simulations. High-performance computers have developed rapidly from the initial single-chip systems to cluster systems with tens of thousands of processors. To this day, the main means of improving computer performance is still to increase the number of processors, which has led to a rapid expansion in the scale of high-performance computers. At the same time, this places higher demands on the fault tolerance of the system, that is, the ability of its software and hardware to deal with sudden error events. In particular, the increase in processing...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/18, G06F16/28, G06F16/2458, G06K9/62, G06N3/04, G06N3/08, G06F17/18
CPC: G06F16/1815, G06F16/285, G06F16/2474, G06F16/2462, G06N3/08, G06F17/18, G06N3/044, G06N3/045, G06F18/23213
Inventors: 刘锋, 侯晓东, 朱肖雄
Owner: 广州科泽云天智能科技有限公司