Hybrid neural network fault prediction method and system for high-performance computer

A hybrid neural network fault prediction technology, applied in the fields of biological neural network models, computer components, and computing.

Active Publication Date: 2021-07-06
XI AN JIAOTONG UNIV

AI Technical Summary

Problems solved by technology

[0005] A fault prediction method generally predicts the state of a high-performance computer over an upcoming period, in particular whether a fault will occur in that period, by analyzing the computer's current state and its state over a preceding period, so
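The idea of predicting the next period from a preceding one can be sketched as a sliding-window labeling step. This is a minimal illustration, not the patent's procedure: the tuple layout `(timestamp_seconds, event_id, is_fault)`, the function name `make_windows`, and the parameters `window_s` and `history` are all assumptions introduced here.

```python
from collections import Counter


def make_windows(events, window_s, history):
    """Turn log events into (features, label) samples.

    events   : list of (timestamp_seconds, event_id, is_fault) tuples
               (hypothetical layout, assumed for this sketch)
    window_s : window length in seconds
    history  : number of past windows used as input features

    Each sample's features are the per-window event-id counts of the
    `history` windows before a target window (oldest first); the label
    is 1 if the target window contains at least one fault event.
    """
    if not events:
        return []
    events = sorted(events)
    t0 = events[0][0]
    n_windows = (events[-1][0] - t0) // window_s + 1

    counts = [Counter() for _ in range(n_windows)]
    faulty = [False] * n_windows
    for ts, event_id, is_fault in events:
        w = (ts - t0) // window_s
        counts[w][event_id] += 1
        faulty[w] = faulty[w] or is_fault

    samples = []
    for target in range(history, n_windows):
        features = counts[target - history:target]
        samples.append((features, int(faulty[target])))
    return samples


# Toy events, invented for illustration only.
toy_events = [
    (0, "E1", False),
    (5, "E2", False),
    (12, "E1", False),
    (25, "E9", True),
    (31, "E1", False),
]
toy_samples = make_windows(toy_events, window_s=10, history=2)
```

With 10-second windows and two windows of history, the toy events yield two samples: the first is labeled 1 because the fault at second 25 falls in its target window, and the second is labeled 0.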



Examples


Embodiment

[0137] This embodiment uses the log data set of the BlueGene/L high-performance computer. The data set contains the logs collected over the 215 days the system ran from June 3, 2005 to January 4, 2006: 4,747,963 entries in total, of which 348,460 are fault log entries, or about 7.3% of the data. A fault prediction model is constructed with the random forest algorithm, which is also used to calculate feature importance for further feature selection. A long short-term memory (LSTM) network then performs fault prediction on the log data, and a combination of active and passive fault tolerance is used to handle upcoming faults, reducing the loss that faults cause to the system.
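The feature-scoring step can be illustrated with a deliberately simplified stand-in for a random forest: an ensemble of depth-1 trees (decision stumps), each fit on a bootstrap sample with a random subset of features, where a feature's importance is its accumulated Gini impurity decrease. This is only a sketch of the principle; the function names, the stump simplification, and the synthetic data are assumptions made here, and in practice a library implementation such as scikit-learn's `RandomForestClassifier` (which exposes `feature_importances_`) would normally be used.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split_gain(xs, ys):
    """Largest weighted Gini decrease over all thresholds of one feature."""
    n = len(ys)
    parent = gini(ys)
    best = 0.0
    for thr in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= thr]
        right = [y for x, y in zip(xs, ys) if x > thr]
        children = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - children)
    return best


def stump_forest_importance(X, y, n_trees=50, seed=0):
    """Normalized impurity-decrease importance from bootstrap stumps."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    k = max(1, int(d ** 0.5))  # features tried per stump
    totals = [0.0] * d
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        ys = [y[i] for i in rows]
        gains = {
            f: best_split_gain([X[i][f] for i in rows], ys)
            for f in rng.sample(range(d), k)
        }
        f_best = max(gains, key=gains.get)
        totals[f_best] += gains[f_best]
    total = sum(totals)
    return [t / total for t in totals] if total > 0 else totals


def select_features(importances, top_k):
    """Indices of the top_k most important features, in ascending order."""
    ranked = sorted(range(len(importances)), key=lambda f: -importances[f])
    return sorted(ranked[:top_k])


# Synthetic data, invented for illustration: feature 0 determines the
# label, feature 1 is random noise, feature 2 is constant.
_rng = random.Random(1)
y_demo = [i % 2 for i in range(60)]
X_demo = [[float(label), _rng.random(), 1.0] for label in y_demo]
importances = stump_forest_importance(X_demo, y_demo)
selected = select_features(importances, top_k=1)
```

Dropping low-scoring features before the sequence model is what reduces the input dimensionality and, with it, the training cost.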

[0138] S1. Collect log data

[0139] This data set is the log data collected by BlueGene/L from June 3, 2005 to January 4, 2006 when the system ran for 21...


Abstract

The invention discloses a hybrid neural network fault prediction method and system for a high-performance computer. The method comprises the following steps: collecting log data of the high-performance computer, where the log data comprises a log event id, the timestamp at which the corresponding log event occurred, and a log event level; performing data cleaning and feature selection on the collected log data to obtain initial feature data; constructing a fault prediction model with a random forest algorithm, inputting the initial feature data into the fault prediction model, calculating feature importance with the random forest algorithm, and performing feature selection to obtain feature sample data; and inputting the feature sample data into an LSTM network model, which predicts whether a fault event exists in the feature sample. Because the log data features are scored and selected by the random forest and the dimensionality is reduced, the training complexity is reduced and training is accelerated.
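The final step, feeding a sequence of feature vectors to an LSTM and reading out a fault probability, can be sketched with a single LSTM cell written in plain Python. This shows only the standard forward pass with untrained random weights; training is omitted, and the names `init_lstm`, `lstm_step`, and `predict_fault_probability`, the layer sizes, and the sigmoid output layer are assumptions made for illustration, not details taken from the patent.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_lstm(input_size, hidden_size, seed=0):
    """Random (untrained) weights for one LSTM cell and an output unit."""
    rng = random.Random(seed)
    cols = input_size + hidden_size
    params = {}
    for gate in ("i", "f", "o", "g"):  # input, forget, output, candidate
        params[gate] = {
            "W": [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                  for _ in range(hidden_size)],
            "b": [0.0] * hidden_size,
        }
    params["out"] = {
        "w": [rng.uniform(-0.1, 0.1) for _ in range(hidden_size)],
        "b": 0.0,
    }
    return params


def affine(layer, z):
    """Compute W @ z + b for one gate."""
    return [sum(w * v for w, v in zip(row, z)) + b
            for row, b in zip(layer["W"], layer["b"])]


def lstm_step(params, x, h, c):
    """One LSTM time step: returns the new hidden and cell states."""
    z = x + h  # concatenate the input vector and previous hidden state
    i = [sigmoid(v) for v in affine(params["i"], z)]
    f = [sigmoid(v) for v in affine(params["f"], z)]
    o = [sigmoid(v) for v in affine(params["o"], z)]
    g = [math.tanh(v) for v in affine(params["g"], z)]
    c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
    h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
    return h_new, c_new


def predict_fault_probability(params, sequence):
    """Run the cell over a sequence and map the last hidden state to (0, 1)."""
    hidden_size = len(params["out"]["w"])
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in sequence:
        h, c = lstm_step(params, x, h, c)
    logit = sum(w * v for w, v in zip(params["out"]["w"], h))
    return sigmoid(logit + params["out"]["b"])


# Two time steps of a 3-dimensional feature vector, invented for illustration.
demo_params = init_lstm(input_size=3, hidden_size=4)
demo_sequence = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]
demo_probability = predict_fault_probability(demo_params, demo_sequence)
```

A sample would be flagged as containing a fault when the output exceeds a chosen threshold; with untrained weights the value is meaningless and only demonstrates the data flow.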

Description

Technical field

[0001] The invention belongs to the technical field of storage system reliability and availability, and in particular relates to a hybrid neural network fault prediction method and system for high-performance computers.

Background

[0002] High-performance computing (HPC) is a computing approach that uses parallel processing to run applications. This approach is efficient, fast, and reliable. High-performance computing systems allow computationally intensive applications to run on large numbers of high-end processors interconnected by high-speed networks. High-performance computing systems are now widely used in many fields, such as climate simulation, molecular dynamics, fluid dynamics, and medical imaging. For this reason, many countries are vigorously developing and researching high-performance computing.

[0003] With the vigorous development of high-performance computers...


Application Information

IPC(8): G06F11/34, G06K9/62, G06N3/04
CPC: G06F11/3476, G06F11/3447, G06F11/3452, G06N3/048, G06N3/044, G06F18/2113, G06F18/24323, G06F18/10, G06F18/214, Y02D10/00
Inventor: 伍卫国, 杨晓曦, 杨傲, 康益菲, 王雄, 杨诗园
Owner XI AN JIAOTONG UNIV