Failure alarm method and system for Lustre parallel file system

An error alarm technology for file systems, applied in the computer field. It addresses three problems: existing monitoring contributes little to Lustre file system operation and maintenance, analyzing and locating faults takes a long time, and the system environment and logs cannot be analyzed or alarmed on. Its effects are to save detection time, prevent external network attacks, and produce easy-to-read reports.

Active Publication Date: 2013-01-30
DAWNING INFORMATION IND BEIJING +1
Cites: 4. Cited by: 16.

AI Technical Summary

Problems solved by technology

[0003] LMT can provide statistics on file system IO traffic, usage rate, and similar information, but it cannot analyze or raise alarms on the system operating environment, Lustre logs, and other information, so it contributes little to the operation and maintenance of a Lustre file system.
When the Lustre file system fails, it still takes a long time to analyze and locate the problem, and if the administrator is not on site it is difficult to find and resolve the fault in time.

Detailed Description of the Embodiments

[0043] Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings.

[0044] This embodiment identifies several factors that affect the stable and efficient operation of Lustre: 1) the cluster system environment, such as network communication quality and time synchronization; 2) bugs in Lustre itself; and 3) other problems, such as use outside the supported scope.

[0045] The state of the system operating environment and of Lustre itself can be obtained with testing tools and by scanning and analyzing log information. Scanning the operating environment and the logs, analyzing the results, raising alarms, and performing preliminary handling is therefore a feasible way to keep a large-scale Lustre parallel file system running securely and stably.
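The log-scanning idea in [0045] can be illustrated with a minimal Python sketch of a single scan pass. Everything here is an assumption made for illustration: the function names `scan_new_lines` and `find_failures`, the byte-offset bookkeeping, the file name `oss.log`, and the failure markers `LustreError` and `LBUG` are not taken from the patent text, and the sample log lines are made up.

```python
import re
import tempfile
from pathlib import Path

# Assumed failure markers; the excerpt does not list the patterns it matches.
FAILURE_PATTERN = re.compile(r"LustreError|LBUG")


def scan_new_lines(log_path, offset):
    """Return (new_lines, new_offset) for bytes appended after `offset`."""
    with open(log_path, "rb") as fh:
        fh.seek(offset)
        data = fh.read()
    lines = data.decode("utf-8", errors="replace").splitlines()
    return lines, offset + len(data)


def find_failures(lines):
    """Keep only the lines that contain an assumed failure marker."""
    return [line for line in lines if FAILURE_PATTERN.search(line)]


if __name__ == "__main__":
    # Demo on a temporary file holding made-up log lines (hypothetical content).
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "oss.log"
        log.write_bytes(b"kernel: Lustre: service started\n"
                        b"kernel: LustreError: example failure line\n")
        lines, offset = scan_new_lines(log, 0)
        print(find_failures(lines))
        # A second pass from the saved offset sees nothing new.
        print(scan_new_lines(log, offset)[0])
```

Saving the offset between passes lets a periodic job read only what was appended since the last scan. A real deployment would also need to handle log rotation and lines that are only partly written when the scan runs, which this sketch does not.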

[0046] The main idea of this embodiment is to periodically scan the system network quality, Lustre log information...

Abstract

The invention discloses a failure alarm method and system for a Lustre parallel file system. The alarm method comprises the following steps: (1) a monitor module periodically scans the OSS (Object Storage Server) logs; (2) the log information is analyzed to determine whether it contains failure information, and an alarm report is created for any failures found; (3) each failure in the alarm report is judged for severity, and a separate report of serious failures is created for further analysis by a program; finally, the serious failure information and the common alarm information are collected, written to a txt document, and sent to an administrator through a mail hub. The failure alarm system comprises a LAToolkit server, a storage client cluster, a main server cluster, and a LAToolkit client. The system performs intelligent failure analysis and produces a concise report, so an administrator can learn the approximate nature of a failure remotely on a mobile phone, which saves detection time. The cost is low, because the method and system run on the existing equipment and require no new hardware.
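Step (3) and the final reporting step of the abstract can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the severity rule (`SERIOUS_MARKERS`), the report layout, the attachment name `alarm_report.txt`, and all function names are hypothetical. The message is only built, not sent.

```python
from email.message import EmailMessage

# Assumed severity rule; the abstract does not say how seriousness is judged.
SERIOUS_MARKERS = ("LBUG", "panic")


def split_by_severity(failures):
    """Split failure lines into (serious, common) using the assumed markers."""
    serious, common = [], []
    for line in failures:
        if any(marker in line for marker in SERIOUS_MARKERS):
            serious.append(line)
        else:
            common.append(line)
    return serious, common


def build_report(serious, common):
    """Render both groups as plain text suitable for a txt document."""
    parts = ["Serious failures (%d):" % len(serious)]
    parts += ["  " + line for line in serious]
    parts.append("Common alarms (%d):" % len(common))
    parts += ["  " + line for line in common]
    return "\n".join(parts) + "\n"


def build_mail(report_text, sender, recipient):
    """Wrap the report in an email with the txt document attached."""
    msg = EmailMessage()
    msg["Subject"] = "Lustre failure alarm report"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("The alarm report is attached.")
    msg.add_attachment(report_text, filename="alarm_report.txt")
    return msg


if __name__ == "__main__":
    # Made-up failure lines, for demonstration only.
    found = ["LustreError: LBUG hit", "LustreError: slow reply"]
    serious, common = split_by_severity(found)
    print(build_report(serious, common))
```

Handing the resulting message to a mail server could be done with the standard library's `smtplib.SMTP(...).send_message(msg)`; the host and credentials would depend on the site's mail hub.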

Description

Technical field

[0001] The invention relates to the field of computers, and in particular to an error alarm method and system for the Lustre parallel file system.

Background

[0002] The environment of a large-scale supercomputing center is generally complex. When the Lustre parallel file system fails, many factors may be involved, and locating the problem by manually searching logs and other information usually takes a long time, so the fault cannot be resolved promptly. Currently, Lustre monitoring mainly relies on LMT. LMT presents the historical usage of Lustre well through some of Lustre's statistical information interfaces, such as the current read and write rates and space usage.

[0003] LMT can provide statistics on file system IO traffic, usage rate, and similar information, but it cannot analyze or raise alarms on the system operating environment, Lustre logs, and other information, and it does not play a big role in the operati...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F11/32, G06F11/34, G06F17/30, H04L29/06
Inventor: 刘冠川, 王勇, 秦东明, 何牧君, 杨亮, 张新风, 陈飞, 刘超, 吕永安
Owner: DAWNING INFORMATION IND BEIJING