Method for positioning faulted memory in linux system

A method for locating faulty memory, applied to faulty-hardware detection and hardware monitoring. It addresses the problems of existing approaches: faults that are hard to reproduce, long test times, and the inability to rule out other causes and pinpoint the faulty module. The method is simple to implement and improves troubleshooting efficiency.

Status: Inactive; Publication Date: 2013-07-10
LANGCHAO ELECTRONIC INFORMATION IND CO LTD
2 Cites, 41 Cited by

AI Technical Summary

Problems solved by technology

[0003] 1) Using the memory fault location and recording function of the BMC integrated on the motherboard. When a correctable or uncorrectable ECC error occurs in memory, the motherboard BMC records the error information and the memory slot where the fault occurred, so the faulty module can be located quickly. This method has limitations. First, the server must have a BMC management chip, but BMC management has only come into use in recent years, and early general-purpose models do not have one. Moreover, even if the machine has a BMC chip, the chip does not necessarily provide memory fault location; that function must be developed separately, so memory faults still cannot be located and detected. Because it depends on the presence of the management chip and on the chip's own capabilities, BMC memory fault location cannot serve as a general solution;
[0004] 2) Memory stress testing. After a fault has been preliminarily attributed to memory, the faulty module must be identified among a dozen or more modules. A memory stress test tool is run against the platform and the memory in batches, gradually narrowing the scope until the faulty module is located. This method also has limitations. First, with large-capacity memory the time needed to reproduce the failure is hard to predict: a full day of stress testing may not reproduce it, and the problem is even harder to reproduce on a customer machine that fails only about once a week. Second, batch testing makes the overall test time too long. Third, it cannot rule out poor memory contact or a problem in the CPU's own memory controller as the cause, so the fault cannot be located precisely;



Examples


Embodiment Construction

[0036] The method of the present invention is described in detail below with reference to the accompanying drawings.

[0037] Faulty memory location procedure and example:

[0038] 1) Install a Linux system on the faulty platform. The platform and software installation and configuration requirements are as follows:

[0039] Platform requirements for mcelog support:

[0040] 32-bit x86 Linux: supported on Redhat 6.0 and later; mcelog must be compiled and installed from source;

[0041] 64-bit x86_64 Linux: Redhat 5.0 and later provide an rpm package, which is not installed by default and must be selected during installation (on the hardware monitoring tab);

[0042] To start the service automatically at boot, use the chkconfig command:

[0043] chkconfig --add mcelogd

[0044] chkconfig --level 5 mcelogd on

[0045] service mcelogd restart

[0046] mcelog-related files:

[0047] /dev/mcelog (device file)

[0048] /var/log/mcelog (log file)

[0049] /etc/mcelog/mcelog.conf (configuration f...
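
Before relying on the log, it can help to confirm that the paths listed above actually exist on the platform. The following is a minimal sketch, not part of the patent text: the function name check_mcelog_files and its optional root-prefix argument are hypothetical, introduced only so the check can be tried against a scratch directory.

```shell
#!/bin/sh
# Report which of the mcelog-related paths named above are present.
# The optional first argument is a HYPOTHETICAL root prefix (default: empty),
# so the check can be exercised against a scratch directory.
check_mcelog_files() {
    root="${1:-}"
    missing=0
    for path in /dev/mcelog /var/log/mcelog /etc/mcelog/mcelog.conf; do
        if [ -e "${root}${path}" ]; then
            echo "present: ${path}"
        else
            echo "missing: ${path}"
            missing=$((missing + 1))
        fi
    done
    # Return non-zero when at least one path is absent.
    [ "$missing" -eq 0 ]
}
```

Calling check_mcelog_files with no argument inspects the live system; a non-zero exit status suggests that mcelog is not installed or the service has not been started.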


Abstract

The invention provides a method for locating faulty memory in a Linux system. The method uses the mcelog records produced by the system, together with the mapping to the actual physical slots, to quickly determine the error type and location of the faulty memory, without relying on the memory fault detection and logging provided by the mainboard baseboard management controller (BMC). This allows problems caused by memory faults, such as server hangs and blue screens, to be resolved quickly. Compared with existing methods, the method does not depend on the memory fault tracking and location function of the mainboard BMC; it does not require repeated on-site stress testing to wait for the fault to recur, or swapping memory modules to isolate the faulty one; and the fault frequency, cause, and location can be determined quickly from the generated mcelog logs. The method is simple to implement and improves the efficiency of diagnosing and resolving problems.
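
The idea of comparing logged errors against the physical slot layout can be illustrated with a short sketch. Everything specific here is an assumption rather than the patent's procedure: the function name count_errors_by_slot is hypothetical, the log is assumed to contain one line of the form "SOCKET s CHANNEL c DIMM d" per error record (a simplified stand-in; real mcelog output varies with CPU model and mcelog version), and the slot map is a hypothetical hand-written table taken from the motherboard documentation.

```shell
#!/bin/sh
# count_errors_by_slot LOG MAP
# LOG: text with one "SOCKET s CHANNEL c DIMM d" line per error record
#      (ASSUMED, simplified format; not verbatim mcelog output).
# MAP: HYPOTHETICAL table, one slot per line: "socket channel dimm slot_label".
# Prints "count slot_label", most frequently failing slot first.
count_errors_by_slot() {
    awk '
        # First file (MAP): load the location -> physical slot label table.
        NR == FNR { if (NF >= 4) slot[$1 "," $2 "," $3] = $4; next }
        # Second file (LOG): tally every record that names a location.
        $1 == "SOCKET" && $3 == "CHANNEL" && $5 == "DIMM" {
            key = $2 "," $4 "," $6
            label = (key in slot) ? slot[key] : "unmapped(" key ")"
            count[label]++
        }
        END { for (l in count) print count[l], l }
    ' "$2" "$1" | sort -k1,1nr -k2,2
}
```

A slot that dominates the tally is the first candidate for replacement, while errors spread across every slot on one channel or socket may point to the memory controller or to poor contact rather than a single module.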

Description

Technical field

[0001] The invention relates to the field of computer applications, and in particular to a method for locating faulty memory in a Linux system.

Background

[0002] Previously, there were two general methods for determining the location of faulty memory:

[0003] 1) Using the memory fault location and recording function of the BMC integrated on the motherboard. When a correctable or uncorrectable ECC error occurs in memory, the motherboard BMC records the error information and the memory slot where the fault occurred, so the faulty module can be located quickly. This method has limitations. First, the server must have a BMC management chip, but BMC management has only come into use in recent years, and early general-purpose models do not have one. Moreover, even if the machine has a BMC chip, the chip does not necessarily have the function...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F11/22, G06F11/34
Inventors: 李斌, 任华进
Owner: LANGCHAO ELECTRONIC INFORMATION IND CO LTD