Error fixing method, device, apparatus and storage medium
By configuring AMT target option values and calling callback functions under the running system, the method removes the system restart and the serial port dependency that AMT testing otherwise requires, and achieves stable memory error repair.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- INSPUR SUZHOU INTELLIGENT TECH CO LTD
- Filing Date
- 2023-08-29
- Publication Date
- 2026-06-23
Figure CN117472657B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of computer technology, and in particular to an error repair method, apparatus, device, and storage medium.
Background Technology
[0002] AMT (Active Management Technology) is a hardware-based remote management technology used to monitor, maintain, update, upgrade, and fix bugs in the hardware and firmware of a personal computer for remote out-of-band management.
[0003] In existing technologies, enabling AMT requires restarting the system and then performing AMT testing during the POST (Power-On Self-Test) process. A serial cable connects the host to the server under test, and serial port software on the host captures the test logs that the server outputs via the serial port. Finally, the logs are analyzed to identify errors, and different handling methods are applied to different errors. Because this method requires restarting the system, which means pausing and resetting all services on the server, it cannot be applied effectively in actual operation. Furthermore, log acquisition depends heavily on the serial port, which makes log data acquisition unstable.
Summary of the Invention
[0004] The purpose of this invention is to provide an error repair method, apparatus, device, and storage medium to solve the problem that existing error repair methods require system restarts and log acquisition via serial port, resulting in excessive reliance on serial port and unstable log data acquisition. The specific technical solution is as follows:
[0005] In a first aspect of the present invention, an error repair method is provided, characterized in that the method comprises:
[0006] Collect memory errors present in the system;
[0007] Configure the value of the target option in Active Management Technology (AMT) when a memory error is detected that meets the preset conditions;
[0008] The AMT test is executed by calling the target callback function based on the value of the target option.
[0009] Obtain the test log corresponding to the AMT test, and the test log is stored in a pre-allocated memory space;
[0010] The error repair method corresponding to the memory error is determined based on the test log.
[0011] Optionally, before invoking the target callback function to perform the AMT test based on the value of the target option, the method further includes:
[0012] Identify multiple target functions to be modified;
[0013] Modify the target functions into runtime target callback functions.
[0014] Optionally, the memory error includes: uncorrectable errors and correctable errors;
[0015] The step of configuring the value of the target option in Active Management Technology (AMT) when a memory error is detected and a preset condition is met includes:
[0016] If an uncorrectable error is detected in the memory error, configure the value of the target option in Active Management Technology (AMT).
[0017] or
[0018] Filter out correctable errors from the memory errors;
[0019] Determine the target number corresponding to the correctable errors;
[0020] If the target number is detected to be less than a first threshold, the correctable errors are repaired through the error correction code (ECC) mechanism.
[0021] If the target number is detected to be greater than the first threshold, the values of the target options in Active Management Technology (AMT) are configured, wherein the target options in the AMT include: Memory Test (MemTest), Memory Test Loops (MemTest loops), and Advanced Memory Test Options (Adv MemTest Options) related to vendor time settings.
[0022] Optionally, the step of repairing the correctable errors through the error correction code (ECC) mechanism when the target number is detected to be less than a first threshold includes:
[0023] If the target number is detected to be less than the first threshold, the memory error is determined to be a correctable error that can be corrected by ECC and does not affect performance.
[0024] The correctable error is repaired using the error correction code (ECC) mechanism.
[0025] The step of configuring the values of the target options in Active Management Technology (AMT) when the target number is detected to be greater than the first threshold includes:
[0026] If the target number is detected to be greater than the first threshold, the memory error is determined to be a target error that can be corrected by ECC but is making the system sluggish.
[0027] For the target error, the values of the target options in Active Management Technology (AMT) are configured.
[0028] Optionally, configuring the value of the target option in Active Management Technology (AMT) when a memory error is detected and a preset condition is met includes:
[0029] If a memory error is detected that meets the preset conditions, the memory test MemTest will be configured to be enabled.
[0030] The memory test loops (MemTest loops) and the advanced memory test options (Adv MemTest Options) are assigned target values according to preset rules.
[0031] Optionally, determining the error repair method corresponding to the memory error based on the test log includes:
[0032] The AMT test results are obtained by parsing the test logs.
[0033] If the AMT test result is determined to be an advanced memory test error and the post package repair (PPR) is successful, then the memory error is determined to be repaired using the PPR error repair method.
[0034] If the AMT test result is determined to be an Advanced Memory Test Completion Error with PPR not supported or PPR failed, then the memory error is determined to be corrected by replacing the Dual In-line Memory Module (DIMM).
[0035] Optionally, after obtaining the test logs corresponding to the AMT test, the method further includes:
[0036] The test logs are converted into strings according to a preset format;
[0037] The string is sent to the target interface for display.
[0038] In a second aspect of the invention, an error repair apparatus is also provided, characterized in that it comprises:
[0039] The first collection module is used to collect memory errors present in the system.
[0040] The first configuration module is used to configure the value of the target option in Active Management Technology (AMT) when a memory error is detected and the preset conditions are met.
[0041] The first test module is used to execute the AMT test by calling the target callback function based on the value of the target option;
[0042] The first acquisition module is used to acquire the test log corresponding to the AMT test, and the test log is stored in a pre-allocated memory space;
[0043] The first determining module is used to determine the error repair method corresponding to the memory error based on the test log.
[0044] In a third aspect of the present invention, a communication device is also provided, comprising: a transceiver, a memory, a processor, and a program stored in the memory and executable on the processor;
[0045] The processor is used to read the program in the memory and execute any of the error repair methods described above.
[0046] In a fourth aspect of the invention, a computer-readable storage medium is also provided, wherein instructions are stored therein, which, when executed on a computer, cause the computer to perform any of the error repair methods described above.
[0047] The error repair method provided in this invention collects memory errors existing in the system. When a memory error that meets preset conditions is detected, the value of a target option in Active Management Technology (AMT) is configured. The target callback function is called based on the value of the target option to execute the AMT test, the test log corresponding to the AMT test is obtained, and the test log is stored in a pre-allocated memory space. The error repair method corresponding to the memory error is determined based on the test log. By detecting whether a memory error meets the preset conditions, this invention can selectively execute AMT tests and repair errors. By calling the target callback function, AMT tests can be executed under the running system, avoiding the need to restart the system for AMT testing. By storing the test log in a pre-allocated memory space, there is no need to obtain the log through the serial port, which reduces dependence on the serial port and makes log data acquisition more stable.
Attached Figure Description
[0048] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the accompanying drawings used in the description of the embodiments or the prior art will be briefly introduced below.
[0049] Figure 1 is the first flowchart of the steps of the error repair method provided in the embodiments of the present invention;
[0050] Figure 2 is the second flowchart of the steps of the error repair method provided in the embodiments of the present invention;
[0051] Figure 3 is a flowchart of step 105 of the error repair method provided in the embodiments of the present invention;
[0052] Figure 4 is the third flowchart of the steps of the error repair method provided in the embodiments of the present invention;
[0053] Figure 5 is a schematic diagram of the structure of an error repair device provided in an embodiment of the present invention;
[0054] Figure 6 is a schematic diagram of the structure of a communication device provided in an embodiment of the present invention.
Detailed Implementation
[0055] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the embodiments are described in detail below with reference to the accompanying drawings. Those skilled in the art will understand that many technical details are presented in the embodiments to facilitate a better understanding of this application. However, the technical solutions claimed in this application can be implemented even without these technical details and with various changes and modifications based on the following embodiments. The division of the embodiments below is for ease of description and should not constitute any limitation on the specific implementation of the present invention. The embodiments can be combined with and referenced by each other where they do not contradict.
[0056] Referring to Figure 1, which shows the first flowchart of the steps of an error repair method provided in an embodiment of the present invention, the method includes:
[0057] Step 101: Collect information on memory errors present in the system.
[0058] Memory errors present in the system of this invention include, but are not limited to, memory leaks, mismatched allocation / deallocation, out-of-bounds memory access, accessing null pointers, wild pointers, referencing uninitialized variables, attempting to modify constants, stack overflows, and duplicate symbol names. These memory errors include: UCE (Memory Uncorrectable Error) and CE (Memory Correctable Error). A correctable error is one that can be corrected. For example, when a memory check encounters an error, and the detected error is 1 bit, it may be due to an ECC error. In this case, the CPU can correct it without affecting any system processes. An uncorrectable error is one that cannot be corrected. For example, when a memory check encounters an error, and the detected error is multi-bit, the system hardware cannot directly handle and recover from it, which may cause the server to crash.
[0059] Step 102: If a memory error is detected and the preset conditions are met, configure the value of the target option in Active Management Technology (AMT).
[0060] In this embodiment of the invention, since there are two types of memory errors, the preset conditions also include two types. One type is for uncorrectable errors, because the system cannot automatically repair uncorrectable errors and will trigger a system management interrupt. Therefore, once this type of memory error occurs, it is considered that the preset conditions are met. The other type is for correctable errors, because correctable errors can be repaired by the ECC mechanism in the system. However, this repair is not unlimited. Therefore, when the number of such errors accumulates to a certain number, it is also considered that the preset conditions are met, and a system management interrupt is also triggered. The specific implementation steps include:
[0061] If an uncorrectable error is detected in a memory error, configure the value of the target option in Active Management Technology (AMT).
[0062] or
[0063] Filter out correctable errors from memory errors;
[0064] Determine the target number corresponding to correctable errors;
[0065] If the target number is detected to be less than the first threshold, the correctable errors are repaired through the error correction code (ECC) mechanism.
[0066] If the target number is detected to exceed the first threshold, configure the values of the target options in Active Management Technology (AMT). The target options in AMT include: Memory Test (MemTest), Memory Test Loops (MemTest loops), and Advanced Memory Test Options (Adv MemTest Options) related to vendor time settings.
[0067] For example, suppose the first threshold is set to 500. If 450 correctable errors are detected, then since 450 < 500, the correctable errors can be repaired through the ECC mechanism. If 550 correctable errors are detected, then since 550 > 500, the AMT test needs to be started, so an appropriate value should be configured for each target option in Active Management Technology (AMT).
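The preset-condition check described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `Action` and `decide_action` are hypothetical, and the handling of a count exactly equal to the threshold (treated here as ECC repair) is an assumption, because the text only specifies the "less than" and "greater than" cases.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"              # no memory errors collected
    ECC_REPAIR = "ecc_repair"  # leave repair to the ECC mechanism
    RUN_AMT = "run_amt"        # configure the target options and run the AMT test


FIRST_THRESHOLD = 500  # example value taken from paragraph [0067]


def decide_action(uncorrectable: int, correctable: int,
                  threshold: int = FIRST_THRESHOLD) -> Action:
    """Map collected error counts to the next step."""
    # Any uncorrectable error meets the preset condition immediately.
    if uncorrectable > 0:
        return Action.RUN_AMT
    # Correctable errors meet the condition only above the first threshold.
    if correctable > threshold:
        return Action.RUN_AMT
    # Assumption: a count equal to the threshold is still handled by ECC.
    if correctable > 0:
        return Action.ECC_REPAIR
    return Action.NONE
```

With the example numbers from the text, `decide_action(0, 450)` selects ECC repair and `decide_action(0, 550)` selects the AMT test.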
[0068] By identifying memory errors, AMT tests can be selectively performed to address and repair memory errors generated in the system, saving time.
[0069] Furthermore, correctable errors are determined to meet the preset conditions, so that AMT can be used, only when their number exceeds the first threshold; when the number is less than the first threshold, another error repair method is used. The current error situation is therefore further determined from the number of correctable errors. The specific implementation steps include:
[0070] If the target number is detected to be less than the first threshold, the memory error is determined to be a correctable error that can be corrected by ECC and does not affect performance.
[0071] The correctable error is repaired using the error correction code (ECC) mechanism.
[0072] If the target number is detected to exceed the first threshold, the memory error is determined to be a target error that can be corrected by ECC but is making the system sluggish.
[0073] For the target error, the values of the target options in Active Management Technology (AMT) are configured.
[0074] In addition, existing technologies that use post package repair (PPR) to fix errors follow these steps: enable the memory test MemTest in the BIOS (Basic Input Output System); set the memory test loops (MemTest loops) to the desired value (8 or 9, meaning the AMT test is executed 8 or 9 times); and set the advanced memory test options (Adv MemTest Options) according to the memory vendor's requirements, where the value specifies the duration of the test execution (for example, 70000 means the AMT test lasts 70000 ms). They also disable the fast startup option so that memory errors do not go undetected, and disable the FRB-2 watchdog timer so that an FRB-2 timeout does not restart the system during testing (FRB-2 is a fault recovery mechanism that recovers from watchdog timeouts during the power-on self-test; if a timeout occurs, the system restarts). In this embodiment, because the target function is modified into a runtime target callback function, the test does not need to run during the POST process. However, the values of the options required for AMT testing (that is, the target options) still need to be set. Therefore, when a memory error that meets the preset conditions is detected, the values of the target options in Active Management Technology (AMT) are configured. The specific configuration steps include:
[0075] If a memory error is detected and the preset conditions are met, the memory test MemTest will be enabled.
[0076] The memory test loops (MemTest loops) and advanced memory test options (Adv MemTest Options) are assigned target values according to preset rules.
[0077] It should be noted that, among the target options configured for AMT testing, MemTest can run without an operating system or hard drive, and it can detect memory errors that show up as crashes, freezes, or blue screens and report them automatically.
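The configuration steps above can be sketched as a small data structure. This is a hedged illustration only: `AmtOptions` and `configure_amt_options` are hypothetical names, and the defaults of 8 loops and 70000 ms are the example values mentioned in the text, not required settings. Treating the Adv MemTest Options value as a duration in milliseconds follows the description in this document.

```python
from dataclasses import dataclass


@dataclass
class AmtOptions:
    mem_test_enabled: bool = False  # "MemTest"
    mem_test_loops: int = 0         # "MemTest loops": how many times the test runs
    adv_mem_test_options: int = 0   # "Adv MemTest Options": duration in ms, per the text


def configure_amt_options(condition_met: bool, loops: int = 8,
                          duration_ms: int = 70000) -> AmtOptions:
    """Return the target option values for one AMT run."""
    if not condition_met:
        # Preset condition not met: leave MemTest disabled.
        return AmtOptions()
    if loops < 1 or duration_ms < 1:
        raise ValueError("loops and duration_ms must be positive")
    # Enable MemTest, then assign the loop count and the vendor-specific value.
    return AmtOptions(mem_test_enabled=True,
                      mem_test_loops=loops,
                      adv_mem_test_options=duration_ms)
```

Keeping the three options in one record makes it straightforward to pass them together to whatever invokes the test.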
[0078] Step 103: Execute the AMT test by calling the target callback function based on the value of the target option.
[0079] In this embodiment of the invention, after setting the values of the target options, these values are used as input to the callback function to call the target callback function to execute the AMT test. For example, the set memory test MemTest can determine whether AMT needs to be started. The target callback function will only be called when the set value of memory test MemTest indicates that AMT needs to be started. The set value of memory test loop MemTest loops is used to indicate the number of times the target callback function will loop. The set value of advanced memory test Adv MemTest Options is used to determine the execution duration of the target callback function for this AMT.
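How the three option values drive the callback can be sketched as below. The function `run_amt` and the callback signature `(iteration, duration_ms) -> str` are assumptions made for illustration; the text does not specify the real callback interface.

```python
from typing import Callable, List


def run_amt(enabled: bool, loops: int, duration_ms: int,
            callback: Callable[[int, int], str]) -> List[str]:
    """Call the target callback according to the configured option values."""
    # MemTest decides whether the callback is called at all.
    if not enabled:
        return []
    # MemTest loops sets how many times the callback runs;
    # Adv MemTest Options is passed through as the duration of each run.
    return [callback(iteration, duration_ms) for iteration in range(loops)]


def fake_callback(iteration: int, duration_ms: int) -> str:
    # Stand-in for a real memory test routine; it only reports its inputs.
    return f"loop {iteration}: ran for {duration_ms} ms"
```

For example, `run_amt(True, 8, 70000, fake_callback)` returns eight result strings, while `run_amt(False, 8, 70000, fake_callback)` returns an empty list without calling the callback.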
[0080] Step 104: Obtain the test log corresponding to the AMT test. The test log is stored in the pre-allocated memory space.
[0081] The target callback function set in this embodiment of the invention can also enable logging by default, so there is no need to output the log through the serial port. The log can be directly input into the memory space. Therefore, it is necessary to pre-allocate memory space for storing the log. The size of this space can be calculated based on the comprehensive measurement of historical data and controlled within a reasonable range to avoid it being too small, which would cause some log data to be unable to be stored, and also to avoid it being too large, which would lead to a waste of memory space resources.
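The trade-off in sizing the pre-allocated log space can be illustrated with a fixed-capacity buffer. This is a sketch under stated assumptions: the class `LogBuffer`, its methods, and the choice to reject (rather than truncate) a message that does not fit are all hypothetical.

```python
class LogBuffer:
    """Fixed-size, pre-allocated log storage that never grows."""

    def __init__(self, capacity: int) -> None:
        self._buf = bytearray(capacity)  # allocated once, up front
        self._used = 0
        self.dropped = 0                 # messages that did not fit

    def write(self, message: str) -> bool:
        data = message.encode("utf-8") + b"\n"
        if self._used + len(data) > len(self._buf):
            # Buffer too small: the message cannot be stored.
            self.dropped += 1
            return False
        self._buf[self._used:self._used + len(data)] = data
        self._used += len(data)
        return True

    def read(self) -> str:
        # Return only the part of the buffer that holds log data.
        return bytes(self._buf[:self._used]).decode("utf-8")
```

A non-zero `dropped` count signals that the capacity was chosen too small, whereas a large gap between the capacity and the used bytes signals wasted space.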
[0082] It should be noted that after obtaining the test logs, in order for staff to receive the results promptly, the test logs also need to be converted into strings for display. Specific implementation methods include:
[0083] Convert the test logs into strings according to a preset format;
[0084] Send the string to the target interface for display.
[0085] By converting test logs into strings according to a preset format, data unification can be achieved, making it easier for staff to parse the test logs. The strings are then sent to the target interface, allowing staff to obtain information promptly and address any issues in a timely manner.
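The conversion and display steps can be sketched as follows. The record shape `(socket_id, event)` and the function names are hypothetical; the `[SckId:n]` prefix is modeled on the log lines quoted later in this document, and the "target interface" is represented by any callable that accepts a string.

```python
from typing import Callable, Iterable, Tuple


def format_record(socket_id: int, event: str) -> str:
    # Preset format assumed here: "[SckId:<socket>]<event>".
    return f"[SckId:{socket_id}]{event}"


def publish(records: Iterable[Tuple[int, str]],
            sink: Callable[[str], None]) -> str:
    """Convert log records to one string and send it to the target interface."""
    text = "\n".join(format_record(sock, event) for sock, event in records)
    sink(text)  # e.g. print, or a function that writes to a display
    return text
```

Passing `print` as `sink` writes the formatted log to standard output; a test can pass a list's `append` method to capture it instead.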
[0086] Step 105: Determine the error repair method corresponding to the memory error based on the test log.
[0087] In this embodiment of the invention, the test log is obtained by executing AMT. AMT uses post package repair (PPR) to repair errors, and there are two possible results. If PPR completes the repair, no further operation is required. If PPR cannot repair the error, further operation is required; that is, it may be necessary to replace the memory module, i.e., the dual in-line memory module (DIMM).
[0088] The error repair method provided in this invention collects memory errors existing in the system. When a memory error that meets preset conditions is detected, the value of a target option in Active Management Technology (AMT) is configured. The target callback function is called based on the value of the target option to execute the AMT test, the test log corresponding to the AMT test is obtained, and the test log is stored in a pre-allocated memory space. The error repair method corresponding to the memory error is determined based on the test log. By detecting whether a memory error meets the preset conditions, this invention can selectively execute AMT tests and repair errors. By calling the target callback function, AMT tests can be executed under the running system, avoiding the need to restart the system for AMT testing. By storing the test log in a pre-allocated memory space, there is no need to obtain the log through the serial port, which reduces dependence on the serial port and makes log data acquisition more stable.
[0089] Referring to Figure 2, which shows the second flowchart of the steps of the error repair method provided in this embodiment of the invention, the method specifically includes:
[0090] Step 201: Collect information on memory errors present in the system.
[0091] Step 202: If a memory error is detected and the preset conditions are met, configure the value of the target option in Active Management Technology (AMT).
[0092] The above steps 201-202 refer to the content of steps 101-102 discussed in the preceding paragraph, and will not be repeated here.
[0093] Step 203: Determine the multiple target functions to be modified.
[0094] The target functions in this embodiment of the invention are functions related to AMT testing, including: AMT-related functions, PPR (post package repair) related functions, and all functions provided by suppliers (manufacturers) that need to be executed, such as XMATS8, XMATS16, XMATS32, XMATS64, WCMATS8, WCMCH8, GNDB64, MARCHCM64, LTEST_SCRAM, LINIT_SCRAM, RANGE_TEST_SCRAM, TWR, DATA_RET, MATS8_TC1, MATS8_TC2, MATS8_TC3, SKHYNIX, SAMSUNG, MICRON, SCRAM_X2. This is only a partial list. Because different suppliers (manufacturers) provide different functions, other functions may exist, and this invention does not limit them. The set of target functions is therefore not fixed and varies by supplier (manufacturer).
[0095] Step 204: Modify the target functions into runtime target callback functions.
[0096] In this embodiment of the invention, the target function is changed from a PEI (Pre-EFI Initialization) driver function to a runtime target callback function. Runtime refers to the state of a program while it is running or being executed. In some programming languages, reusable programs or instances are packaged or rebuilt into "runtime libraries" that any program can link to or call while it runs. By turning the target function into a runtime target callback function, this embodiment enables memory AMT testing and repair to be performed under the running system, which removes the current requirement to restart for AMT testing.
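The general idea of exposing test routines as callbacks that can be looked up and invoked at runtime can be sketched with a simple registry. This is a loose analogy in Python, not firmware code: it does not model the PEI phase, and the registry, the decorator, and the placeholder function bodies are all hypothetical. Only the names XMATS8 and MATS8_TC1 come from the list in the text.

```python
from typing import Callable, Dict

# Registry of callbacks that can be invoked by name while the system runs.
RUNTIME_CALLBACKS: Dict[str, Callable[[], str]] = {}


def register_runtime_callback(name: str):
    """Decorator that records a function in the registry under `name`."""
    def decorator(fn: Callable[[], str]) -> Callable[[], str]:
        RUNTIME_CALLBACKS[name] = fn
        return fn
    return decorator


@register_runtime_callback("XMATS8")
def xmats8() -> str:
    return "XMATS8: pass"      # placeholder body, not a real memory test


@register_runtime_callback("MATS8_TC1")
def mats8_tc1() -> str:
    return "MATS8_TC1: pass"   # placeholder body, not a real memory test


def invoke(name: str) -> str:
    # Look the callback up at call time instead of at boot time.
    return RUNTIME_CALLBACKS[name]()
```

Because the set of vendor functions varies, a name-keyed registry lets each supplier's routines be added without changing the caller.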
[0097] Step 205: Execute the AMT test by calling the target callback function based on the value of the target option.
[0098] Step 206: Obtain the test log corresponding to the AMT test. The test log is stored in the pre-allocated memory space.
[0099] Step 207: Determine the error repair method corresponding to the memory error based on the test log.
[0100] The above steps 205-207 refer to the content of steps 103-105 discussed in the preceding paragraph, and will not be repeated here.
[0101] Referring to Figure 3, which shows a flowchart of step 105 of the error repair method provided in this embodiment of the invention, the step specifically includes:
[0102] Step 301: Obtain the AMT test results by parsing the test logs.
[0103] In this embodiment of the invention, the test log output is a string, which needs to be parsed to determine its actual meaning. Furthermore, the test log includes not only test results but also log information for other operations. Therefore, during parsing, keywords can be searched to obtain the AMT test results. For example, by searching for the keyword "MemTest" in the string output by the test log, the start and end positions of Adv MemTest can be determined to ensure that Adv MemTest has been completed. The content between "[SckId:0]MemTest--Started" and "[SckId:1]MemTest-8253445ms" can be obtained by searching for the keyword "MemTest". The former signifies the start of the AMT test, and the latter signifies the end of the AMT test. 8253445ms indicates the execution duration of the AMT test.
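The keyword search described above can be sketched with two regular expressions built from the quoted markers. The function name `parse_amt_window` and the returned dictionary keys are hypothetical; the marker formats follow the two example strings in the text, and real logs may differ.

```python
import re
from typing import Optional

# Patterns modeled on "[SckId:0]MemTest--Started" and "[SckId:1]MemTest-8253445ms".
START_RE = re.compile(r"\[SckId:(\d+)\]MemTest--Started")
END_RE = re.compile(r"\[SckId:(\d+)\]MemTest-(\d+)ms")


def parse_amt_window(log: str) -> Optional[dict]:
    """Return the AMT section of the log, or None if the test did not finish."""
    start = START_RE.search(log)
    if start is None:
        return None
    # Look for the end marker only after the start marker.
    end = END_RE.search(log, start.end())
    if end is None:
        return None
    return {
        "duration_ms": int(end.group(2)),
        "body": log[start.end():end.start()].strip(),
    }
```

Returning `None` when either marker is missing gives the caller a simple way to confirm that the Adv MemTest run actually completed before interpreting the results.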
[0104] Step 302: If the AMT test result is determined to be an advanced memory test error and the post package repair (PPR) is successful, determine that the memory error is repaired using the PPR error repair method.
[0105] In this embodiment of the invention, different processing procedures are executed for different errors. When the AMT test result is an advanced memory test error and the PPR (Post Package Repair) repair is successful, the advanced memory test error has already been repaired by PPR during AMT execution, and no further action is required.
[0106] Step 303: If the AMT test result is determined to be an Advanced Memory Test Completion Error with PPR not supported or PPR failed, determine that the memory error is corrected by replacing the Dual In-line Memory Module (DIMM).
[0107] In this embodiment of the invention, if the AMT test result is an Advanced Memory Test Completion Error with PPR not supported or PPR failed, the error either cannot be repaired by PPR or the PPR attempt itself failed. In this case, the Dual In-line Memory Module (DIMM) must be replaced.
[0108] By parsing the test logs, this invention determines whether the error should be repaired using PPR or by replacing the DIMM, thereby achieving a quick and targeted solution to the problem.
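The decision in steps 302 and 303 can be sketched as a small function. The names `Repair` and `choose_repair` and the three boolean inputs are hypothetical, and the `NONE` result for a log with no advanced memory test error is an assumption, since the text does not describe that case.

```python
from enum import Enum


class Repair(Enum):
    NONE = "none"                  # assumption: no error found, nothing to do
    PPR = "ppr"                    # already repaired by PPR during the AMT run
    REPLACE_DIMM = "replace_dimm"  # PPR unsupported or failed


def choose_repair(adv_memtest_error: bool, ppr_supported: bool,
                  ppr_succeeded: bool) -> Repair:
    """Select the repair method from the parsed AMT test result."""
    if not adv_memtest_error:
        return Repair.NONE
    if ppr_supported and ppr_succeeded:
        return Repair.PPR
    # Either PPR is not supported for this error, or the PPR attempt failed.
    return Repair.REPLACE_DIMM
```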
[0109] Referring to Figure 4, which shows the third flowchart of the steps of the error repair method provided in this embodiment of the invention, the method may include:
[0110] The error repair method of this invention includes collecting memory errors present in the system. If the memory error is a correctable error whose target number is less than a first threshold, no further operation is required. If the memory error is an uncorrectable error, or a correctable error whose target number is greater than the first threshold, the value of the target option in Active Management Technology (AMT) is configured, and a target callback function is called to execute the AMT test. The test log corresponding to the AMT test is then obtained and converted into an output string for easy viewing by staff. The test log is then parsed. If an advanced memory test error is found and is successfully repaired through post package repair (PPR), no further operation is required. If the advanced memory test error cannot be repaired through PPR, the dual in-line memory module (DIMM) is replaced, and the error repair ends.
[0111] The error repair method provided in this invention collects memory errors existing in the system. When a memory error that meets preset conditions is detected, the value of a target option in Active Management Technology (AMT) is configured. The target callback function is called based on the value of the target option to execute the AMT test, the test log corresponding to the AMT test is obtained, and the test log is stored in a pre-allocated memory space. The error repair method corresponding to the memory error is determined based on the test log. By detecting whether a memory error meets the preset conditions, this invention can selectively execute AMT tests and repair errors. By calling the target callback function, AMT tests can be executed under the running system, avoiding the need to restart the system for AMT testing. By storing the test log in a pre-allocated memory space, there is no need to obtain the log through the serial port, which reduces dependence on the serial port and makes log data acquisition more stable.
[0112] Referring to Figure 5, which shows a structural schematic of an error repair device provided in an embodiment of the present invention, the device may include:
[0113] The first collection module 401 is used to collect memory errors that exist in the system.
[0114] The first configuration module 402 is used to configure the value of the target option in Active Management Technology (AMT) when a memory error is detected and a preset condition is met.
[0115] The first test module 403 is used to execute AMT tests by calling the target callback function based on the value of the target option.
[0116] The first acquisition module 404 is used to acquire the test logs corresponding to the AMT test. The test logs are stored in a pre-allocated memory space.
[0117] The first determination module 405 is used to determine the error repair method corresponding to the memory error based on the test log.
[0118] Optionally, the error repair device also includes:
[0119] The second determination module is used to determine multiple target functions to be modified.
[0120] The modification module is used to modify the target functions into runtime target callback functions.
[0121] Optionally, memory errors include: uncorrectable errors and correctable errors.
[0122] The first configuration module 402 also includes:
[0123] The first configuration submodule is used to configure the value of the target option in Active Management Technology (AMT) when an uncorrectable error is detected in a memory error.
[0124] or
[0125] The filtering submodule is used to filter out correctable errors from memory errors.
[0126] The first determination submodule is used to determine the target number corresponding to correctable errors.
[0127] The repair submodule is used to repair correctable errors using the error correction code (ECC) mechanism when the target number is detected to be less than a first threshold.
[0128] The second configuration submodule is used to configure the values of target options in Active Management Technology (AMT) when the target number is detected to exceed the first threshold. The target options in AMT include: MemTest, MemTest loops, and Adv MemTest Options related to vendor time settings.
[0129] Optionally, the repair submodule also includes:
[0130] The first determining unit is used to determine, when the target number is detected to be less than the first threshold, that the memory error is a correctable error that can be corrected by ECC and does not affect performance.
[0131] The repair unit is used to repair correctable errors through the Error Correction Code (ECC) mechanism.
[0132] The second configuration submodule also includes:
[0133] The second determining unit is used to determine, when the target number is detected to be greater than the first threshold, that the memory error is a target error that can be corrected by ECC but is making the system sluggish.
[0134] The configuration unit is used to configure, for the target error, the values of target options in Active Management Technology (AMT).
[0135] Optionally, the first configuration module 402 further includes:
[0136] The third configuration submodule is used to enable the memory test MemTest when a memory error is detected and the preset conditions are met.
[0137] The assignment submodule is used to assign target values to the memory test loops (MemTest loops) and advanced memory test options (Adv MemTest Options) according to preset rules.
[0138] Optionally, the first determining module 405 further includes:
[0139] The first acquisition submodule is used to obtain AMT test results by parsing test logs.
[0140] The second determination submodule is used to determine that the memory error is to be repaired by post package repair (PPR) when the AMT test result is determined to be an advanced memory test error and the PPR succeeded.
[0141] The third determination submodule is used to determine that the memory error is to be repaired by replacing the dual in-line memory module (DIMM) when the AMT test result is determined to be an advanced memory test error for which PPR is not supported, or a PPR failure.
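The mapping from a parsed test log to a repair method can be sketched in Python as follows. The log format is invented for the example: the `KEY=VALUE` lines and the field names `AMT_RESULT`, `PPR_SUPPORTED`, and `PPR_STATUS` are assumptions, since this description does not specify the layout of the test log. Returning `Repair.NONE` when no error is reported is likewise an assumption.

```python
from enum import Enum


class Repair(Enum):
    NONE = "none"                  # assumption: no error reported, nothing to repair
    PPR = "ppr"                    # the error was repaired by PPR
    REPLACE_DIMM = "replace_dimm"  # the DIMM has to be replaced


def parse_log(log_text: str) -> dict:
    """Parse hypothetical KEY=VALUE log lines into a dictionary."""
    fields = {}
    for line in log_text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without "="
            fields[key.strip()] = value.strip()
    return fields


def choose_repair(log_text: str) -> Repair:
    """Choose the repair method from the AMT test result and the PPR outcome."""
    fields = parse_log(log_text)
    if fields.get("AMT_RESULT") != "ERROR":
        return Repair.NONE
    if fields.get("PPR_SUPPORTED") == "1" and fields.get("PPR_STATUS") == "SUCCESS":
        return Repair.PPR
    # PPR is not supported, or PPR was attempted and failed.
    return Repair.REPLACE_DIMM
```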
[0142] Optionally, the error repair device also includes:
[0143] The conversion module is used to convert test logs into strings according to a preset format.
[0144] The display module is used to send strings to the target interface for display.
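As an illustration of the conversion and display modules, the following Python sketch decodes binary log records into strings using a fixed format. The record layout (DIMM index, status code, failing address), the status names, and the output format are all hypothetical; the description only states that the log is converted according to a preset format and sent to a target interface.

```python
import struct

# Hypothetical record layout: uint8 DIMM index, uint8 status code, uint32 address.
RECORD = struct.Struct("<BBI")
STATUS_NAMES = {0: "PASS", 1: "FAIL"}
LINE_FORMAT = "DIMM{dimm} status={status} addr=0x{addr:08X}"


def log_to_strings(raw: bytes) -> list:
    """Convert raw log bytes into one formatted string per complete record."""
    usable = len(raw) - len(raw) % RECORD.size  # drop a trailing partial record
    lines = []
    for dimm, status, addr in RECORD.iter_unpack(raw[:usable]):
        name = STATUS_NAMES.get(status, "UNKNOWN")
        lines.append(LINE_FORMAT.format(dimm=dimm, status=name, addr=addr))
    return lines


def display(lines, sink=print):
    """Send each string to the target interface; here the sink defaults to print."""
    for line in lines:
        sink(line)
```

Passing the sink as a parameter lets the same strings go to a console, a web interface, or a list used in tests.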
[0145] The error repair method provided in this invention collects memory errors existing in the system. When a memory error that meets preset conditions is detected, the value of a target option in Active Management Technology (AMT) is configured. The target callback function is called based on the value of the target option to execute the AMT test, the test log corresponding to the AMT test is obtained, and the test log is stored in a pre-allocated memory space. The error repair method corresponding to the memory error is determined based on the test log. By detecting whether the memory error meets the preset conditions, this invention can selectively execute AMT tests and repair errors. By calling the target callback function, AMT tests can be executed while the system is running, avoiding the need to restart the system in order to run the AMT test. By storing the test log in a pre-allocated memory space, there is no need to obtain the log through the serial port, which reduces the dependence on the serial port and makes log data acquisition more stable.
[0146] This invention also provides a communication device which, as shown in Figure 6, includes a processor 501, a communication interface 502, a memory 503, and a communication bus 504, wherein the processor 501, the communication interface 502, and the memory 503 communicate with each other through the communication bus 504.
[0147] Memory 503 is used to store computer programs;
[0148] When processor 501 executes the program stored in memory 503, it performs the following steps:
[0149] Collect memory errors present in the system;
[0150] Configure the value of the target option in Active Management Technology (AMT) when a memory error is detected that meets the preset conditions;
[0151] Execute the AMT test by calling the target callback function based on the value of the target option;
[0152] Obtain the test log corresponding to the AMT test, where the test log is stored in a pre-allocated memory space;
[0153] Determine the error repair method corresponding to the memory error based on the test log.
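The five steps can be tied together in a small Python sketch. It only models the control flow: `fake_amt_test` is a stand-in for the real AMT test, the `callbacks` registry, the buffer size, the text log, and the rule "any collected error meets the preset conditions" are all assumptions made for the example.

```python
LOG_BUFFER_SIZE = 256                    # hypothetical size of the pre-allocated log area
log_buffer = bytearray(LOG_BUFFER_SIZE)  # allocated once, before any test runs

callbacks = {}  # registry of runtime callbacks, keyed by name (hypothetical)


def fake_amt_test(options, buffer):
    """Stand-in for the AMT test: writes a short text log into the buffer."""
    text = "AMT loops={} result=PASS".format(options["mem_test_loops"]).encode("ascii")
    n = min(len(text), len(buffer))
    buffer[:n] = text[:n]  # same-length slice assignment keeps the buffer size fixed
    return n               # number of bytes written


callbacks["amt_test"] = fake_amt_test


def run_pipeline(errors):
    # Step 1: the collected memory errors are passed in by the caller.
    if not errors:
        return "no action"
    # Step 2: configure the target options (placeholder values).
    options = {"mem_test": 1, "mem_test_loops": 1, "adv_mem_test_options": 0}
    # Step 3: execute the test through the registered callback, without a restart.
    written = callbacks["amt_test"](options, log_buffer)
    # Step 4: read the log back from the pre-allocated memory, not from a serial port.
    log = bytes(log_buffer[:written]).decode("ascii")
    # Step 5: decide on the repair method from the log contents.
    return "no repair needed" if "result=PASS" in log else "repair required"
```

Because the test writes into a buffer that already exists, reading the log is an ordinary memory read, which is the property the method relies on to avoid the serial port.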
[0154] The communication bus mentioned above can be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc. This communication bus can be divided into address bus, data bus, control bus, etc. For ease of illustration, only one thick line is used to represent it in the diagram, but this does not mean that there is only one bus or one type of bus.
[0155] The communication interface is used for communication between the aforementioned communication device and other devices.
[0156] The memory may include random access memory (RAM) or non-volatile memory, such as at least one disk storage device. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
[0157] The processors mentioned above can be general-purpose processors, including central processing units (CPUs), network processors (NPs), etc.; they can also be digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components.
[0158] The present invention also provides a readable storage medium, wherein when the instructions in the storage medium are executed by the processor of an electronic device, the electronic device is able to perform the error repair method of the foregoing embodiments.
[0159] As the device embodiment is basically similar to the method embodiment, the description is relatively simple, and relevant parts can be found in the description of the method embodiment.
[0160] The algorithms and displays provided herein are not inherently related to any particular computer, virtual device, or other equipment. The structure required to construct such a device is readily apparent from the above description. Furthermore, this invention is not directed to any particular programming language. It should be understood that the contents of the invention described herein can be implemented using various programming languages, and the above description of specific languages is for the purpose of disclosing the best mode of implementation of the invention.
[0161] Numerous specific details are set forth in the specification provided herein. However, it will be understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this specification.
[0162] Similarly, it should be understood that, in order to simplify the invention and aid in understanding one or more of the various inventive aspects, in the above description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof. However, this method of disclosure should not be construed as reflecting an intention that the claimed invention requires more features than expressly recited in each claim. Rather, as reflected in the following claims, inventive aspects lie in fewer than all features of a single foregoing disclosed embodiment. Therefore, the claims following the detailed description are hereby expressly incorporated into this detailed description, wherein each claim itself is a separate embodiment of the invention.
[0163] Those skilled in the art will understand that modules in the device of the embodiments can be adaptively changed and placed in one or more devices different from that embodiment. Modules, units, or components in the embodiments can be combined into a single module, unit, or component, and further, they can be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some of such features and / or processes or units are mutually exclusive, any combination can be used to combine all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature that serves the same, equivalent, or similar purpose.
[0164] The various component embodiments of the present invention can be implemented in hardware, or as software modules running on one or more processors, or a combination thereof. Those skilled in the art will understand that microprocessors or digital signal processors (DSPs) can be used in practice to implement some or all of the functions of some or all of the components in the error repair device according to the present invention. The present invention can also be implemented as a device or apparatus program for performing part or all of the methods described herein. Such a program implementing the present invention can be stored on a computer-readable medium, or can be in the form of one or more signals. Such signals can be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
[0165] It should be noted that the above embodiments are illustrative of the invention and not restrictive, and that those skilled in the art can devise alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses should not be construed as limiting the claims. The word "comprising" does not exclude the presence of elements or steps not listed in the claims. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several different elements and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by the same item of hardware. The use of the words first, second, and third, etc., does not indicate any order. These words can be interpreted as names.
[0166] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the specific working processes of the devices, apparatuses, and units described above can be found in the corresponding processes of the foregoing method embodiments, and will not be repeated here.
[0167] The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of the present invention should be included within the protection scope of the present invention.
[0168] The above description is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the technical scope disclosed in the present invention should be included within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.
[0169] It should be noted that the various data-related processes in the embodiments of this application are carried out in compliance with the relevant data protection laws and policies of the country where the processing takes place, and with the authorization granted by the owner of the corresponding device.
Claims
1. An error repair method, characterized in that the method comprises:
collecting memory errors present in the system;
configuring the value of a target option in Active Management Technology (AMT) when a memory error that meets preset conditions is detected;
executing the AMT test by calling a target callback function based on the value of the target option;
obtaining the test log corresponding to the AMT test, the test log being stored in a pre-allocated memory space; and
determining the error repair method corresponding to the memory error based on the test log;
wherein the memory errors include uncorrectable errors and correctable errors;
the step of configuring the value of the target option in AMT when a memory error that meets the preset conditions is detected includes:
if an uncorrectable error is detected among the memory errors, configuring the value of the target option in AMT;
or
selecting the correctable errors from the memory errors;
determining the target number corresponding to the correctable errors;
if the target number is detected to be less than a first threshold, repairing the correctable errors through the error correction code (ECC) mechanism;
if the target number is detected to be greater than the first threshold, configuring the value of the target options in AMT, wherein the target options in AMT include: memory test (MemTest), memory test loops (MemTest loops), and advanced memory test options (Adv MemTest Options) related to vendor time settings;
the step of configuring the value of the target option in AMT when a memory error that meets the preset conditions is detected includes:
if a memory error that meets the preset conditions is detected, configuring the memory test (MemTest) as enabled; and
assigning target values to the memory test loops (MemTest loops) and the advanced memory test options (Adv MemTest Options) according to preset rules;
the step of determining the error repair method corresponding to the memory error based on the test log includes:
obtaining the AMT test result by parsing the test log;
if the AMT test result is determined to be an advanced memory test error and the post package repair (PPR) succeeded, determining that the memory error is to be repaired using PPR; and
if the AMT test result is determined to be an advanced memory test error for which PPR is not supported, or a PPR failure, determining that the memory error is to be repaired by replacing the dual in-line memory module (DIMM).
2. The method according to claim 1, characterized in that, before the target callback function is called to execute the AMT test based on the value of the target option, the method further includes: determining a plurality of target functions to be modified; and modifying the target functions into runtime target callback functions.
3. The method according to claim 1, characterized in that the step of repairing the correctable errors through the error correction code (ECC) mechanism when the target number is detected to be less than the first threshold includes: if the target number is detected to be less than the first threshold, determining that the memory error is a correctable error that ECC can correct without affecting performance; and repairing the correctable error through the ECC mechanism; and the step of configuring the value of the target option in Active Management Technology (AMT) when the target number is detected to be greater than the first threshold includes: if the target number is detected to be greater than the first threshold, determining that the memory error is a target error, namely an error that ECC can correct while the system is experiencing a slowdown; and configuring the value of the target option in AMT for the target error.
4. The method according to claim 1, characterized in that, after the test log corresponding to the AMT test is obtained, the method further includes: converting the test log into strings according to a preset format; and sending the strings to a target interface for display.
5. An error repair device, characterized in that it comprises:
a first collection module, used to collect memory errors present in the system;
a first configuration module, used to configure the value of a target option in Active Management Technology (AMT) when a memory error that meets preset conditions is detected;
a first test module, used to execute the AMT test by calling a target callback function based on the value of the target option;
a first acquisition module, used to acquire the test log corresponding to the AMT test, the test log being stored in a pre-allocated memory space; and
a first determining module, used to determine the error repair method corresponding to the memory error based on the test log;
wherein the memory errors include uncorrectable errors and correctable errors;
the first configuration module further includes:
a first configuration submodule, used to configure the value of the target option in AMT when an uncorrectable error is detected among the memory errors;
or
a filtering submodule, used to select the correctable errors from the memory errors;
a first determination submodule, used to determine the target number corresponding to the correctable errors;
a repair submodule, used to repair the correctable errors through the error correction code (ECC) mechanism when the target number is detected to be less than a first threshold; and
a second configuration submodule, used to configure the values of the target options in AMT when the target number is detected to be greater than the first threshold, wherein the target options in AMT include: memory test (MemTest), memory test loops (MemTest loops), and advanced memory test options (Adv MemTest Options) related to vendor time settings;
the first configuration module further includes:
a third configuration submodule, used to configure the memory test (MemTest) as enabled when a memory error that meets the preset conditions is detected; and
an assignment submodule, used to assign target values to the memory test loops (MemTest loops) and the advanced memory test options (Adv MemTest Options) according to preset rules;
the first determining module further includes:
a first acquisition submodule, used to obtain the AMT test result by parsing the test log;
a second determination submodule, used to determine that the memory error is to be repaired using post package repair (PPR) when the AMT test result is determined to be an advanced memory test error and the PPR succeeded; and
a third determination submodule, used to determine that the memory error is to be repaired by replacing the dual in-line memory module (DIMM) when the AMT test result is determined to be an advanced memory test error for which PPR is not supported, or a PPR failure.
6. A communication device, characterized in that it comprises: a transceiver, a memory, a processor, and a program stored in the memory and executable on the processor; the processor is configured to read the program from the memory to implement the steps of the error repair method according to any one of claims 1-4.
7. A readable storage medium for storing a program, characterized in that, when the program is executed by a processor, it implements the steps of the error repair method according to any one of claims 1-4.