A server memory stress testing method and system
By using Gray code ordering to reduce the number of level flips during memory testing, the method addresses the high energy consumption and long test time of server memory aging tests, achieving energy saving, emission reduction, and improved test coverage.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- INSPUR SUZHOU INTELLIGENT TECH CO LTD
- Filing Date
- 2022-10-09
- Publication Date
- 2026-06-19
AI Technical Summary
Existing server memory aging tests are energy-intensive and time-consuming, resulting in excessive power consumption and high testing costs, as well as low test coverage.
Gray code sorting is used to write and read Gray code information into memory blocks, and memory stress testing is performed by reducing the number of level flips.
The method reduces memory testing power consumption, shortens testing time, increases test coverage, and meets testing requirements.
Figure CN115599614B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of server testing technology, and in particular to a server memory stress testing method and system.
Background Technology
[0002] The energy consumed during server production is receiving increasing attention. Taking server manufacturer I as an example, in a certain year it consumed 33.61 million kWh of electricity. Based on an average annual household consumption of 432.7 kWh, manufacturer I's annual electricity consumption is equivalent to that of 77,677 households. Approximately 83% of manufacturer I's annual electricity consumption was for factory production, and 62% of that 83% was for aging stress testing. This means that in that year, manufacturer I consumed approximately 17.29 million kWh of electricity for aging stress testing of its servers, storage, switches, and other products. Reducing the energy consumption of aging stress testing is therefore crucial for energy conservation, emission reduction, and corporate efficiency. Taking a certain server model as an example, memory accounts for 40.02% of the total machine energy consumption, so reducing energy consumption in memory aging testing has a significant impact on overall aging energy consumption. This invention is specifically designed to reduce memory energy consumption during aging testing.
[0003] The following uses a general-purpose X86 platform server as an example to analyze the problems in existing memory aging tests. The main memory aging stress method is as follows:
(1) Allocate a custom memory block (64K / 128K / 32M).
(2) After successful allocation, write the data pattern 0xA5 (1010 0101) into every 8 bits of the memory block.
(3) After the write succeeds, read back the data of the memory block.
(4) After the read succeeds, check whether the data in the memory block follows the 0xA5 pattern.
(5) After the check succeeds, write the data pattern 0x5A (0101 1010) into every 8 bits of the memory block.
(6) After the write succeeds, read back the data of the memory block.
(7) After the read succeeds, check whether the data in the memory block follows the 0x5A pattern.
Then continue allocating the remaining memory. If memory can still be allocated, repeat (1) to (7). If no memory remains to allocate, the first test loop ends and the next loop starts again from (1). If any step fails, the test stops and the test log is output. This testing method is relatively easy to implement, but it has obvious drawbacks. The first is that the testing time is relatively long because memory must be allocated continuously. The method was more suitable when server memory was smaller (e.g., 32G). However, current servers generally have more than 128G of memory, and continuous allocation makes a single test pass so long that it may even exceed the total aging test time (12 / 16 / 24 hours).
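The conventional procedure described above can be sketched roughly as follows. This is a minimal illustration only: a Python bytearray stands in for an allocated memory block, and the function name and parameters are assumptions, not part of the original test tool.

```python
def alternating_pattern_test(block_size: int, num_blocks: int) -> bool:
    """Write, read back, and verify 0xA5 and then 0x5A in each block."""
    for block_index in range(num_blocks):
        # A bytearray stands in for one allocated memory block (assumption).
        block = bytearray(block_size)
        for pattern in (0xA5, 0x5A):
            # Write the pattern into every byte (every 8 bits) of the block.
            for offset in range(block_size):
                block[offset] = pattern
            # Read the block back and check every byte against the pattern.
            for offset in range(block_size):
                if block[offset] != pattern:
                    print(f"block {block_index}, offset {offset}: "
                          f"expected {pattern:#04x}, got {block[offset]:#04x}")
                    return False
    return True


# Number of bits that change in each byte when 0xA5 is overwritten by 0x5A.
flips_per_byte = bin(0xA5 ^ 0x5A).count("1")
```

Because 0xA5 and 0x5A are bitwise complements, `flips_per_byte` is 8: every bit of every byte toggles between the two passes.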
[0004] Secondly, although writing the data patterns 0xA5 and 0x5A sets adjacent bits to alternating 0 and 1, so that every bit in each memory cell is driven both high and low and test coverage is high, it also increases power consumption. Memory power consumption comes mainly from the charging and discharging current of its internal capacitors and depends on three quantities: C_l, the load capacitance of a single line in the memory circuit; V_dd, the supply voltage of the memory; and T_l, the number of voltage level transitions from 0 to 1 or from 1 to 0. Since the load capacitance and voltage are fixed when the memory is manufactured and cannot be changed during server production, the only way to reduce the energy consumed by memory testing is to reduce the number of transitions.
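A common textbook form of the switching-energy relation that is consistent with the symbols C_l, V_dd and T_l is sketched below. It is an assumption for illustration and not necessarily the exact expression used in the patent.

```latex
E \;\approx\; \tfrac{1}{2}\, C_l \, V_{dd}^{2} \, T_l
```

Under this form, with C_l and V_dd fixed, the energy scales linearly with the transition count T_l, which is why the argument focuses on reducing transitions.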
[0005] Based on the above testing methods, it can be seen that the power consumption during memory stress testing on the production line is too high, which does not conform to the general direction of energy conservation and emission reduction, and the testing cost is too high.
[0006] Furthermore, the excessively long duration of individual tests results in too few scans of each cell during the aging process, reducing the actual coverage of memory tests.
Summary of the Invention
[0007] This invention provides a server memory stress testing method, which can reduce energy consumption by reducing the number of level transitions.
[0008] Server memory stress testing methods include:
[0009] Step 1: Allocate the physical memory of the server under test into multiple memory blocks according to the preset capacity;
[0010] Step 2: In response to the preset first Gray code sorting method, write the Gray code information corresponding to the first Gray code sorting method into each memory block in sequence;
[0011] Step 3: Read the Gray code information in each memory block sequentially and check if it is sorted according to the first Gray code.
[0012] Step 4: If not, an error will be reported and the test will stop.
[0013] It should be further noted that if the check passes, then in response to the preset second Gray code sorting method, the Gray code information corresponding to the second Gray code sorting method is written into each memory block in sequence;
[0014] After writing is complete, read the Gray code information in each memory block in turn and check whether it is sorted according to the second Gray code.
[0015] If yes, then in response to the preset third Gray code sorting method, the Gray code information corresponding to the third Gray code sorting method is written into each memory block in sequence;
[0016] After writing is complete, read the Gray code information in each memory block sequentially and check whether it is sorted according to the third Gray code.
[0017] This process continues until the preset number of cycles is reached, completing the stress test and displaying the test results.
[0018] It should be further noted that during each check to see if the order is sorted according to Gray code, if not, an error is reported and the test stops.
[0019] It should be further noted that the method uses the Pattern function to read Gray code information in each memory block.
[0020] It should be further noted that in step one, the memory is divided so that every 128 bits form one memory block.
[0021] It should be further noted that multiple test digits are set in the first Gray code sorting queue. The test digits are counted in decimal according to the sorting queue, and each test digit corresponds to a Gray code.
[0022] It should be further explained that multiple memory blocks are encoded and sorted.
[0023] It should be further noted that the second and third Gray code sorting methods are simply reorderings of the first Gray code sorting method.
[0024] It should be further noted that the first Gray code sorting queue is set to have a number of test digits greater than the number of memory blocks.
[0025] The present invention also provides a server stress testing system, the system comprising: a memory capacity allocation module, a Gray code sorting module, a Gray code allocation module, an information reading module, and a stress testing module;
[0026] The memory capacity allocation module is used to allocate the physical memory of the server under test into multiple memory blocks according to a preset capacity;
[0027] The Gray code sorting module is used to set multiple test digits and sort the multiple test digits in a preset order to form a first Gray code sorting queue.
[0028] The Gray code allocation module is used to write Gray code information into each memory block sequentially according to the first Gray code sorting method;
[0029] The information reading module is used to read Gray code information from each memory block sequentially;
[0030] The stress test module is used to detect the Gray code information in the read memory block and determine whether it is sorted according to the first Gray code; if so, the test continues.
[0031] If not, an error will be reported and the test will stop.
[0032] As can be seen from the above technical solutions, the present invention has the following advantages:
[0033] This invention utilizes Gray code to customize the corresponding read and write methods by calling functions within memory. Gray code information is written into memory and then read from memory, thereby achieving the purpose of memory aging test.
[0034] The server stress testing system solves the problems of high power consumption and excessive testing costs during memory stress testing. Compared with existing methods, the test duration is shortened. This addresses the problem that excessively long individual tests leave too few scans of each cell within the aging period and thereby reduce the actual coverage of memory testing. This invention can cover all memory storage locations for testing, meeting testing requirements.
Attached Figure Description
[0035] To more clearly illustrate the technical solution of the present invention, the accompanying drawings used in the description will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0036] Figure 1 is a flowchart of the server memory stress testing method;
[0037] Figure 2 is a flowchart of an embodiment of the server memory stress testing method;
[0038] Figure 3 is a schematic diagram of the server stress testing system.
Detailed Implementation
[0039] Figure 1 illustrates the server stress testing method provided by the present invention and is intended only to show the basic concept of the invention. The server memory stress testing method is applied to one or more test terminals. A test terminal is a device that can automatically perform numerical calculations and / or information processing according to preset or stored instructions. Its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), embedded devices, etc.
[0040] The test terminal can be any electronic product capable of human-computer interaction, such as a personal computer, tablet computer, or Internet Protocol Television (IPTV). The test terminal may also include network equipment and / or user equipment. The network equipment includes, but is not limited to, a single network server, a server group consisting of multiple network servers, or a cloud based on cloud computing consisting of a large number of hosts or network servers. The network in which the test terminal is located includes, but is not limited to, the Internet, a wide area network (WAN), a metropolitan area network (MAN), a local area network (LAN), and a virtual private network (VPN).
[0041] The server memory stress testing method is described in detail below with reference to Figures 1 and 2. The method can be applied to server memory aging analysis. An aging test simulates server operation under full load at ambient, high, or low temperature, scanning the CPU, memory, BMC, I2C, and other hardware to verify that the server functions normally under continuous stress. Aging tests typically last 12, 16, or 24 hours, with the duration determined by user requirements. The server memory stress testing method can reveal the stress trend during memory testing, evaluate whether the server memory meets specifications, and identify potential anomalies, thus helping to improve server stability.
[0042] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0043] To address the issues of high energy consumption and excessively long testing time in aging memory stress testing, this invention uses DDR5 memory as an example to illustrate the server memory stress testing method of this invention. The method includes:
[0044] S101. Allocate the physical memory of the server under test into multiple memory blocks according to a preset capacity; for example, 128 bits can be used as one memory block.
[0045] S102. In response to the preset first Gray code sorting method, write the Gray code information corresponding to the first Gray code sorting method into each memory block in sequence;
[0046] The Gray code involved in this invention is a communication encoding technique commonly used in analog-to-digital and position-to-digital conversion circuits. As shown in Table 1, the encoding of two adjacent numbers in Gray code differs by only one bit; that is, only one bit is flipped between adjacent numbers. According to the formula... The fewer flips there are, the less energy is consumed in memory circuit testing.
[0047] The first Gray code sorting queue contains multiple test digits, which are counted in decimal. Each test digit corresponds to a Gray code. Multiple memory blocks are encoded and sorted. The second and third Gray code sorting methods are simply reordering the first Gray code sorting method.
[0048] For example, the first Gray code sorting queue has 16 test digits, sorted from 0 to 15.
[0049] Table 1. Correspondence between decimal numbers, binary codes, and Gray codes
[0050]
Decimal number | Binary code | Gray code
0 | 0000 | 0000
1 | 0001 | 0001
2 | 0010 | 0011
3 | 0011 | 0010
4 | 0100 | 0110
5 | 0101 | 0111
6 | 0110 | 0101
7 | 0111 | 0100
8 | 1000 | 1100
9 | 1001 | 1101
10 | 1010 | 1111
11 | 1011 | 1110
12 | 1100 | 1010
13 | 1101 | 1011
14 | 1110 | 1001
15 | 1111 | 1000
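The Gray codes in Table 1 match the standard reflected binary Gray code, which can be computed as n XOR (n >> 1). The short sketch below regenerates the table rows and checks the one-bit-difference property; the function names are illustrative and not taken from the patent.

```python
def gray(n: int) -> int:
    """Return the reflected binary Gray code of n."""
    return n ^ (n >> 1)


def table_rows(count: int = 16, width: int = 4) -> list:
    """Return (decimal, binary string, Gray code string) rows."""
    return [
        (n, format(n, f"0{width}b"), format(gray(n), f"0{width}b"))
        for n in range(count)
    ]


rows = table_rows()
for decimal, binary, gray_bits in rows:
    print(f"{decimal:>2}  {binary}  {gray_bits}")

# Adjacent Gray codes differ in exactly one bit, including the wrap 15 -> 0.
single_bit_steps = all(
    bin(gray(n) ^ gray((n + 1) % 16)).count("1") == 1 for n in range(16)
)
```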
[0051] The first Gray code sorting queue is set so that the number of test digits is greater than the number of memory blocks. This ensures that each memory block is allocated a Gray code.
[0052] The second Gray code sorting method can sort in the order of 1, 2, 3...15, 0. The third Gray code sorting method can sort in the order of 15, 0, 1...14. Of course, the first, second, and third Gray code sorting methods can be used according to the actual situation, and the specific sorting order is not limited.
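The example orderings above are rotations of the sequence 0 to 15. A minimal sketch of how such orderings could be generated follows; `rotated_order` is a hypothetical helper name, and rotation is only one of the reorderings the text permits.

```python
def rotated_order(shift: int, count: int = 16) -> list:
    """Return the test numbers 0..count-1 rotated left by `shift` positions."""
    return [(i + shift) % count for i in range(count)]


first = rotated_order(0)    # 0, 1, ..., 15
second = rotated_order(1)   # 1, 2, ..., 15, 0
third = rotated_order(15)   # 15, 0, 1, ..., 14
```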
[0053] S103. Read the Gray code information in each memory block sequentially and check whether it is sorted according to the first Gray code.
[0054] This invention uses the Pattern function to read Gray code information from each memory block.
[0055] S104. If not, an error will be reported and the test will stop. An error indicates a memory problem, requiring an alert or the test process to be logged, with the error information noted in the log.
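Steps S101 to S104 can be sketched as a single write, read and verify pass. This is an illustration under stated assumptions: bytearrays stand in for physical memory blocks, the 64-bit word of Gray codes is stored big-endian, and all names (`allocate_blocks`, `run_pass`, `MemoryTestError`) are hypothetical.

```python
BLOCK_BYTES = 16  # 128 bits per block, as in the example given for S101
PATTERN = (0x01326754CDFEAB98).to_bytes(8, "big")  # Gray codes of 0..15


class MemoryTestError(Exception):
    """Raised when a block does not read back the expected pattern."""


def allocate_blocks(count: int) -> list:
    """S101: stand-in for dividing physical memory into equal blocks."""
    return [bytearray(BLOCK_BYTES) for _ in range(count)]


def write_pattern(block: bytearray, pattern: bytes) -> None:
    """S102: fill the block with repeated copies of the pattern."""
    if len(block) % len(pattern) != 0:
        raise ValueError("block size must be a multiple of the pattern size")
    for offset in range(0, len(block), len(pattern)):
        block[offset:offset + len(pattern)] = pattern


def verify_pattern(block: bytearray, pattern: bytes, index: int) -> None:
    """S103/S104: read the block back and stop with an error on mismatch."""
    for offset in range(0, len(block), len(pattern)):
        found = bytes(block[offset:offset + len(pattern)])
        if found != pattern:
            raise MemoryTestError(
                f"block {index}, offset {offset}: "
                f"expected {pattern.hex()}, found {found.hex()}"
            )


def run_pass(blocks: list, pattern: bytes) -> None:
    """Write the pattern to every block, then verify every block."""
    for block in blocks:
        write_pattern(block, pattern)
    for index, block in enumerate(blocks):
        verify_pattern(block, pattern, index)
```

Raising an exception that carries the block index and offset is one way to provide the error information that S104 says should be noted in the log.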
[0056] The aforementioned server memory stress testing method solves the problems of high power consumption and excessive testing costs during memory stress testing. Compared with existing methods, it addresses the problem that excessively long individual tests leave too few scans of each cell within the aging period and thereby reduce the actual memory test coverage. This invention can cover all memory storage locations for testing, meeting testing requirements.
[0057] In other words, the Gray code information in each memory block is read sequentially, and it is checked whether it is sorted according to the first Gray code. If it is, then in response to the preset second Gray code sorting method, the Gray code information corresponding to the second Gray code sorting method is written into each memory block sequentially.
[0058] After writing is complete, read the Gray code information in each memory block in turn and check whether it is sorted according to the second Gray code.
[0059] If yes, then in response to the preset third Gray code sorting method, the Gray code information corresponding to the third Gray code sorting method is written into each memory block in sequence;
[0060] After writing is complete, read the Gray code information in each memory block sequentially and check whether it is sorted according to the third Gray code.
[0061] This process continues until the preset number of cycles is reached, completing the stress test and displaying the test results.
[0062] As part of the above testing process, during each check to see if the order is sorted according to Gray code, if not, an error is reported and the test stops.
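The overall loop (write an ordering, verify it, move to the next ordering, repeat for a preset number of cycles, stop on the first mismatch) might look like the sketch below. It simplifies each memory block to a single 64-bit word held in a Python list; the function name, the `fault` hook and the result strings are assumptions made for illustration.

```python
def stress_test(patterns, num_blocks, cycles, fault=None):
    """Run `cycles` loops over all patterns and return a result string."""
    memory = [0] * num_blocks  # one 64-bit word per block (simplification)
    for cycle in range(1, cycles + 1):
        for step, expected in enumerate(patterns, start=1):
            # Write the current ordering's pattern into every block.
            for index in range(num_blocks):
                memory[index] = expected
            if fault is not None:
                fault(memory)  # hypothetical hook used to inject an error
            # Read every block back and stop at the first mismatch.
            for index, value in enumerate(memory):
                if value != expected:
                    return f"FAIL: cycle {cycle}, ordering {step}, block {index}"
    return f"PASS: {cycles} cycle(s) completed"


# The three patterns named in the embodiment, used here as sample orderings.
PATTERNS = [0x01326754CDFEAB98, 0x1326754CDFEAB980, 0x801326754CDFEAB9]
print(stress_test(PATTERNS, num_blocks=8, cycles=2))
```

A real tool would bound the loop by test duration as well as by cycle count, since the aging test is described as ending when the test duration expires.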
[0063] It should be understood that the sequence number of each step in the above embodiments does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.
[0064] In one embodiment of the present invention, based on the server memory stress testing method, the following will provide a possible embodiment and describe its specific implementation in a non-limiting manner.
[0065] The physical memory of the server under test is allocated into multiple memory blocks according to the preset capacity. Specifically, the memory can be allocated into 64M blocks for testing.
[0066] Referring to Gray code 0 to 15, and referring to Table 1, the pattern 0x01326754CDFEAB98 is written to each allocated memory block.
[0067] After the write operation is complete, read the contents of each memory block and check if they are arranged according to 0x01326754CDFEAB98. If they are not the same, report an error and stop the test.
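The pattern 0x01326754CDFEAB98 is the sequence of 4-bit Gray codes for 0 to 15 packed into one 64-bit word, most significant nibble first. A small sketch of that packing follows; `pack_gray_word` is a hypothetical helper name.

```python
def gray(n: int) -> int:
    """Return the reflected binary Gray code of n."""
    return n ^ (n >> 1)


def pack_gray_word(order) -> int:
    """Pack the 4-bit Gray codes of `order` into one word, first code highest."""
    word = 0
    for n in order:
        word = (word << 4) | gray(n)
    return word


first_pattern = pack_gray_word(range(16))
print(f"0x{first_pattern:016X}")
```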
[0068] If yes, then in response to the preset second Gray code sorting method, the Gray code information corresponding to the second Gray code sorting method is written into each memory block in sequence;
[0069] Here, following the Gray code order 1, 2, ... 15, 0, the pattern 0x1326754CDFEAB980 is written into each allocated memory block.
[0070] After the write operation is complete, read the contents of each memory block and check if they are arranged according to 0x1326754CDFEAB980. If they are different, report an error and stop the test.
[0071] If yes, continue writing in this order. The last pattern of the first loop is 0x801326754CDFEAB9.
[0072] After the write operation is complete, read the contents of each memory block and check if they are arranged according to 0x801326754CDFEAB9. If they are different, report an error and stop the test.
[0073] The above can be used as the first loop test.
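Each pattern in the first loop is the previous one rotated left by one nibble (4 bits). The sketch below generates the 16 patterns by 64-bit rotation and counts how many bits change between consecutive patterns, compared with switching a 64-bit word from repeated 0xA5 to repeated 0x5A. The helper names are assumptions, and the bit counts describe stored data words only, not measured power.

```python
MASK64 = (1 << 64) - 1
BASE = 0x01326754CDFEAB98  # first pattern from the embodiment


def rotl64(word: int, bits: int) -> int:
    """Rotate a 64-bit word left by `bits` positions."""
    bits %= 64
    return ((word << bits) | (word >> (64 - bits))) & MASK64


def bit_flips(a: int, b: int) -> int:
    """Count the bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


# The 16 patterns of one loop: BASE rotated left by 0, 1, ..., 15 nibbles.
patterns = [rotl64(BASE, 4 * k) for k in range(16)]

# Bits that change when one pattern is overwritten by the next (with wrap).
gray_flips = [bit_flips(patterns[k], patterns[(k + 1) % 16]) for k in range(16)]

# Bits that change when 0xA5 repeated is overwritten by 0x5A repeated.
baseline_flips = bit_flips(0xA5A5A5A5A5A5A5A5, 0x5A5A5A5A5A5A5A5A)
```

Because every nibble moves to the next Gray code, each overwrite changes 16 of the 64 bits, whereas the 0xA5 to 0x5A overwrite changes all 64.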
[0074] After completion, another cycle of testing can be performed as needed. The next cycle of testing repeats the above process. When the test duration ends, the test stops and the aging test ends.
[0075] The server memory stress testing method proposed in this invention can be implemented at low cost while saving energy. This invention can save 25% of aging energy consumption. Based on the company's current electricity consumption, an estimated 31.61 million kWh * 0.62 * 0.25 = 4.89 million kWh can be saved, directly saving 3.55 million yuan in costs. The method rests on DDR5 memory allocation, a data pattern editing mechanism and its implementation, and a corresponding read / write method.
[0076] The following are embodiments of the server stress testing system provided in this disclosure. The server stress testing system and the server memory stress testing methods described above belong to the same inventive concept. Details not fully described in the embodiments of the server stress testing system can be found in the embodiments of the server memory stress testing methods described above. As shown in Figure 3, the server stress testing system includes: a memory capacity allocation module, a Gray code sorting module, a Gray code allocation module, an information reading module, and a stress testing module;
[0077] The memory capacity allocation module is used to allocate the physical memory of the server under test into multiple memory blocks according to a preset capacity;
[0078] The Gray code sorting module is used to set multiple test digits and sort the multiple test digits in a preset order to form a first Gray code sorting queue.
[0079] The Gray code allocation module is used to write Gray code information into each memory block sequentially according to the first Gray code sorting method;
[0080] The information reading module is used to read Gray code information from each memory block sequentially;
[0081] The stress test module is used to detect the Gray code information in the read memory block and determine whether it is sorted according to the first Gray code. If it is, the test continues; if not, an error is reported and the test stops.
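One way to picture the five modules is as small cooperating classes, as in the sketch below. The class and method names are hypothetical, bytearrays stand in for physical memory, and storing one Gray code per byte is a simplification chosen for readability rather than something the patent specifies.

```python
class MemoryCapacityAllocationModule:
    def allocate(self, total_bytes: int, block_bytes: int) -> list:
        """Divide the (simulated) memory into blocks of a preset capacity."""
        return [bytearray(block_bytes) for _ in range(total_bytes // block_bytes)]


class GrayCodeSortingModule:
    def first_queue(self, count: int = 16) -> list:
        """Gray codes of the test numbers 0..count-1 in their first order."""
        return [n ^ (n >> 1) for n in range(count)]


class GrayCodeAllocationModule:
    def write(self, blocks: list, queue: list) -> None:
        """Write the queue into each block in sequence, repeating as needed."""
        for block in blocks:
            for offset in range(len(block)):
                block[offset] = queue[offset % len(queue)]


class InformationReadingModule:
    def read(self, blocks: list) -> list:
        """Read back the contents of each block."""
        return [bytes(block) for block in blocks]


class StressTestingModule:
    def check(self, contents: list, queue: list):
        """Return the index of the first mismatching block, or None if all pass."""
        for index, data in enumerate(contents):
            for offset, value in enumerate(data):
                if value != queue[offset % len(queue)]:
                    return index
        return None


blocks = MemoryCapacityAllocationModule().allocate(total_bytes=1024, block_bytes=64)
queue = GrayCodeSortingModule().first_queue()
GrayCodeAllocationModule().write(blocks, queue)
contents = InformationReadingModule().read(blocks)
failed_block = StressTestingModule().check(contents, queue)
```

A `failed_block` of `None` corresponds to the "test continues" branch; any other value corresponds to reporting an error and stopping.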
[0082] Thus, this invention, based on Gray code, utilizes internal memory call functions to customize corresponding read and write methods, writes Gray code information into memory, and then reads the Gray code information from memory, thereby achieving the purpose of memory aging test.
[0083] The server stress testing system solves the problems of high power consumption and excessive testing costs during memory stress testing. Compared with existing methods, the test duration is shortened. This addresses the problem that excessively long individual tests leave too few scans of each cell within the aging period and thereby reduce the actual coverage of memory testing. This invention can cover all memory storage locations for testing, meeting testing requirements.
[0084] The units and algorithm steps of the various examples described in the disclosed embodiments of the server stress testing system of this invention can be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of each example have been generally described in terms of functionality in the foregoing description. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this invention.
[0085] The illustrations of the server stress testing system of the present invention illustrate the architecture, functionality, and operation of possible implementations of the apparatus, method, and computer program product according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions marked in the blocks may occur in a different order than those shown in the figures. For example, two consecutively represented blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagram and / or flowchart, and combinations of blocks in the block diagram and / or flowchart, may be implemented using a dedicated hardware-based system that performs the specified function or operation, or using a combination of dedicated hardware and computer instructions.
[0086] In the server stress testing system of this invention, it should be understood that the disclosed systems, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual coupling or direct coupling or communication connections shown or discussed may be indirect coupling or communication connections through some interfaces, devices, or units, or they may be electrical, mechanical, or other forms of connection.
[0087] The terms "first," "second," "third," "fourth," etc. (if present) in the specification, claims, and accompanying drawings of this invention are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that embodiments of the invention described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion.
[0088] The above description of the disclosed embodiments enables those skilled in the art to make or use the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. Therefore, the invention is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims
1. A method for server memory stress testing, the method comprising: Step 1: allocating the physical memory of the server under test into multiple memory blocks according to a preset capacity; Step 2: in response to a preset first Gray code sorting method, writing the Gray code information corresponding to the first Gray code sorting method into each memory block in sequence; Step 3: reading the Gray code information in each memory block sequentially and checking whether it is sorted according to the first Gray code; Step 4: if not, reporting an error and stopping the test; if yes, then in response to a preset second Gray code sorting method, writing the Gray code information corresponding to the second Gray code sorting method into each memory block in sequence; after writing is complete, reading the Gray code information in each memory block in turn and checking whether it is sorted according to the second Gray code; if yes, then in response to a preset third Gray code sorting method, writing the Gray code information corresponding to the third Gray code sorting method into each memory block in sequence; after writing is complete, reading the Gray code information in each memory block sequentially and checking whether it is sorted according to the third Gray code; continuing this process until a preset number of loops is reached, completing the stress test and displaying the test results; wherein, during each check of whether the order is sorted according to Gray code, if not, an error is reported and the test stops.
2. The server memory stress testing method according to claim 1, characterized in that the method uses the Pattern function to read Gray code information from each memory block.
3. The server memory stress testing method according to claim 1, characterized in that, in step one, the memory is divided so that every 128 bits form one memory block.
4. The server memory stress testing method according to claim 1, characterized in that the first Gray code sorting queue contains multiple test digits, which are counted in decimal according to the sorting queue rules, and each test digit corresponds to a Gray code.
5. The server memory stress testing method according to claim 4, characterized in that multiple memory blocks are encoded and sorted.
6. The server memory stress testing method according to claim 4, characterized in that the second and third Gray code sorting methods are reorderings of the first Gray code sorting method.
7. The server memory stress testing method according to claim 4, characterized in that the first Gray code sorting queue is set to have a number of test digits greater than the number of memory blocks.
8. A server stress testing system, characterized in that the system adopts the server memory stress testing method as described in any one of claims 1 to 7; the system includes: a memory capacity allocation module, a Gray code sorting module, a Gray code allocation module, an information reading module, and a stress testing module; the memory capacity allocation module is used to allocate the physical memory of the server under test into multiple memory blocks according to a preset capacity; the Gray code sorting module is used to set multiple test digits and sort the multiple test digits in a preset order to form a first Gray code sorting queue; the Gray code allocation module is used to write Gray code information into each memory block sequentially according to the first Gray code sorting method; the information reading module is used to read Gray code information from each memory block sequentially; the stress test module is used to detect the Gray code information in the read memory block and determine whether it is sorted according to the first Gray code; if so, the test continues; if not, an error is reported and the test stops.