Ceph-based storage system robustness test method, device, equipment and medium

By using the vdbench tool and configuration files to automatically simulate fault scenarios, the invention addresses the time-consuming and insufficient fault testing of existing Ceph storage systems, enabling efficient and accurate fault scenario testing and improving the robustness of the storage system.

CN116089200B · Active · Publication Date: 2026-06-23 · JINAN INSPUR DATA TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents (China)
Current Assignee / Owner
JINAN INSPUR DATA TECH CO LTD
Filing Date
2023-02-10
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

In the testing of existing Ceph-based storage systems, manually simulating failure scenarios is time-consuming and insufficient, making it difficult to effectively correlate various failure scenarios, which affects testing efficiency and accuracy.

Method used

This patent presents a robustness testing method for a Ceph-based storage system. It uses the vdbench tool to simulate business pressure, automatically simulates various failure scenarios through configuration files, monitors log output, records key information for analysis, and outputs a summary of test results.

Benefits of technology

It enables automated testing of storage system failure scenarios, improving testing efficiency and accuracy. It allows setting the weight of different failure scenarios based on configuration items, increasing the testing intensity of key failure scenarios, and improving the coverage of failure scenarios.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN116089200B_ABST

Abstract

The application discloses a robustness testing method, device, computer equipment, and medium for a Ceph-based storage system, and belongs to the field of computer technology. The robustness testing method for the Ceph-based storage system includes: mounting a Ceph storage system directory or volume on a test stress machine and calling the storage testing tool vdbench to run a storage service test script that simulates business pressure; setting fault simulation parameters in a configuration file and parsing the configuration file; simulating the corresponding fault test scenarios according to the parsing results of the configuration file; monitoring the output log of the storage service test script and recording current key information when an anomaly is detected; and reading and analyzing the current key information and outputting a robustness test summary result. The test method can automatically carry out various fault scenario tests and correlated tests on the Ceph storage system, improving test efficiency and accuracy.

Description

Technical Field

[0001] This invention relates to the field of computer technology, and in particular to a method, apparatus, computer equipment, and medium for robustness testing of a Ceph-based storage system.

Background Technology

[0002] Storage systems, as one of the three fundamental IT resources, play a crucial role in IT systems. With the widespread application of technologies such as cloud computing and big data, the demand for storage space is increasing exponentially. The stability of a storage system directly affects the normal operation of the entire business system; data loss or downtime in the storage system can be fatal to business continuity. Ceph is one of the most widely adopted distributed storage systems, so ensuring Ceph's stability is particularly important when developing Ceph-based storage systems. However, storage system stability often involves multiple components, such as the network, hard drives, and nodes. In actual testing, various failure scenarios must be simulated manually and repeatedly to verify the system's robustness. This process is time-consuming, and the lack of effective correlation between different failure scenarios can lead to insufficient testing.

Summary of the Invention

[0003] In view of this, the purpose of this invention is to propose a robustness testing method for Ceph-based storage systems. This testing method can automatically perform various fault scenario and correlation tests on Ceph storage systems, improving testing efficiency and accuracy.

[0004] To achieve the above objectives, one aspect of this invention provides a Ceph-based robustness testing method for storage systems. The method includes: mounting a Ceph storage system directory or volume on a test stress machine; calling the storage testing tool vdbench to run a storage service test script to simulate service pressure; setting relevant parameters for fault simulation in a configuration file; parsing the configuration file and simulating corresponding fault test scenarios based on the parsing results; monitoring the output logs of the storage service test script and recording current key information when an anomaly is detected; and reading and analyzing the current key information, and outputting a summary robustness test result.

[0005] In some implementations, mounting the Ceph storage system directory or volume on the test stress machine and calling the storage testing tool vdbench to run the storage service test script to simulate service stress includes:

[0006] vdbench is used as the testing tool for the storage service test script to perform continuous IO read and write tests on the mounted Ceph storage system directory or volume. The service pressure IO model is 4K, 100% random read and write, with a read ratio of 30%, and the running time is set to an integer multiple of 24 hours.

[0007] In some implementations, setting the relevant parameters for fault simulation operation in the configuration file includes:

[0008] In the configuration file, configuration items are described in the form of key-value pairs, wherein the configuration items include at least the fault simulation time, fault simulation interval, fault trigger mode, and fault weight.

[0009] In some implementations, the fault triggering mode includes a single fault mode and a mixed fault mode, wherein the mixed fault mode supports the simultaneous triggering of two different fault scenarios, and the weight of each fault represents its proportion in the total number of fault simulations.

[0010] In some implementations, the failure scenarios include at least: disk sub-health, disk plugging / unplugging, network sub-health, node memory leak, and node restart power failure.

[0011] In some implementations, simulating the corresponding fault test scenario based on the parsing results of the configuration file includes:

[0012] After parsing the configuration file, the corresponding fault scenario is simulated and triggered at a specified time.

[0013] In some implementations, monitoring the output logs of the storage service test script and recording current key information when an anomaly is detected includes:

[0014] During the test run, on the test stress machine side, vdbench can continuously output logs in line-by-line format, wherein the logs in line-by-line format include at least the current IOPS, bandwidth, and time.

[0015] The storage system continuously monitors the currently output log information;

[0016] Record key information when vdbench terminates abnormally or the current input IOPS is 0.

[0017] In some implementations, the current key information includes: the current time, the IOPS reported by the running vdbench instance, the fault scenario currently being simulated, and the health status of the Ceph storage system.

[0018] In some implementations, the step of reading and analyzing the current key information and outputting the robustness test summary results further includes:

[0019] Read the current key information and determine the test result based on the current key information;

[0020] After processing all current key information, output the test summary results.

[0021] In another aspect, this invention provides a Ceph-based storage system robustness testing device. This Ceph-based storage system robustness testing device includes a service stress unit, a configuration analysis unit, a fault simulation unit, a system monitoring unit, and a log recording unit. The service stress unit is configured to mount a Ceph storage system directory or volume, run storage service test scripts, and simulate service stress. The configuration analysis unit is configured to set fault simulation execution parameters in a configuration file and parse the configuration file. The fault simulation unit is configured to simulate corresponding fault test scenarios based on the parsing results of the configuration file. The system monitoring unit is configured to monitor the log output of the storage service test scripts and record current key information when an anomaly is detected. The log recording unit is configured to read the current key information, analyze it, and output a summary of robustness test results.

[0022] In some implementations, the business stress unit is specifically configured to use vdbench as a test tool for storage business test scripts to perform continuous IO read and write tests on the mounted Ceph storage system directory or volume. The business stress IO model is 4K 100% random read and write, with a read ratio of 30%, and the running time is set to an integer multiple of 24 hours.

[0023] In some implementations, the configuration analysis unit is specifically configured to describe configuration items in the configuration file in the form of key-value pairs, wherein the configuration items include at least fault simulation time, fault simulation interval, fault triggering mode, and fault weight.

[0024] In some implementations, the fault triggering mode includes a single fault mode and a mixed fault mode, wherein the mixed fault mode supports the simultaneous triggering of two different fault scenarios, and the weight of each fault represents its proportion in the total number of fault simulations.

[0025] In some implementations, the failure scenarios include at least: disk sub-health, disk plugging / unplugging, network sub-health, node memory leak, and node restart power failure.

[0026] In some implementations, the fault simulation unit is specifically configured to simulate and trigger the corresponding fault scenario at a set time after parsing the configuration file, wherein the set time is determined based on the total running time and the interval time.

[0027] In some implementations, the system monitoring unit is specifically configured to enable vdbench to continuously output line-by-line logs on the test stress machine side during test operation, wherein the line-by-line logs include at least the current IOPS, bandwidth, and time.

[0028] The storage system continuously monitors the currently output log information;

[0029] Record key information when vdbench terminates abnormally or the current input IOPS is 0.

[0030] In some implementations, the current key information includes: the current time, the IOPS reported by the running vdbench instance, the fault scenario currently being simulated, and the health status of the Ceph storage system.

[0031] In some implementations, the log recording unit is specifically configured to read current key information and determine the test results based on the current key information;

[0032] After processing all current key information, output the test summary results.

[0033] In another aspect of the present invention, a computer device is provided, comprising: at least one processor; and a memory storing computer instructions executable on the processor, wherein the instructions, when executed by the processor, implement the steps of a method including: mounting a Ceph storage system directory or volume on a test stress machine, and calling the storage test tool vdbench to run a storage business test script to simulate business stress;

[0034] Set the relevant parameters for fault simulation in the configuration file and parse the configuration file;

[0035] Simulate the corresponding fault test scenario based on the parsing results of the configuration file;

[0036] Monitor the output logs of the storage service test script and record key information when an anomaly is detected; and

[0037] Read and analyze the current key information and output a summary of robustness test results.

[0038] In another aspect, the present invention provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the above-described method steps.

[0039] The present invention has at least the following beneficial technical effects:

[0040] The Ceph-based robustness testing method and system for storage systems of this invention can automatically simulate various failure scenarios, greatly improving testing efficiency. Furthermore, this method allows for setting different weights for different failure scenarios based on configuration, enabling increased testing intensity for scenarios with more problems. In addition, this method can combine and simulate multiple different failure scenarios, improving the coverage of storage system failure scenario testing.

Attached Figure Description

[0041] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other embodiments can be obtained based on these drawings without creative effort.

[0042] Figure 1 A schematic diagram illustrating an embodiment of the Ceph-based storage system robustness testing method provided by the present invention;

[0043] Figure 2 A schematic diagram of an embodiment of the Ceph-based storage system robustness testing device provided by the present invention;

[0044] Figure 3 A schematic diagram of an embodiment of the computer device provided by the present invention;

[0045] Figure 4 A schematic diagram illustrating an embodiment of the computer-readable storage medium provided by the present invention.

Detailed Implementation

[0046] To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention will be further described in detail below with reference to specific examples and the accompanying drawings.

[0047] It should be noted that all uses of "first" and "second" in the embodiments of the present invention serve to distinguish two non-identical entities or parameters that share the same name. "First" and "second" are used only for convenience of expression and should not be construed as limiting the embodiments of the present invention; this is not repeated in subsequent embodiments.

[0048] Based on the above objectives, a first aspect of the present invention provides an embodiment of a robustness testing method for a Ceph-based storage system. Figure 1 is a schematic diagram of an embodiment of the Ceph-based storage system robustness testing method provided by the present invention. As shown in Figure 1, the storage system robustness testing method of this invention includes the following steps:

[0049] 001. Mount the Ceph storage system directory or volume on the test stress machine, and call the storage test tool vdbench to run the storage business test script to simulate business stress;

[0050] 002. Set the relevant parameters for fault simulation in the configuration file and parse the configuration file;

[0051] 003. Simulate the corresponding fault test scenarios based on the parsing results of the configuration file;

[0052] 004. Monitor the output logs of the storage service test script and record key information when an anomaly is detected;

[0053] 005. Read the current key information, analyze it, and output the robustness test summary results.

[0054] In this embodiment, testing is performed on a test stress machine to simulate business pressure. The business stress unit mounts a Ceph storage system directory or volume and calls the storage testing tool vdbench to run test scripts, simulating business pressure. Fault simulation parameters are set in the configuration file. The configuration analysis unit reads and parses the test configuration file, providing a foundation for the fault simulation unit. The fault simulation unit automatically triggers the simulation of corresponding fault test scenarios and combinations based on the configuration file parsing results. The system monitoring unit dynamically monitors the output information during the execution of the business test script and transmits key information to the logging unit for recording. The logging unit records information when the business test script malfunctions, as well as key information about the current fault simulation unit's operation and the storage system status, facilitating problem tracking and analysis.

[0055] In some embodiments of the present invention, mounting a Ceph storage system directory or volume on the test stress machine and calling the storage test tool vdbench to run a storage service test script to simulate service stress includes: using vdbench as the test tool for the storage service test script to perform continuous IO read and write tests on the mounted Ceph storage system directory or volume, wherein the service stress IO model is 4K 100% random read and write, with a read ratio of 30%, and the running time is set to an integer multiple of 24 hours.

[0056] In this embodiment, the storage services provided by the Ceph system commonly take the form of block and file storage, which appear externally as volumes and directories. The test stress machine acts as a client, mounting the volumes or directories provided by the Ceph system and calling the storage testing tool vdbench to run test scripts that simulate business pressure. An example of the vdbench test script is shown below. This script runs against six 1024G volumes mapped from the storage system using three client stress machines, configures the IO model as 4K, 100% random read and write with a read ratio of 30%, and sets the running time to an integer multiple of 24 hours, such as 24 or 48 hours.
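The script itself is not reproduced in this text. As a rough illustration only, the Python sketch below generates a vdbench parameter file with the described workload (4K transfers, fully random, 30% reads, a run time that is a whole number of days) spread over three clients with two mapped volumes each. The host names, /dev/rbdN device paths, vdbench install path, and thread count are hypothetical assumptions, and the hd/sd/wd/rd keywords should be checked against the vdbench user guide for the version in use.

```python
# Hypothetical generator for a vdbench parameter file; host names, device
# paths, the vdbench install path and the thread count are assumptions.
SECONDS_PER_DAY = 24 * 3600


def build_vdbench_params(hosts, luns_per_host, days=1, threads=32):
    """Return parameter-file text for a 4K, 100% random, 30%-read workload."""
    if days < 1:
        raise ValueError("run time must be a positive multiple of 24 hours")
    if len(hosts) != len(luns_per_host):
        raise ValueError("need one LUN list per host")
    # Host definitions: remote clients are driven over ssh from the master.
    lines = ["hd=default,vdbench=/opt/vdbench,user=root,shell=ssh"]
    for h, host in enumerate(hosts, start=1):
        lines.append(f"hd=hd{h},system={host}")
    # Storage definitions: one sd per mapped volume, opened with O_DIRECT.
    sd = 0
    for h, luns in enumerate(luns_per_host, start=1):
        for lun in luns:
            sd += 1
            lines.append(f"sd=sd{sd},hd=hd{h},lun={lun},openflags=o_direct,threads={threads}")
    # Workload: 4 KiB transfers, fully random (seekpct=100), 30% reads.
    lines.append("wd=wd1,sd=sd*,xfersize=4k,rdpct=30,seekpct=100")
    # Run: unthrottled IO for a whole number of days, reporting every second.
    lines.append(f"rd=rd1,wd=wd1,iorate=max,elapsed={days * SECONDS_PER_DAY},interval=1")
    return "\n".join(lines)


# Three clients, two mapped volumes each (six volumes in total), 24 hours.
params = build_vdbench_params(
    ["client1", "client2", "client3"],
    [["/dev/rbd0", "/dev/rbd1"], ["/dev/rbd0", "/dev/rbd1"], ["/dev/rbd0", "/dev/rbd1"]],
)
print(params)
```

Generating the file rather than hand-writing it makes the "integer multiple of 24 hours" rule explicit: the `days` argument is the only way to set the run length.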

[0057] In some embodiments of the present invention, setting parameters related to fault simulation in the configuration file includes: describing configuration items in the configuration file using key-value pairs, wherein each configuration item includes at least fault simulation time, fault simulation interval, fault triggering mode, and fault weight. The fault triggering mode includes a single fault mode and a mixed fault mode, wherein the mixed fault mode supports the simultaneous triggering of two different fault scenarios, and the weight of each fault represents its proportion in the total number of fault simulations. Fault scenarios include at least: disk sub-health, disk unplugging / plugging, network sub-health, node memory leak, and node restart power failure.

[0058] In this embodiment, the configuration file is used to specify various operating parameters for simulating faults. Configuration items are described in key-value pairs, including fault simulation time, fault simulation interval, fault trigger mode, and fault weight. The fault trigger mode includes two modes: single fault and mixed fault. Mixed faults support the simultaneous triggering of two different fault scenarios. The fault weight represents the proportion of a specific fault in the total number of fault simulations. In each configuration item, the key and the value are separated by "=". The following is an example configuration file for the single fault mode:

[0059] runtime=24

[0060] interval=1

[0061] mode=normal

[0062] percentage = 30, 20, 20, 20, 10

[0063] The example above indicates a total simulation time of 24 hours, with one fault simulation every hour. The fault simulation mode is normal (single fault), which sequentially simulates five common fault scenarios: disk sub-health (high disk I/O pressure), disk plugging / unplugging, network sub-health (high network latency), node memory leak, and node restart. The simulation ratios of the five scenarios are 30%, 20%, 20%, 20%, and 10%, respectively; that is, within the 24-hour test period they are simulated 7, 5, 5, 5, and 2 times, respectively (24 simulations in total).
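A minimal Python sketch of how such a file could be parsed and the weights turned into per-fault counts. The patent does not say how fractional counts (24 × 30% = 7.2, and so on) are rounded; largest-remainder rounding is an assumption chosen here because it reproduces 7, 5, 5, 5, 2 and always makes the counts add up to the number of slots. Skipping lines that start with "#" is likewise an assumption, not part of the described format.

```python
FAULTS = [
    "disk sub-health",
    "disk plugging / unplugging",
    "network sub-health",
    "node memory leak",
    "node restart power failure",
]


def parse_config(text):
    """Parse key=value lines; spaces around the key and value are ignored."""
    cfg = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {raw!r}")
        cfg[key.strip()] = value.strip()
    return cfg


def allocate_counts(total, percentages):
    """Split `total` fault slots by percentage (largest-remainder rounding)."""
    if sum(percentages) != 100:
        raise ValueError("fault weights must sum to 100")
    # Integer arithmetic avoids float rounding noise.
    counts = [total * p // 100 for p in percentages]
    remainders = [total * p % 100 for p in percentages]
    leftover = total - sum(counts)
    # Give the remaining slots to the largest remainders (ties: earlier fault first).
    order = sorted(range(len(percentages)), key=lambda i: remainders[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


example = """runtime=24
interval=1
mode=normal
percentage = 30, 20, 20, 20, 10"""

cfg = parse_config(example)
slots = int(cfg["runtime"]) // int(cfg["interval"])  # one fault per interval
weights = [int(p) for p in cfg["percentage"].split(",")]
counts = allocate_counts(slots, weights)
for name, n in zip(FAULTS, counts):
    print(f"{name}: {n}")
```

Simple per-item rounding would give the same result for this example, but it can over- or under-fill the schedule for other weightings, which is why a remainder-based scheme is used in the sketch.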

[0064] The following is a sample configuration file for hybrid mode:

[0065] runtime=24

[0066] interval=1

[0067] mode=mix

[0068] mixtype = (1,2)

[0069] mode=mix selects the mixed fault simulation mode. mixtype = (1,2) means that faults 1 and 2 are simulated at the same time, where 1-5 denote the five fault scenarios in order: disk sub-health, disk plugging and unplugging, network sub-health, node memory leak, and node restart power failure.
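The mixtype value could be validated as in the Python sketch below, which assumes the "(a,b)" syntax shown in the example and enforces the stated rule that mixed mode combines exactly two different scenarios. The function name and error messages are illustrative, not taken from the patent.

```python
FAULTS = {
    1: "disk sub-health",
    2: "disk plugging / unplugging",
    3: "network sub-health",
    4: "node memory leak",
    5: "node restart power failure",
}


def parse_mixtype(value):
    """Parse a value such as "(1,2)" into a pair of distinct fault IDs."""
    text = value.strip()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError(f"mixtype must look like (a,b): {value!r}")
    ids = tuple(int(tok) for tok in text[1:-1].split(","))
    # Mixed mode triggers exactly two different fault scenarios at once.
    if len(ids) != 2 or ids[0] == ids[1]:
        raise ValueError("mixtype must name two different faults")
    unknown = [i for i in ids if i not in FAULTS]
    if unknown:
        raise ValueError(f"unknown fault id(s): {unknown}")
    return ids


pair = parse_mixtype("(1,2)")
print([FAULTS[i] for i in pair])  # ['disk sub-health', 'disk plugging / unplugging']
```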

[0070] In some embodiments of the present invention, simulating the corresponding fault test scenario based on the parsing result of the configuration file includes: after parsing the configuration file, simulating the triggering of the corresponding fault scenario at a set time, wherein the set time is determined based on the total running time and the interval time.
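One possible reading of "determined based on the total running time and the interval time" is sketched below in Python: the run is divided into runtime/interval slots, a fault fires at the start of each slot, and fault IDs are assigned in sequential order from the per-fault counts. Firing the first fault at time zero and grouping faults by ID are assumptions; an implementation might instead wait one interval first or interleave the scenarios.

```python
from datetime import timedelta


def trigger_offsets(runtime_h, interval_h):
    """Offsets from test start at which a fault is triggered."""
    if interval_h <= 0 or runtime_h % interval_h:
        raise ValueError("runtime must be a positive multiple of interval")
    # Assumption: the first fault fires at the start, then every interval.
    return [timedelta(hours=k * interval_h) for k in range(runtime_h // interval_h)]


def build_schedule(runtime_h, interval_h, counts):
    """Pair each trigger time with a fault ID (1-based), in sequential order."""
    offsets = trigger_offsets(runtime_h, interval_h)
    faults = [fid for fid, n in enumerate(counts, start=1) for _ in range(n)]
    if len(faults) != len(offsets):
        raise ValueError("fault counts must fill every trigger slot")
    return list(zip(offsets, faults))


schedule = build_schedule(24, 1, [7, 5, 5, 5, 2])
for offset, fault_id in schedule[:3]:
    print(offset, fault_id)
```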

[0071] In this embodiment, after parsing the configuration file, the fault simulation unit simulates and triggers the corresponding fault scenarios at the set times. Specifically, disk sub-health is simulated by using a storage tool such as fio to perform high-pressure IO reads on a data disk of a storage node, placing the data disk under heavy service pressure and high utilization; disk plugging and unplugging is simulated with SCSI-related commands; network sub-health is simulated by injecting latency into a specified network on the storage node with the tc command; memory leaks are simulated by a script that continuously requests physical memory on the node; and node restart and abnormal power failure are simulated by executing shutdown, reboot, or BMC commands.
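The Python sketch below maps each fault ID to example shell commands of the kinds described, returning strings rather than executing them; in practice they would be run as root on the target storage node, for example over ssh. The disk, SCSI host, and NIC names, the fio options, and the 100 ms delay are illustrative defaults. stress-ng stands in for the unspecified memory-leak script and ipmitool for the unspecified BMC command; both are substitutions, not the patent's tools. After a SCSI rescan the disk may reappear under a different name.

```python
def fault_commands(fault_id, disk="sdb", scsi_host="host0", nic="eth0",
                   delay_ms=100, bmc_host=None):
    """Return shell command strings to inject and recover one fault.

    All device names, durations and sizes are illustrative defaults.
    """
    if fault_id == 1:  # disk sub-health: heavy random reads on a data disk
        inject = [f"fio --name=subhealth --filename=/dev/{disk} --rw=randread "
                  "--bs=4k --ioengine=libaio --iodepth=64 --direct=1 "
                  "--time_based --runtime=600"]
        recover = []  # load stops when fio's runtime expires
    elif fault_id == 2:  # disk pull/insert via the SCSI sysfs interface
        inject = [f"echo 1 > /sys/block/{disk}/device/delete"]
        recover = [f'echo "- - -" > /sys/class/scsi_host/{scsi_host}/scan']
    elif fault_id == 3:  # network sub-health: fixed added latency with netem
        inject = [f"tc qdisc add dev {nic} root netem delay {delay_ms}ms"]
        recover = [f"tc qdisc del dev {nic} root"]
    elif fault_id == 4:  # memory leak stand-in: hold most of physical memory
        inject = ["stress-ng --vm 1 --vm-bytes 90% --vm-keep --timeout 600s"]
        recover = []
    elif fault_id == 5:  # restart or abnormal power loss
        if bmc_host:
            inject = [f'ipmitool -I lanplus -H {bmc_host} -U "$BMC_USER" '
                      '-P "$BMC_PASS" chassis power cycle']
        else:
            inject = ["reboot"]
        recover = []
    else:
        raise ValueError(f"unknown fault id {fault_id}")
    return {"inject": inject, "recover": recover}


print(fault_commands(3))
```

Keeping an explicit recover step next to each inject step matters for faults such as the netem delay or a deleted SCSI device, which otherwise persist into the next scheduled slot.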

[0072] In some embodiments of the present invention, monitoring the output logs of the storage service test script and recording current key information when an anomaly is detected includes: during test execution, on the test stress machine side, vdbench continuously outputs logs in line-by-line format, wherein each line includes at least the current IOPS, bandwidth, and time; the storage system continuously monitors the currently output log information; and when vdbench is detected to have terminated abnormally or the current input IOPS is 0, the current key information is recorded. The current key information includes: the current time, the IOPS reported by the running vdbench instance, the fault scenario currently being simulated, and the health status of the Ceph storage system.

[0073] In this embodiment, during the test run, the vdbench on the stress test server continuously outputs line-by-line logs, with each line containing information such as current IOPS, bandwidth, and time. The system continuously monitors the currently output log information. When it detects that vdbench has terminated abnormally or that the current input IOPS is 0, it records the current time point and related information and passes it to the log recording unit. The log recording unit records this information and also records the current fault scenario and the health status of the Ceph system (network, disk, etc.) to the log file.
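A minimal Python sketch of the monitoring rule: scan vdbench interval lines, record key information whenever IOPS drops to 0, and add a record if vdbench exits with a non-zero code. The regular expression assumes interval lines that begin "HH:MM:SS.mmm, interval number, IOPS, MB/s"; the exact console layout may differ between vdbench versions, so treat it as an assumption. The current fault and Ceph health are passed in as callables; in a live setup the health string could come from the `ceph health` command and the lines and exit code from the vdbench process itself.

```python
import re
from dataclasses import dataclass

# Assumed interval-line layout: "HH:MM:SS.mmm  <interval>  <IOPS>  <MB/s> ..."
INTERVAL_LINE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3})\s+(\d+)\s+([\d.]+)\s+([\d.]+)")


@dataclass
class KeyInfo:
    time: str          # timestamp of the offending interval line
    iops: float        # IOPS reported at that point
    fault: str         # fault scenario running at that moment
    ceph_health: str   # e.g. the output of `ceph health`
    reason: str


def scan_log(lines, current_fault, ceph_health, exit_code=None):
    """Record key information on zero-IOPS intervals and abnormal vdbench exit."""
    records = []
    last_time, last_iops = "unknown", 0.0
    for line in lines:
        m = INTERVAL_LINE.match(line.strip())
        if not m:
            continue  # headers, summaries and other non-interval lines
        last_time, last_iops = m.group(1), float(m.group(3))
        if last_iops == 0:
            records.append(KeyInfo(last_time, last_iops, current_fault(),
                                   ceph_health(), "IOPS dropped to 0"))
    if exit_code not in (None, 0):
        records.append(KeyInfo(last_time, last_iops, current_fault(), ceph_health(),
                               f"vdbench exited with code {exit_code}"))
    return records


log = [
    "Mar 01, 2023  interval        i/o   MB/sec   bytes",
    "10:00:01.002         1    15214.00    59.43    4096",
    "10:00:02.001         2        0.00     0.00       0",
]
found = scan_log(log, lambda: "network sub-health", lambda: "HEALTH_WARN")
for rec in found:
    print(rec)
```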

[0074] In some embodiments of the present invention, reading and analyzing current key information and outputting robustness test summary results further includes: reading current key information and judging test results based on current key information; and outputting test summary results after processing all current key information.

[0075] In this embodiment, the system reads log information and determines the output test results based on the log information. For example, if the log records that the system experiences IO interruption on the vdbench side when testing for network sub-health faults, the test is determined to have failed in that scenario; otherwise, it is considered normal. After processing all log information, the system outputs a summary test result, such as the number of times a certain fault test failed and related information in 24 tests (simulating a fault once every hour for 24 hours).
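The summary step could look like the Python sketch below: failures are counted per distinct trigger slot, so several zero-IOPS records during one injection count as a single failed run, and any failure marks the scenario as FAIL. Both the de-duplication by slot and the "any failure fails the scenario" verdict are simplifying assumptions; the input data in the usage example is hypothetical.

```python
from collections import Counter


def summarize(runs_per_fault, failures):
    """Build a per-fault summary.

    runs_per_fault: fault name -> number of times it was simulated.
    failures: (slot index, fault name) pairs taken from the key-information
    records; several records in the same slot count as one failed run.
    """
    failed = Counter(fault for _, fault in set(failures))
    report = []
    for fault, runs in runs_per_fault.items():
        n = failed.get(fault, 0)
        report.append(f"{fault}: {n}/{runs} failed, {'FAIL' if n else 'PASS'}")
    return report


# Hypothetical input: two records from slot 13 and one from slot 23.
report = summarize(
    {"disk sub-health": 7, "network sub-health": 5, "node restart power failure": 2},
    [(13, "network sub-health"), (13, "network sub-health"),
     (23, "node restart power failure")],
)
print("\n".join(report))
```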

[0076] The Ceph-based robustness testing method and system for storage systems of this invention can automatically simulate various failure scenarios, greatly improving testing efficiency. Furthermore, this method allows for setting different weights for different failure scenarios based on configuration, enabling increased testing intensity for scenarios with more problems. In addition, this method can combine and simulate multiple different failure scenarios, improving the coverage of storage system failure scenario testing.

[0077] In view of the above objectives, a second aspect of the present invention provides a Ceph-based robustness testing device for storage systems. Figure 2 is a schematic diagram of an embodiment of the Ceph-based storage system robustness testing apparatus provided by the present invention. As shown in Figure 2, the Ceph-based storage system robustness testing device according to an embodiment of the present invention includes: a business stress unit 011, configured to mount a Ceph storage system directory or volume, run a storage business test script, and simulate business stress; a configuration analysis unit 012, configured to set relevant parameters for fault simulation operation in a configuration file and parse the configuration file; a fault simulation unit 013, configured to simulate corresponding fault test scenarios based on the parsing results of the configuration file; a system monitoring unit 014, configured to monitor the output logs of the storage business test script and record current key information when an anomaly is detected; and a log recording unit 015, configured to read the current key information, analyze it, and output a robustness test summary result.

[0078] In some embodiments of the present invention, the business pressure unit 011 is further configured to use vdbench as a test tool for storage business test scripts to perform continuous IO read and write tests on the mounted ceph storage system directory or volume, wherein the business pressure IO model is 4K 100% random read and write, the read ratio is 30%, and the running time is set to an integer multiple of 24 hours.

[0079] In some embodiments of the present invention, the configuration analysis unit 012 is further configured to describe configuration items in the configuration file in the form of key-value pairs, wherein the configuration items include at least fault simulation time, fault simulation interval, fault triggering mode and fault weight.

[0080] In some embodiments of the present invention, the fault triggering mode includes a single fault mode and a mixed fault mode, wherein the mixed fault mode supports the simultaneous triggering of two different fault scenarios, and the weight of each fault represents its proportion in the total number of fault simulations.

[0081] In some embodiments of the present invention, the failure scenarios include at least: disk sub-health, disk plugging / unplugging, network sub-health, node memory leak, and node restart power failure.

[0082] In some embodiments of the present invention, the fault simulation unit 013 is further configured to simulate and trigger the corresponding fault scenario at a set time after parsing the configuration file, wherein the set time is determined based on the total running time and the interval time.

[0083] In some embodiments of the present invention, the system monitoring unit 014 is further configured such that during test operation, on the test stress machine side, vdbench can continuously output logs in line-by-line format, wherein the logs in line-by-line format include at least the current IOPS, bandwidth, and time; the storage system continuously monitors the currently output log information; and when vdbench is detected to have terminated abnormally or the current input IOPS is 0, the system records the current key information.

[0084] In some embodiments of the present invention, the current key information includes: the current time, the IOPS reported by the running vdbench instance, the fault scenario currently being simulated, and the health status of the Ceph storage system.

[0085] In some embodiments of the present invention, the log recording unit 015 reads the current key information and judges the test results based on the current key information; after processing all the current key information, it outputs the test summary results.

[0086] The Ceph-based storage system robustness testing device of this invention comprises five units: a service stress unit 011, a configuration analysis unit 012, a fault simulation unit 013, a system monitoring unit 014, and a log recording unit 015. In this device, the fault simulation unit 013 can automatically simulate various fault scenarios based on the parsed configuration file of the configuration analysis unit 012, greatly improving testing efficiency. Simultaneously, the configuration analysis unit 012 can set different weights for different fault scenarios based on configuration items, increasing the testing intensity for fault scenarios with more problems. Furthermore, the fault simulation unit 013 can combine and simulate multiple different fault scenarios, improving the coverage of storage system fault scenario testing.

[0087] In view of the above objectives, a third aspect of the present invention provides a computer device. Figure 3 is a schematic diagram of an embodiment of the computer device provided by the present invention. As shown in Figure 3, the computer device of this embodiment includes: at least one processor 021; and a memory 022 storing computer instructions 023 executable on the processor, wherein the instructions, when executed by the processor, implement a method comprising the steps of: mounting a Ceph storage system directory or volume on a test stress machine; calling the storage test tool vdbench to run a storage service test script to simulate service pressure; setting relevant parameters for fault simulation in a configuration file and parsing the configuration file; simulating corresponding fault test scenarios based on the parsing results of the configuration file; monitoring the output logs of the storage service test script and recording current key information when an anomaly is detected; and reading the current key information for analysis and outputting a robustness test summary result.

[0088] The present invention also provides a computer-readable storage medium. Figure 4 is a schematic diagram of an embodiment of the computer-readable storage medium provided by the present invention. As shown in Figure 4, the computer-readable storage medium 031 stores a computer program 032 that, when executed by a processor, performs the method described above.

[0089] Finally, it should be noted that those skilled in the art will understand that all or part of the processes in the above embodiments can be implemented by a computer program instructing related hardware. The program for the Ceph-based storage system robustness testing method can be stored in a computer-readable storage medium. When executed, the program can include the processes of the embodiments of the above methods. The storage medium for the program can be a magnetic disk, optical disk, read-only memory (ROM), or random access memory (RAM), etc. The above computer program embodiments can achieve the same or similar effects as any of the corresponding foregoing method embodiments.

[0090] Furthermore, the method disclosed in the embodiments of the present invention can also be implemented as a computer program executed by a processor, which may be stored in a computer-readable storage medium. When the computer program is executed by the processor, it performs the functions defined in the method disclosed in the embodiments of the present invention.

[0091] Furthermore, the above-described method steps and system units can also be implemented using a controller and a computer-readable storage medium for storing a computer program that enables the controller to perform the functions of the above-described steps or units.

[0092] Those skilled in the art will also understand that the various exemplary logic blocks, modules, circuits, and algorithm steps described in conjunction with the disclosure herein can be implemented as electronic hardware, computer software, or a combination of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether this functionality is implemented as software or as hardware depends on the specific application and the design constraints imposed on the overall system. Those skilled in the art can implement the functionality in various ways for each specific application, but such implementation decisions should not be construed as departing from the scope of the embodiments disclosed herein.

[0093] In one or more exemplary designs, the functionality may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functionality may be stored on, or transmitted via, a computer-readable medium as one or more instructions or code. Computer-readable media include computer storage media and communication media, including any medium that facilitates the transfer of a computer program from one location to another. A storage medium may be any available medium accessible to a general-purpose or special-purpose computer. By way of example, and not limitation, computer-readable media may include RAM, ROM, EEPROM, CD-ROM or other optical disc storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store the required program code in the form of instructions or data structures and that is accessible to a general-purpose or special-purpose computer or a general-purpose or special-purpose processor. In addition, any connection is properly termed a computer-readable medium. For example, if software is transmitted from a website, server, or other remote source using coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. As used herein, disks and discs include compact discs (CDs), laser discs, optical discs, digital versatile discs (DVDs), floppy disks, and Blu-ray discs, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

[0094] The above are exemplary embodiments disclosed in this invention. However, it should be noted that various changes and modifications can be made without departing from the scope of the embodiments of this invention as defined by the claims. The functions, steps, and/or actions of the methods according to the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although the elements disclosed in the embodiments of this invention may be described or claimed in the singular, they may be understood as plural unless explicitly limited to the singular.

[0095] It should be understood that, as used herein, the singular form “a” is intended to include the plural form as well, unless the context clearly supports an exception. It should also be understood that, as used herein, “and/or” refers to any and all possible combinations of one or more of the associated listed items.

[0096] The embodiment numbers disclosed in the above embodiments of the present invention are for descriptive purposes only and do not represent the superiority or inferiority of the embodiments.

[0097] Those skilled in the art will understand that all or part of the steps of the above embodiments can be implemented by hardware or by a program instructing related hardware. The program can be stored in a computer-readable storage medium, such as a read-only memory, a disk, or an optical disk.

[0098] Those skilled in the art should understand that the discussion of any of the above embodiments is merely exemplary and is not intended to imply that the scope of the invention (including the claims) is limited to these examples. Within the framework of the invention, technical features of the above embodiments or of different embodiments can be combined, and many other variations of different aspects of the invention exist, which are not described in detail for the sake of brevity. Therefore, any omissions, modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the invention should be included within the protection scope of the invention.

Claims

1. A robustness testing method for a Ceph-based storage system, characterized by comprising: mounting a Ceph storage system directory or volume on a test stress machine, and calling the storage testing tool vdbench to run a storage service test script to simulate service stress; setting relevant parameters for fault simulation in a configuration file and parsing the configuration file; simulating a corresponding fault test scenario based on the parsing results of the configuration file; monitoring the output logs of the storage service test script and recording current key information when an anomaly is detected; and reading and analyzing the current key information and outputting a robustness test summary result; wherein setting relevant parameters for fault simulation in the configuration file comprises: describing configuration items in the form of key-value pairs in the configuration file, the configuration items including at least fault simulation time, fault simulation interval, fault trigger mode, and fault weight, wherein the fault weight represents the proportion of that fault in the total number of fault simulations.
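Claim 1 fixes only that the configuration items are key-value pairs and that a fault weight is a share of the total number of fault simulations. As a non-authoritative illustration, the sketch below parses such a file and turns the weights into per-scenario counts. The key names (fault_sim_time, fault_interval, trigger_mode, weight.<scenario>), the "key = value" syntax with "#" comments, and the largest-remainder rounding are all assumptions made for the example, not details taken from the patent.

```python
from __future__ import annotations

from math import floor


def parse_config(text: str) -> dict[str, str]:
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    config: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key-value pair: {raw!r}")
        config[key.strip()] = value.strip()
    return config


def fault_weights(config: dict[str, str]) -> dict[str, float]:
    """Collect hypothetical 'weight.<scenario>' items into {scenario: weight}."""
    prefix = "weight."
    return {k[len(prefix):]: float(v) for k, v in config.items() if k.startswith(prefix)}


def allocate_faults(weights: dict[str, float], total: int) -> dict[str, int]:
    """Split `total` fault simulations in proportion to the weights.

    Largest-remainder rounding keeps the counts summing exactly to `total`.
    """
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("fault weights must sum to a positive value")
    exact = {name: total * w / weight_sum for name, w in weights.items()}
    counts = {name: floor(x) for name, x in exact.items()}
    leftover = total - sum(counts.values())
    # Hand the remaining slots to the scenarios with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda n: exact[n] - counts[n], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


# Hypothetical example configuration (key names are illustrative only).
SAMPLE = """
fault_sim_time = 86400      # total fault-simulation time, seconds
fault_interval = 3600       # seconds between triggers
trigger_mode = single       # single | mixed
weight.disk_subhealth = 3
weight.network_subhealth = 2
weight.node_restart = 1
"""

if __name__ == "__main__":
    cfg = parse_config(SAMPLE)
    print(allocate_faults(fault_weights(cfg), 23))
```

With weights 3:2:1 and 23 trigger slots, the sketch assigns 11, 8 and 4 slots respectively. Raising one scenario's weight increases how often it is exercised, which is the "testing intensity of key failure scenarios" effect described for this configuration item.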

2. The robustness testing method for a Ceph-based storage system according to claim 1, characterized in that mounting a Ceph storage system directory or volume on the test stress machine and calling the storage testing tool vdbench to run the storage service test script to simulate service stress comprises: using vdbench as the testing tool for the storage service test script to perform continuous IO read and write tests on the mounted Ceph storage system directory or volume, wherein the service stress IO model is 4K, 100% random read and write with a read ratio of 30%, and the running time is set to an integer multiple of 24 hours.
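The IO model of claim 2 maps onto a vdbench parameter file. The sketch below generates one for the volume case only, assuming the Ceph storage is presented to the test stress machine as a block device (/dev/rbd0 is a hypothetical example of a mapped RBD image); a mounted directory would instead need vdbench's file-system definitions (fsd/fwd), which are not shown. The sd/wd/rd definitions and the lun, openflags, xfersize, rdpct, seekpct, iorate, elapsed and interval parameters are standard vdbench syntax; the helper function and its arguments are illustrative.

```python
from __future__ import annotations

SECONDS_PER_DAY = 24 * 3600


def vdbench_params(lun: str, days: int, read_pct: int = 30, xfersize: str = "4k") -> str:
    """Return a vdbench parameter file for 100% random I/O on one block device.

    `days` enforces the claim's "integer multiple of 24 hours" running time.
    """
    if not isinstance(days, int) or days < 1:
        raise ValueError("running time must be a positive whole number of days")
    lines = [
        # Storage definition: the Ceph-backed volume, opened with O_DIRECT.
        f"sd=sd1,lun={lun},openflags=o_direct",
        # Workload: 4K transfers, 30% reads / 70% writes; seekpct=100 means fully random.
        f"wd=wd1,sd=sd1,xfersize={xfersize},rdpct={read_pct},seekpct=100",
        # Run: unthrottled I/O rate, elapsed time in seconds, one report line per second.
        f"rd=rd1,wd=wd1,iorate=max,elapsed={days * SECONDS_PER_DAY},interval=1",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(vdbench_params("/dev/rbd0", days=1), end="")
```

The returned text can be saved to a file and passed to vdbench with its usual -f (parameter file) and -o (output directory) options, for example ./vdbench -f ceph_stress.parm -o ceph_out. The interval=1 setting yields one line per second, which suits the line-by-line log monitoring of claim 6.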

3. The robustness testing method for a Ceph-based storage system according to claim 1, characterized in that the fault trigger modes include a single fault mode and a mixed fault mode, wherein the mixed fault mode supports simultaneously triggering two different fault scenarios.

4. The robustness testing method for a Ceph-based storage system according to claim 3, characterized in that the fault scenarios include at least: disk sub-health, disk insertion/removal, network sub-health, node memory leak, and node restart/power-off.
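Claims 3 and 4 together define what is triggered at each fault slot: one scenario in single mode, two different scenarios in mixed mode, drawn from the five listed scenarios. A minimal sketch of that selection follows, assuming weighted random choice and assuming the second mixed-mode pick is drawn from the remaining scenarios; the Fault enum and pick_faults function are hypothetical names, and the actual injection of each fault is out of scope here.

```python
from __future__ import annotations

import random
from enum import Enum


class Fault(Enum):
    """The fault scenarios listed in claim 4."""
    DISK_SUBHEALTH = "disk sub-health"
    DISK_PLUG = "disk insertion/removal"
    NETWORK_SUBHEALTH = "network sub-health"
    NODE_MEMORY_LEAK = "node memory leak"
    NODE_RESTART_POWER_OFF = "node restart/power-off"


def pick_faults(weights: dict[Fault, float], mode: str, rng: random.Random) -> list[Fault]:
    """Choose the scenario(s) for one trigger.

    "single" -> one weighted pick; "mixed" -> two different scenarios,
    the second drawn by weight from those not yet chosen.
    """
    if mode not in ("single", "mixed"):
        raise ValueError(f"unknown trigger mode: {mode!r}")
    pool = dict(weights)
    picks: list[Fault] = []
    for _ in range(1 if mode == "single" else 2):
        if not pool:
            raise ValueError("mixed mode needs at least two scenarios")
        names = list(pool)
        choice = rng.choices(names, weights=[pool[n] for n in names], k=1)[0]
        picks.append(choice)
        del pool[choice]  # never pick the same scenario twice in one trigger
    return picks


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so a failing run can be replayed
    equal = {f: 1.0 for f in Fault}
    print([f.value for f in pick_faults(equal, "mixed", rng)])
```

Passing a seeded random.Random instance, rather than using the module-level generator, is a design choice that makes a fault sequence reproducible when an anomaly needs to be re-run.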

5. The robustness testing method for a Ceph-based storage system according to claim 4, characterized in that simulating the corresponding fault test scenario based on the parsing results of the configuration file comprises: after the configuration file is parsed, simulating and triggering the corresponding fault scenario at a set time, wherein the set time is determined based on the total running time and the interval time.
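Claim 5 derives the trigger times from the total running time and the interval but does not give the formula. The sketch below assumes evenly spaced triggers, the first one interval after the start and none at the very end of the run, and fires them with the standard-library sched module. trigger_times, run_schedule and the fire callback are hypothetical names; the injectable clock functions exist only so the schedule can be tested without waiting.

```python
from __future__ import annotations

import sched
import time
from typing import Callable, Sequence


def trigger_times(total_s: int, interval_s: int, start_s: int = 0) -> list[int]:
    """Offsets (seconds from test start) at which faults fire.

    One trigger every `interval_s` seconds after `start_s`, strictly before
    the run ends, so no fault is injected at the very end of the run.
    """
    if total_s <= 0 or interval_s <= 0:
        raise ValueError("total running time and interval must be positive")
    return list(range(start_s + interval_s, total_s, interval_s))


def run_schedule(
    offsets: Sequence[int],
    fire: Callable[[int, int], None],
    timefunc: Callable[[], float] = time.monotonic,
    delayfunc: Callable[[float], object] = time.sleep,
) -> None:
    """Call fire(index, offset) at each offset, measured from when this is called."""
    scheduler = sched.scheduler(timefunc, delayfunc)
    t0 = timefunc()
    for index, offset in enumerate(offsets):
        scheduler.enterabs(t0 + offset, 1, fire, argument=(index, offset))
    scheduler.run()  # blocks until the last trigger has fired


if __name__ == "__main__":
    times = trigger_times(86400, 3600)
    print(len(times), times[0], times[-1])
```

With the 24-hour run of claim 2 and a one-hour interval, trigger_times(86400, 3600) yields 23 triggers, at 3600 s through 82800 s. In a real run, fire would look up the scenario(s) allotted to that slot and start the corresponding injection.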

6. The robustness testing method for a Ceph-based storage system according to claim 1, characterized in that monitoring the output logs of the storage service test script and recording current key information when an anomaly is detected comprises: during the test run, on the test stress machine side, vdbench continuously outputs logs line by line, each line including at least the current IOPS, bandwidth, and time; continuously monitoring the currently output log information; and recording the current key information when vdbench terminates abnormally or the current IOPS is 0.
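The two anomaly conditions of claim 6, IOPS dropping to 0 and vdbench terminating abnormally, can be detected from the process's stdout and exit status. The sketch below rests on two assumptions: that a vdbench per-interval report line starts with a timestamp, the interval number, the I/O rate (IOPS) and MB/sec, in that order; and that abnormal termination shows up as a non-zero exit status. The regular expression, Sample, watch and monitor are illustrative names, and the column layout should be checked against the vdbench version actually in use.

```python
from __future__ import annotations

import re
import subprocess
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator

# Assumed layout of a vdbench per-interval report line:
#   "HH:MM:SS.mmm  <interval>  <i/o rate>  <MB/sec>  <bytes/io>  ..."
INTERVAL_LINE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3})\s+(\d+)\s+([\d.]+)\s+([\d.]+)")


@dataclass
class Sample:
    time: str
    interval: int
    iops: float
    mb_per_s: float


def parse_line(line: str) -> Sample | None:
    """Return the interval sample on this line, or None for headers/banners."""
    m = INTERVAL_LINE.match(line.strip())
    if m is None:
        return None
    return Sample(m.group(1), int(m.group(2)), float(m.group(3)), float(m.group(4)))


def watch(lines: Iterable[str]) -> Iterator[Sample]:
    """Yield every interval sample whose IOPS has dropped to 0."""
    for line in lines:
        sample = parse_line(line)
        if sample is not None and sample.iops == 0:
            yield sample


def monitor(cmd: list[str], on_anomaly: Callable[[str, object], None]) -> int:
    """Run the stress command, follow its output line by line, report anomalies.

    Reports ("iops_zero", Sample) while running and ("abnormal_exit", status)
    if the process ends with a non-zero status. Returns the exit status.
    """
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True) as proc:
        for sample in watch(proc.stdout):
            on_anomaly("iops_zero", sample)
        rc = proc.wait()
    if rc != 0:
        on_anomaly("abnormal_exit", rc)
    return rc
```

In use, cmd would be the vdbench invocation (for example ["./vdbench", "-f", "ceph_stress.parm", "-o", "ceph_out"]). on_anomaly is the hook where the current key information of claim 7 would be collected and recorded.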

7. The robustness testing method for a Ceph-based storage system according to claim 6, characterized in that the current key information includes: the current time, the current IOPS of the running vdbench, the fault scenario currently being run, and the health status of the Ceph storage system.
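The four items of claim 7 fit naturally into one record per anomaly. The sketch below assumes the health status is taken from the Ceph CLI command ceph health --format json, whose output on recent Ceph releases carries an overall "status" field (HEALTH_OK, HEALTH_WARN or HEALTH_ERR). It also assumes the records are appended to a JSON Lines file. KeyInfo, record and the file layout are hypothetical choices for illustration.

```python
from __future__ import annotations

import json
import subprocess
from dataclasses import asdict, dataclass


@dataclass
class KeyInfo:
    time: str         # when the anomaly was detected (ISO 8601 string)
    iops: float       # IOPS reported by vdbench at that moment
    fault: str        # fault scenario running at that moment
    ceph_health: str  # overall cluster status, e.g. HEALTH_OK / HEALTH_WARN / HEALTH_ERR


def ceph_health_status(json_text: str) -> str:
    """Pull the overall status out of `ceph health --format json` output."""
    return json.loads(json_text).get("status", "UNKNOWN")


def query_ceph_health() -> str:
    """Ask the cluster for its health (needs the ceph CLI and an admin keyring)."""
    result = subprocess.run(
        ["ceph", "health", "--format", "json"],
        capture_output=True, text=True, check=True,
    )
    return ceph_health_status(result.stdout)


def record(info: KeyInfo, path: str) -> None:
    """Append one anomaly as a single JSON line (JSON Lines file)."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(info)) + "\n")
```

Appending one self-contained line per anomaly keeps the recording step independent of the later analysis step. The analysis can simply read the file line by line once the run is over.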

8. The robustness testing method for a Ceph-based storage system according to claim 1, characterized in that reading and analyzing the current key information and outputting the robustness test summary result comprises: reading the current key information and determining a test result based on it; and outputting the test summary result after all current key information has been processed.
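Claim 8 leaves the judgement rule open. Purely as an illustration, the sketch below assumes a hypothetical rule: an anomaly recorded while the cluster reported HEALTH_ERR counts as a failure, any other anomaly counts as a warning, and the overall result is PASS only if there are no failures. It then aggregates the records per fault scenario. The record fields match a simple dictionary per anomaly; judge, summarize and summarize_file are illustrative names.

```python
from __future__ import annotations

import json
from collections import Counter
from typing import Iterable


def judge(rec: dict) -> str:
    """Hypothetical rule: HEALTH_ERR at the time of the anomaly is a failure;
    any other anomaly (e.g. a transient IOPS drop) is only a warning."""
    return "FAIL" if rec.get("ceph_health") == "HEALTH_ERR" else "WARN"


def summarize(records: Iterable[dict]) -> dict:
    """Aggregate per-record verdicts into an overall result and per-fault counts."""
    per_fault: dict[str, Counter] = {}
    totals: Counter = Counter()
    for rec in records:
        verdict = judge(rec)
        per_fault.setdefault(rec.get("fault", "unknown"), Counter())[verdict] += 1
        totals[verdict] += 1
    return {
        "result": "FAIL" if totals["FAIL"] else "PASS",
        "anomalies": sum(totals.values()),
        "per_fault": {fault: dict(c) for fault, c in per_fault.items()},
    }


def summarize_file(path: str) -> dict:
    """Read a JSON Lines file of key-information records and summarize it."""
    with open(path, encoding="utf-8") as fh:
        return summarize(json.loads(line) for line in fh if line.strip())
```

Breaking the counts down per fault scenario makes it easy to see which scenarios deserve a higher weight in the next run, in line with the weighting idea of claim 1.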

9. A robustness testing device for a Ceph-based storage system, characterized by comprising: a business stress unit configured to mount a Ceph storage system directory or volume and run a storage service test script to simulate service stress; a configuration analysis unit configured to set relevant parameters for fault simulation in a configuration file and parse the configuration file; a fault simulation unit configured to simulate corresponding fault test scenarios based on the parsing results of the configuration file; a system monitoring unit configured to monitor the output logs of the storage service test script and record current key information when an anomaly is detected; and a log recording unit configured to read and analyze the current key information and output a robustness test summary result; wherein the configuration analysis unit is further configured to describe configuration items in the configuration file in the form of key-value pairs, the configuration items including at least fault simulation time, fault simulation interval, fault trigger mode, and fault weight, wherein the fault weight represents the proportion of that fault in the total number of fault simulations.
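The five units of claim 9 can be wired together as a small orchestrator. In the sketch below each unit is injected as a callable, so the same skeleton can be driven by real implementations or by test stubs. The callable signatures are hypothetical. One assumption matters most: fault simulation runs in a background thread while the monitoring unit follows vdbench's output, since faults must be injected during the stress run rather than before or after it.

```python
from __future__ import annotations

import threading
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class RobustnessTester:
    """The five units of claim 9, each supplied as a callable (signatures are illustrative)."""
    business_stress: Callable[[], Any]              # mount storage, start vdbench, return a handle
    config_analysis: Callable[[str], dict]          # parse the key-value configuration file
    fault_simulation: Callable[[dict], None]        # trigger faults at the configured times
    system_monitoring: Callable[[Any], list[dict]]  # follow vdbench output, return key-info records
    log_recording: Callable[[list[dict]], dict]     # analyse the records, return the summary

    def run(self, config_path: str) -> dict:
        config = self.config_analysis(config_path)
        handle = self.business_stress()
        # Inject faults in the background while the monitor follows the stress run.
        injector = threading.Thread(target=self.fault_simulation, args=(config,), daemon=True)
        injector.start()
        records = self.system_monitoring(handle)  # returns when vdbench finishes
        injector.join()
        return self.log_recording(records)
```

Keeping the units as plain callables mirrors the claim's division of labour without fixing any particular implementation of each unit.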

10. A computer device, characterized by comprising: at least one processor; and a memory storing computer instructions executable on the processor, which, when executed by the processor, implement the steps of the method according to any one of claims 1-8.

11. A computer-readable storage medium storing a computer program, characterized in that, when executed by a processor, the computer program implements the steps of the method according to any one of claims 1-8.