A slow disk simulation method and system based on data augmentation

By constructing a feature matrix model and generating enhanced time-series data using a dynamic time warping algorithm, the problem of insufficient realism in existing slow disk simulation methods is solved. This achieves highly realistic simulation of slow disk behavior, reduces the risk of false positives and false negatives, and improves the testing efficiency and accuracy of storage systems.

CN122242214A · Pending · Publication Date: 2026-06-19 · HUARUI INDEX CLOUD TECH (SHENZHEN) CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: HUARUI INDEX CLOUD TECH (SHENZHEN) CO LTD
Filing Date: 2026-03-06
Publication Date: 2026-06-19

Figure CN122242214A_ABST

Abstract

This invention relates to a slow disk simulation method and system based on data augmentation, belonging to the field of data processing technology. It addresses the insufficient realism and simplistic behavioral patterns of existing slow disk simulation methods, which cannot accurately reflect the complex characteristics of real slow disks. The method includes: collecting time-series data for multiple key indicators of slow disks, calculating the correlation coefficients between the key indicators, and constructing a feature matrix model; performing data augmentation on the time-series data of each key indicator based on a dynamic time warping algorithm to generate augmented time-series data; validating the augmented time-series data against the feature matrix model and, if validation passes, storing it in the augmented time-series dataset; and matching the real-time time-series data of each key indicator of the target disk against the augmented time-series dataset and, when a match succeeds, injecting delays into the I/O requests of the target disk based on the matched augmented time-series data to simulate slow disk behavior. This achieves slow disk simulation covering a variety of real-world faults.

Description

Technical Field

[0001] This invention relates to the field of data processing technology, and in particular to a slow disk simulation method and system based on data augmentation.

Background Technology

[0002] Slow disks are a typical type of sub-optimal hardware failure in storage systems, especially in distributed storage systems, where a single disk's performance degradation can lead to a sharp decline in the performance of the entire storage pool. Slow disks have the following typical characteristics: First, they are difficult to detect, as their health status cannot be determined by a single indicator and requires comprehensive evaluation based on multiple indicators and business pressure; second, they are difficult to construct, as naturally occurring slow disks require long-term use to appear, and their behavior patterns are complex and lack fixed rules, making it difficult to effectively test slow disk detection algorithms.

[0003] Currently, methods for artificially creating slow disks mainly include applying additional I/O (Input/Output) pressure and stubbing delay. Applying additional I/O pressure means adding extra I/O load to a specific hard drive on top of the actual business workload and adjusting the pressure until the drive behaves like a slow disk. This method is primarily used to simulate read request latency and cannot precisely control the degree and pattern of slow disk behavior. Stubbing delay means intercepting I/O requests in an operating system kernel module or in the storage system software and artificially adding a fixed or proportional delay, so that the hard drive behaves like a slow disk. While this method allows control over the amount of latency, the resulting behavior differs significantly from the complex characteristics of a real slow disk.

[0004] Applying additional I/O pressure can simulate the latency of read requests only to a limited extent and cannot cover write requests or mixed-load scenarios; its control over slow disk behavior is also imprecise and inflexible. Although the stubbing delay method can adjust latency parameters, the latency patterns it generates are too regular and artificial to reflect the complex behavior of real slow disks in terms of multi-indicator coupling, nonlinear changes, and local faults. As a result, detection algorithms built on such simulated data are prone to false positives and false negatives in real-world environments.

Summary of the Invention

[0005] Based on the above analysis, the embodiments of the present invention aim to provide a slow disk simulation method and system based on data augmentation, in order to solve the problems of insufficient realism, single behavior pattern, and inability to accurately reflect the complex characteristics of real slow disks in existing slow disk simulation methods.

[0006] On the one hand, embodiments of the present invention provide a slow disk simulation method based on data augmentation, comprising the following steps: collecting time-series data for multiple key indicators of slow disks, calculating the correlation coefficients between the key indicators, and constructing a feature matrix model; performing data augmentation on the time-series data of each key indicator based on the dynamic time warping algorithm to generate augmented time-series data, verifying the augmented time-series data against the feature matrix model, and storing it in the augmented time-series dataset if the verification passes; and matching the real-time time-series data of each key indicator of the target disk against the augmented time-series dataset and, when a match succeeds, injecting delay into the I/O requests of the target disk based on the matched augmented time-series data to simulate slow disk behavior.

[0007] As a further improvement of the above method, performing data augmentation on the time-series data of each key indicator to generate augmented time-series data includes: setting variation rules for each key indicator under each business model; and, for a given business model, performing the following operations in each data augmentation run to obtain the augmented time-series data of one scenario group under that business model: taking the time-series data of each key indicator as an original sequence; transforming each original sequence according to its corresponding variation rules to obtain a target sequence; calculating the optimal path between each original sequence and its target sequence based on the dynamic time warping algorithm; and warping the original sequence according to the optimal path to generate the augmented time-series data corresponding to that original sequence.

[0008] Based on further improvements to the above method, at least one variation rule is set for each key indicator under each business model. The variation rules include: noise injection, velocity variation, and outlier insertion; the velocity variation includes: time axis stretching and time axis compression.

[0009] As a further improvement of the above method, warping the original sequence according to the optimal path to generate the corresponding augmented time-series data includes: sorting the point pairs in the optimal path by original sequence index and excluding the starting matching point pair; calculating the absolute difference of the target sequence index between every two adjacent point pairs after sorting and, if the absolute difference is greater than 1, determining a subsequence interval to be adjusted from the original sequence indices of the two point pairs; merging subsequence intervals whose original sequence indices are consecutive to obtain the intervals to be distorted; and adjusting each interval to be distorted using the same variation rules that were used to generate the target sequence, thereby generating the augmented time-series data.

[0010] As a further improvement of the above method, validating the augmented time-series data based on the feature matrix model includes: calculating, for each scenario group under the same business model, the correlation coefficient matrix of the augmented time-series data of the key indicators and using it as the augmented feature matrix; and calculating the mean absolute error and the Spearman rank correlation coefficient between each submatrix of the augmented feature matrix and the corresponding submatrix of the feature matrix model. If, for every submatrix, the mean absolute error is less than or equal to the first threshold and the Spearman rank correlation coefficient is greater than or equal to the second threshold, the verification passes; otherwise, the verification fails.

[0011] Based on further improvements to the above method, the key indicators include: key indicators of slow disk behavior and key indicators of slow disk causes; the feature matrix model is a symmetric matrix constructed by calculating the correlation coefficients between key indicators of slow disk behavior, between key indicators of slow disk behavior and key indicators of slow disk causes, and between key indicators of slow disk causes.

[0012] As a further improvement of the above method, matching the real-time time-series data of each key indicator of the target disk against the augmented time-series dataset includes: obtaining the feature matrix model and the augmented time-series data of all scenario groups under the specified business model, and collecting the real-time time-series data of each key indicator in the current time window according to a preset time window; calculating the statistical characteristics of each key indicator on both the current real-time time-series data and the augmented time-series data of each scenario group, comparing their similarity, and taking the augmented time-series data of the scenario group with the highest similarity as the candidate data; and calculating the correlation coefficients between the key indicators from the current real-time time-series data, constructing a real-time correlation matrix, and comparing it with the feature matrix model. If the comparison result meets the preset conditions, the match succeeds and the candidate data is the matched augmented time-series data. Otherwise, the real-time time-series data of each key indicator in the next time window is collected and matched again against the augmented time-series data of each scenario group, until a match succeeds.

[0013] As a further improvement of the above method, injecting latency into the I/O requests of the target disk based on the matched augmented time-series data includes: when a match succeeds, starting a simulation session and recording the start timestamp; when an I/O request to the target disk is intercepted, taking the interception time as the current timestamp; calculating the simulated elapsed time from the current timestamp and the start timestamp, mapping it onto the time axis of the average service duration sequence in the matched augmented time-series data, and obtaining the target service duration by interpolation; and obtaining the delay time by comparing the target service duration with the baseline service duration of a normal disk, then applying that delay to the I/O request to simulate slow disk behavior.
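
The delay calculation in [0013] can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function names, the fixed sampling period of the template sequence, the use of linear interpolation, and the clamping of negative delays to zero are all assumptions introduced here.

```python
def target_service_duration(template, sample_period, elapsed):
    """Linearly interpolate the average-service-duration template at `elapsed` seconds.

    `template` is assumed to hold one value per `sample_period` seconds;
    times outside the template are clamped to its first or last value.
    """
    pos = elapsed / sample_period
    if pos <= 0:
        return template[0]
    if pos >= len(template) - 1:
        return template[-1]
    lo = int(pos)
    frac = pos - lo
    return template[lo] * (1 - frac) + template[lo + 1] * frac


def injected_delay(template, sample_period, start_ts, now_ts, baseline):
    """Delay to add to an intercepted I/O request (same unit as the template)."""
    target = target_service_duration(template, sample_period, now_ts - start_ts)
    # Assumption: the "comparison" with the baseline is a difference, never negative.
    return max(0.0, target - baseline)


if __name__ == "__main__":
    template_ms = [5.0, 9.0, 7.0]          # hypothetical template, one sample every 10 s
    print(injected_delay(template_ms, 10.0, start_ts=100.0, now_ts=105.0, baseline=4.0))
```

In this sketch an interceptor would call `injected_delay` once per intercepted request and sleep for the returned duration before forwarding the request.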

[0014] As a further improvement of the above method, the simulation session is configured with a simulation duration. Within the simulation duration, the real-time time-series data of each key indicator is collected periodically according to the preset time window, and a real-time correlation matrix is constructed and compared with the feature matrix model. If the real-time correlation matrix no longer matches the feature matrix model for several consecutive periods, the simulation is terminated early.

[0015] On the other hand, embodiments of the present invention provide a slow disk simulation system based on data augmentation, comprising: a model building module, used to collect time-series data for multiple key indicators of slow disks, calculate the correlation coefficients between the key indicators, and construct a feature matrix model; a data augmentation module, used to perform data augmentation on the time-series data of each key indicator based on the dynamic time warping algorithm to generate augmented time-series data, verify the augmented time-series data against the feature matrix model, and store it in the augmented time-series dataset if the verification passes; and a slow disk simulation module, used to match the real-time time-series data of each key indicator of the target disk against the augmented time-series dataset and, when a match succeeds, inject latency into the I/O requests of the target disk based on the matched augmented time-series data to simulate slow disk behavior.

[0016] Compared with the prior art, the present invention can achieve at least one of the following beneficial effects: 1. Data augmentation technology is applied to disk failure simulation. High-fidelity augmented data is generated through dynamic time warping algorithm and rigorously verified by combining multi-index feature matrix model, so that the simulated slow disk behavior is highly consistent with the real failure in terms of statistical characteristics and intrinsic correlation.

[0017] 2. By constructing a unified feature matrix model across business models, the essential characteristics of slow disks are captured in terms of multi-indicator coupling relationships. Based on this model, large volumes of augmented scenario data covering different fault modes and severities are generated and verified, providing a comprehensive and rigorous test environment for the detection algorithms and fault tolerance mechanisms of the storage system and effectively reducing the risk of false positives and false negatives in slow disk detection.

[0018] 3. During simulation, a dual matching method of statistical feature matching and correlation matrix verification is used to accurately obtain the matched enhanced time-series data as a template sequence. For specific business scenarios, dynamic delays that conform to physical laws are actively and stably injected, realizing the on-demand, repeatable, and highly realistic simulation of slow disk behavior under the condition of no hardware damage. Moreover, the entire simulation process is controllable and observable, and has clear data and model basis, which greatly improves the efficiency and accuracy of storage system reliability testing.

[0019] In this invention, the above-described technical solutions can be combined with each other to achieve more preferred combinations. Other features and advantages of this invention will be set forth in the following description, and some advantages may become apparent from the description or be learned by practicing the invention. The objects and other advantages of this invention can be realized and obtained from what is particularly pointed out in the description and drawings.

Description of the Drawings

[0020] The accompanying drawings are for illustrative purposes only and are not intended to limit the invention. Throughout the drawings, the same reference numerals denote the same parts. Figure 1 is a flowchart of the slow disk simulation method based on data augmentation in Embodiment 1 of the present invention.

Detailed Implementation

[0021] Preferred embodiments of the present invention will now be described in detail with reference to the accompanying drawings, which form part of this application and are used together with the embodiments of the present invention to illustrate the principles of the present invention, but are not intended to limit the scope of the present invention.

[0022] Embodiment 1: A specific embodiment of the present invention discloses a slow disk simulation method based on data augmentation. As shown in Figure 1, the method includes steps S1-S3.

[0023] S1. Based on multiple key indicators of slow disks, the correlation coefficients between key indicators are calculated by collecting corresponding time-series data, and a feature matrix model is constructed.

[0024] This step is used to extract the key indicators that best characterize the state of slow disks from the comprehensive and high-quality raw data collected, and to construct a feature matrix model by calculating the correlation coefficients between the key indicators. This is further subdivided into steps S11-S13.

[0025] S11. Collect raw data and perform preprocessing.

[0026] Different business models are applied to the storage system at varying levels of pressure. During this period, data acquisition agents deployed on each server of the storage system collect all monitoring metric data for slow disks under each business model. The business models include, but are not limited to: 4K random write, 4K random read, 1M sequential write, 1M sequential read, and 4K mixed read/write (where reads account for 70% and writes for 30%).

[0027] Monitoring metrics fall into two categories: slow disk behavior metrics and slow disk cause metrics. Slow disk behavior metrics are more real-time, reflecting the load and performance of the hard drive during I/O request processing. These include: IOPS (Input/Output Operations Per Second), bandwidth, average request size, average queue length, average service duration, and average utilization. Slow disk cause metrics are less real-time, reflecting hard drive wear and tear. These include: amount of data written, amount of data read, data read error rate, number of remapped sectors, parity error rate, write error rate, cumulative power-on time, instruction timeout count, and number of uncorrectable errors.

[0028] The raw data is stored in a time-series database in units of hard drives, serialized by timestamps to form a structured time-series dataset, and labeled with the business model to which the time-series data belongs. Subsequently, data cleaning processes are performed to repair or remove missing or abnormal data to ensure data quality.

[0029] S12. Extract key indicators using principal component analysis.

[0030] Because the collected raw monitoring indicators are high-dimensional and redundant, using them directly for modeling is inefficient and noisy. Therefore, this step uses Principal Component Analysis (PCA) for dimensionality reduction and screening, as follows: standardize the data of each monitoring indicator to a mean of 0 and a standard deviation of 1; construct the covariance matrix and perform eigenvalue decomposition to obtain the eigenvalues and eigenvectors; select the top k principal components based on a threshold for the cumulative variance contribution rate, such as 85%; using the variance contribution rate of each principal component as the weight, calculate the weighted sum of the squared loadings of each monitoring indicator on the selected k principal components as the total contribution of that indicator; normalize the total contribution of each monitoring indicator and map it to a preset weight range (e.g., 0.1 to 1.0) to obtain the weight value of each monitoring indicator, where a larger weight value means the indicator contributes more to explaining the variation in the original data; and designate the monitoring indicators whose weight values are greater than a preset weight threshold (e.g., 0.7) as key indicators.
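
The weighting and screening part of this step can be sketched as below, assuming the eigenvalue decomposition has already been performed elsewhere. The function name, the use of raw eigenvector entries as loadings, and min-max scaling as the normalization are assumptions made for illustration; the text does not specify them.

```python
def select_key_indicators(names, eigenvalues, eigenvectors,
                          cum_threshold=0.85, weight_range=(0.1, 1.0),
                          weight_threshold=0.7):
    """Pick key indicators from PCA results.

    eigenvalues: sorted in descending order.
    eigenvectors[c][m]: loading of indicator m on principal component c (assumed layout).
    """
    total = sum(eigenvalues)
    ratios = [ev / total for ev in eigenvalues]      # variance contribution rates

    # Smallest k whose cumulative contribution reaches the threshold.
    cumulative, k = 0.0, 0
    for ratio in ratios:
        cumulative += ratio
        k += 1
        if cumulative >= cum_threshold:
            break

    # Weighted sum of squared loadings over the first k components.
    contributions = [
        sum(ratios[c] * eigenvectors[c][m] ** 2 for c in range(k))
        for m in range(len(names))
    ]

    # Assumption: min-max scaling into the preset weight range.
    lo, hi = min(contributions), max(contributions)
    span = (hi - lo) or 1.0
    a, b = weight_range
    weights = {n: a + (b - a) * (c - lo) / span for n, c in zip(names, contributions)}
    key = [n for n in names if weights[n] > weight_threshold]
    return k, weights, key


if __name__ == "__main__":
    # Hypothetical three-indicator example with made-up eigenpairs.
    names = ["iops", "avg_service_duration", "power_on_time"]
    eigenvalues = [2.0, 0.7, 0.3]
    eigenvectors = [[0.8, 0.6, 0.0], [0.0, 0.0, 1.0], [-0.6, 0.8, 0.0]]
    print(select_key_indicators(names, eigenvalues, eigenvectors))
```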

[0031] By extracting key indicators, not only is computational efficiency improved, but the set of feature indicators most relevant to the "slow disk" state is also objectively selected, overcoming the subjectivity and limitations of human feature selection.

[0032] S13. Calculate the correlation coefficients between key indicators and construct a feature matrix model.

[0033] For each pair of key indicators, time-series data collected under different business models are concatenated according to their timestamps to form a longer, continuous time-series dataset covering various business models. The Pearson correlation coefficient is calculated within the same time window, with a value ranging from -1 to 1. A value greater than 0 indicates a positive correlation between the indicators, less than 0 indicates a negative correlation, and 0 indicates no correlation. By setting a correlation threshold (e.g., 0.75), a correlation coefficient greater than or equal to the threshold is considered to indicate a strong correlation between the pair of key indicators.

[0034] Furthermore, the correlation coefficients between all key indicators form a symmetric correlation matrix. The rows and columns of the matrix are the key indicators, and each element is the Pearson correlation coefficient between the corresponding two key indicators. This matrix is the feature matrix model of the slow disk.

[0035] Based on the category to which the key indicators belong, the feature matrix model is divided into three sub-matrices: sub-matrices between key indicators of slow disk behavior, sub-matrices between key indicators of slow disk behavior and key indicators of causes of slow disk, and sub-matrices between key indicators of causes of slow disk.

[0036] For example, the extracted key indicators of slow disk behavior include: IOPS, average request size, and average service duration; the extracted key indicators of slow disk causes include: data read error rate, number of remapped sectors, and write error rate. The feature matrix model constructed from these key indicators is a 6×6 symmetric matrix whose diagonal elements are 1, and each sub-matrix is a 3×3 matrix, corresponding to the upper-left region, the upper-right/lower-left region, and the lower-right region, respectively.
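
As a rough sketch of how such a feature matrix and its three sub-matrices could be computed, the following uses plain Python. The function names and the sample values are hypothetical, and the indicators are assumed to be ordered with the behavior indicators first.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def feature_matrix(series):
    """series: dict mapping indicator name -> time series (insertion order kept)."""
    names = list(series)
    matrix = [[1.0 if a == b else pearson(series[a], series[b]) for b in names]
              for a in names]
    return names, matrix


def split_submatrices(matrix, n_behavior):
    """Split into behavior-behavior, behavior-cause, and cause-cause blocks."""
    bb = [row[:n_behavior] for row in matrix[:n_behavior]]
    bc = [row[n_behavior:] for row in matrix[:n_behavior]]
    cc = [row[n_behavior:] for row in matrix[n_behavior:]]
    return bb, bc, cc


if __name__ == "__main__":
    data = {  # hypothetical samples: two behavior indicators, one cause indicator
        "iops": [1.0, 2.0, 3.0, 4.0],
        "avg_service_duration": [8.0, 6.0, 4.0, 2.0],
        "read_error_rate": [1.0, 2.0, 3.0, 5.0],
    }
    names, m = feature_matrix(data)
    print(names, split_submatrices(m, n_behavior=2))
```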

[0037] Furthermore, to ensure the universality and accuracy of the constructed feature matrix model, the feature matrix model is validated.

[0038] Specifically, validating the feature matrix model includes: collecting new time-series data according to the method in step S11; directly reusing the key indicators extracted in step S12, taking the time-series data of those indicators from the new data, and constructing a feature matrix model to serve as the verification matrix model; and comparing the verification matrix model with the feature matrix model by calculating the mean absolute error (MAE) and the Spearman rank correlation coefficient as the comparison results. If the comparison results meet the preset conditions, the validation passes; otherwise, the validation fails.

[0039] Specifically, the off-diagonal regions of the upper or lower triangular matrices in the verification matrix model and feature matrix model are used as the comparison regions. The mean absolute error and Spearman rank correlation coefficient of the comparison regions are calculated. If the mean absolute error is less than or equal to a preset first threshold, such as 0.1, and the Spearman rank correlation coefficient is greater than or equal to a preset second threshold, such as 0.8, then the comparison results meet the preset conditions and the verification passes; otherwise, the verification fails.
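
The comparison described above can be sketched as follows, using the example thresholds 0.1 and 0.8 from the text. The function names are hypothetical, ties are handled with average ranks (an assumption), and the sketch does not guard against a constant comparison region, for which the Spearman coefficient is undefined.

```python
from math import sqrt


def upper_triangle(matrix):
    """Off-diagonal upper-triangular elements, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sqrt(sum((a - mx) ** 2 for a in x)) * sqrt(sum((b - my) ** 2 for b in y)))


def matrices_agree(candidate, reference, max_mae=0.1, min_spearman=0.8):
    """Return (passed, mae, rho) for the upper-triangle comparison regions."""
    a, b = upper_triangle(candidate), upper_triangle(reference)
    mae = sum(abs(p - q) for p, q in zip(a, b)) / len(a)
    rho = pearson(average_ranks(a), average_ranks(b))   # Spearman = Pearson on ranks
    return mae <= max_mae and rho >= min_spearman, mae, rho


if __name__ == "__main__":
    reference = [[1.0, 0.9, -0.5], [0.9, 1.0, 0.2], [-0.5, 0.2, 1.0]]
    candidate = [[1.0, 0.85, -0.45], [0.85, 1.0, 0.25], [-0.45, 0.25, 1.0]]
    print(matrices_agree(candidate, reference))
```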

[0040] It should be noted that, due to the symmetry of the feature matrix model, the region to be compared contains all independent correlation coefficient information. The above adopts a holistic comparison strategy to ensure the accuracy of model verification.

[0041] The feature matrix model constructed in this step elevates the judgment of the "sub-healthy" state of slow disks from a single indicator threshold to a characterization of the coupling relationship of multiple indicators. It is a unified slow disk feature matrix model across business models, which can not only describe "a certain indicator is high", but also reveal "which indicators change together and how they change", thereby accurately capturing the complexity and non-linear characteristics of real slow disk behavior.

[0042] S2. Based on the dynamic time warping algorithm, data augmentation is performed on the time series data of each key indicator to generate augmented time series data; the augmented time series data is verified based on the feature matrix model, and if the verification passes, it is stored in the augmented time series dataset.

[0043] This step, while adhering to the constraints of the feature matrix model, utilizes the Dynamic Time Warping (DTW) algorithm to reasonably transform the limited original key indicator sequences, thereby generating an enhanced dataset that is highly consistent with real slow disks in terms of statistical characteristics and correlations, while also covering more anomaly patterns and load scenarios. This step is further divided into steps S21-S23.

[0044] S21. Set variation rules for each key indicator under each business model.

[0045] Because different business models (small / large blocks, random / sequential, read / write) put different pressures on the disk, the resulting "slow" characteristics and causes will also have systematic differences in performance indicators. Therefore, the transformation of data for each key indicator should conform to the characteristics of the respective business model.

[0046] For each key indicator, at least one variation rule (also called a transformation rule or deformation rule below) is set for each business model. The variation rules include noise injection, velocity variation, and outlier insertion. Noise injection adds Gaussian noise to each data point to simulate random fluctuations. Velocity variation simulates a sustained change over a certain time interval and includes time axis stretching (deceleration) and time axis compression (acceleration). Outlier insertion simulates sudden, severe anomalies.

[0047] It should be noted that the three variation rules above can be used individually or in any combination, giving 7 modes in total (the 2^3 - 1 non-empty combinations of three rules). The appropriate rule combination should be selected to guide the augmentation process according to the business model to be simulated. For example, combining velocity variation with outlier insertion can simulate continuous access to a faulty region (time stretching) with a sudden, severe delay event superimposed on it.

[0048] Furthermore, deformation parameters are configured for each deformation rule. For example, the deformation rule for noise injection is configured with a Gaussian standard deviation, the deformation rule for velocity variation is configured with a velocity variation ratio, and the deformation rule for outlier insertion is configured with a deviation, insertion interval time, and insertion number.
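
The three rules and their parameters might look like the sketch below. All function names are hypothetical. Two simplifications are assumed: velocity variation is implemented as resampling with linear interpolation, and outlier insertion adds the deviation to existing samples at a fixed spacing rather than lengthening the sequence.

```python
import random


def inject_noise(seq, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation `sigma` to every point."""
    return [v + rng.gauss(0.0, sigma) for v in seq]


def change_speed(seq, ratio):
    """Resample `seq`; ratio > 1 stretches the time axis, ratio < 1 compresses it."""
    n_out = max(2, round(len(seq) * ratio))
    out = []
    for k in range(n_out):
        pos = k * (len(seq) - 1) / (n_out - 1)
        lo = int(pos)
        hi = min(lo + 1, len(seq) - 1)
        frac = pos - lo
        out.append(seq[lo] * (1 - frac) + seq[hi] * frac)
    return out


def insert_outliers(seq, deviation, interval, count):
    """Raise up to `count` samples, spaced `interval` samples apart, by `deviation`."""
    out = list(seq)
    for n in range(count):
        idx = (n + 1) * interval
        if idx >= len(out):
            break
        out[idx] += deviation
    return out


if __name__ == "__main__":
    rng = random.Random(42)                      # seeded for repeatability
    original = [5.0, 5.2, 5.1, 5.3, 5.2, 5.4]    # hypothetical service durations (ms)
    # One of the 7 rule combinations: velocity variation plus outlier insertion.
    target = insert_outliers(change_speed(original, 2.0), deviation=20.0, interval=4, count=2)
    print(inject_noise(target, sigma=0.1, rng=rng))
```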

[0049] S22. Based on the dynamic time warping algorithm, data augmentation is performed on the time series data of each key indicator to generate enhanced time series data.

[0050] Specifically, based on the same business model, each data augmentation operation yields augmented time-series data for a scenario group under that business model, which involves the following operations: The time series data of each key indicator is used as the original sequence. According to the transformation rules corresponding to each key indicator under this business model, each original sequence is transformed to obtain the target sequence. Based on the dynamic time warping algorithm, the optimal path between each original sequence and the corresponding target sequence is calculated. The original sequence is then warped and adjusted according to the optimal path to generate the enhanced time series data corresponding to the original sequence.

[0051] It should be noted that this step does not directly use the target sequence obtained through deformation processing as the enhanced time-series data, because its fixed pattern may not reflect the randomness and locality of real faults, and could easily lead to a break in the correlation between multidimensional indicators. Instead, the optimal nonlinear time mapping between the original sequence and the target sequence is obtained using the Dynamic Time Warping (DTW) algorithm, which serves as the optimal path. Based on the optimal path, the region to be distorted in the original sequence is obtained, and after distortion adjustment, the enhanced time-series data is obtained.
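
For reference, a textbook dynamic time warping computation with path backtracking is sketched below. It is a generic illustration, not necessarily the exact variant used in this embodiment: the absolute difference as local cost and the preference for diagonal steps on ties are assumptions.

```python
def dtw_path(original, target):
    """Return (total_cost, path), where path is a list of (original_idx, target_idx)."""
    n, m = len(original), len(target)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    # Accumulated cost with the three classic step directions.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(original[i - 1] - target[j - 1])
            cost[i][j] = local + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])

    # Backtrack from the end; tuple comparison prefers the diagonal step on ties.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


if __name__ == "__main__":
    total, path = dtw_path([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0])
    print(total, path)
```

The quadratic cost table is acceptable for short monitoring windows; longer sequences would normally call for a windowed or approximate variant.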

[0052] Specifically, the original sequence is warped and adjusted according to the optimal path to generate enhanced time-series data corresponding to the original sequence, including: ① Sort the point pairs in the optimal path according to the original sequence index and exclude the initial matching point pairs.

[0053] It should be noted that a point pair $(i, j)$ in the optimal path indicates that the $i$-th point of the original sequence is aligned with the $j$-th point of the target sequence. The point pairs are sorted in ascending order of the original sequence index $i$ to obtain an ordered path.

[0054] Furthermore, it is determined whether the original sequence index of the first point pair in the ordered path is 0. If it is 0, then the first point pair is the starting matching point pair and is removed.

[0055] For example, for an optimal path whose point pairs have the original sequence indices 0, 3, and 9 (already sorted in ascending order), the first point pair is the starting matching point pair; after it is removed, the ordered path consists of the point pairs with original sequence indices 3 and 9.

[0056] ② Calculate the absolute difference of the target sequence index between each pair of adjacent points after sorting. If the absolute difference is greater than 1, determine a subsequence interval to be adjusted based on the original sequence index in the two point pairs. Merge the subsequence intervals with consecutive original sequence indices to obtain the interval to be distorted.

[0057] For two adjacent point pairs $(i_k, j_k)$ and $(i_{k+1}, j_{k+1})$, the absolute difference of the target sequence index is $\Delta j = |j_{k+1} - j_k|$. If $\Delta j > 1$, the state of the target sequence corresponding to the original sequence has changed significantly, by more than one step, between these two matching points. Therefore, the data points of the original sequence between $i_k$ and $i_{k+1}$ constitute a subsequence interval to be adjusted, which is intended to reproduce the abnormal behavior of the target sequence from $j_k$ to $j_{k+1}$. If $\Delta j \le 1$, the change in the state of the target sequence is gradual or insignificant, and the corresponding original sequence interval is not treated as a subsequence interval to be adjusted.

[0058] Candidate subsequence intervals that are consecutive or overlapping at the original sequence index will be merged to form the final interval to be distorted. Each merged interval corresponds to a consecutive data segment in the target sequence, which is obtained by the distortion rules used when generating the target sequence.

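[0059] For example, for an optimal path in which the absolute difference between the target sequence indices of two adjacent point pairs is 3, the segment of the original sequence between the original sequence indices of those two point pairs is extracted as the interval to be distorted, and it is distorted according to the deformation rules applied to the corresponding segment of the target sequence.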
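
Steps ① and ② can be sketched as follows. The function name and the target sequence indices in the sample paths are hypothetical, and intervals are merged when they overlap or when their original sequence indices are adjacent (an interpretation of "consecutive").

```python
def intervals_to_distort(path):
    """path: list of (original_idx, target_idx) point pairs from the optimal path."""
    ordered = sorted(path)                       # ascending original sequence index
    if ordered and ordered[0][0] == 0:
        ordered = ordered[1:]                    # drop the starting matching point pair

    # Candidate intervals: adjacent pairs whose target index jumps by more than 1.
    candidates = []
    for (i0, j0), (i1, j1) in zip(ordered, ordered[1:]):
        if abs(j1 - j0) > 1:
            candidates.append((i0, i1))

    # Merge overlapping or index-adjacent candidates.
    merged = []
    for start, end in candidates:
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    # Original indices 0, 3, 9 as in the example; target indices are made up.
    print(intervals_to_distort([(0, 0), (3, 2), (9, 5)]))
```

Each returned `(start, end)` pair would then be handed to the same variation rules that produced the matching target segment, as described in step ③.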

[0060] ③ Based on the same deformation rules used when generating the target sequence, the region to be distorted is adjusted to generate enhanced time series data.

[0061] For each determined interval to be distorted, the distortion is performed according to the deformation rules used when generating the corresponding target sequence data segment. For example, if the target data segment is generated through "time stretching and outlier insertion", then the same time stretching and outlier insertion operations are applied to the interval of the original sequence to make the data shape of the interval approximate the target's outlier pattern.

[0062] Preferably, to ensure that the generated enhanced time series data not only conforms to the deformation rules but also retains the key features of the original sequence, the reasonableness of the enhanced time series data generated for each key indicator is verified. If the reasonableness verification fails, the deformation parameters of the deformation rules are adjusted, the interval to be distorted is adjusted again, and new enhanced time series data is generated, until the verification passes.

[0063] Reasonableness verification includes: statistical feature comparison, trend correlation verification, and key event alignment verification. One or more methods can be selected for reasonableness verification based on the actual situation.

[0064] Specifically, statistical feature comparison checks whether the deviations of the mean and variance of the enhanced time series data from those of the original sequence are within a preset deviation range.

[0065] Trend correlation verification checks whether the Pearson correlation coefficient between the enhanced time series data and the original sequence is greater than or equal to a preset correlation threshold.

[0066] Key event alignment verification checks whether the differences between the peak and trough positions of the time-aligned enhanced time series data and those of the original sequence are within a preset error range. In particular, if the enhanced time series data was generated with a velocity variation deformation rule, the time axis must first be scaled proportionally to align the enhanced time series data with the original sequence before the peak and trough positions are compared.
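The first two reasonableness checks can be sketched as follows. This is an illustrative sketch under stated assumptions, not the embodiment's implementation: the function names, the use of relative deviation, and the default thresholds (`max_mean_dev`, `max_var_dev`, `min_corr`) are hypothetical, and the two sequences are assumed to have equal length (that is, already time-aligned). Key event alignment verification is omitted.

```python
import math
from statistics import fmean, pvariance
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def relative_deviation(reference: float, value: float) -> float:
    """Deviation of value from reference, relative to the reference magnitude."""
    return abs(value - reference) / max(abs(reference), 1e-12)


def is_reasonable(
    original: Sequence[float],
    augmented: Sequence[float],
    max_mean_dev: float = 0.2,   # hypothetical preset deviation range (mean)
    max_var_dev: float = 0.5,    # hypothetical preset deviation range (variance)
    min_corr: float = 0.8,       # hypothetical preset correlation threshold
) -> bool:
    # Statistical feature comparison: mean and variance stay close to the original.
    stats_ok = (
        relative_deviation(fmean(original), fmean(augmented)) <= max_mean_dev
        and relative_deviation(pvariance(original), pvariance(augmented)) <= max_var_dev
    )
    # Trend correlation verification: Pearson coefficient above the threshold.
    trend_ok = pearson(original, augmented) >= min_corr
    return stats_ok and trend_ok
```

A sequence that keeps the mean and variance of the original but reverses its trend passes the statistical comparison and fails the trend check, which is why the checks are complementary.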

[0067] S23. Verify the enhanced time series data based on the feature matrix model. If the verification passes, add it to the enhanced time series dataset.

[0068] After obtaining enhanced time-series data of key indicators for each scenario group under the same business model, this step verifies that the correlation between these sequences still conforms to the real slow disk pattern.

[0069] Specifically, validating the enhanced time-series data based on the feature matrix model includes: following the method in step S1, calculating the correlation coefficient matrix of the enhanced time-series data of all key indicators under the same business model and using it as the enhanced feature matrix; and comparing the enhanced feature matrix with the feature matrix model. Under the overall comparison strategy of step S1, the mean absolute error and the Spearman rank correlation coefficient are calculated as the comparison results. If the comparison results meet the preset conditions, the verification passes, and the enhanced time-series data generated under this business model is valid and is stored in the enhanced time-series dataset. Otherwise, the verification fails, the enhanced time-series data under this business model is discarded, and new enhanced time-series data is generated after the deformation parameters of the deformation rules are adjusted.
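A minimal sketch of the overall comparison is given below. Since step S1 is described elsewhere, several details here are assumptions: Pearson coefficients are used for the correlation matrix, only the off-diagonal upper triangle of the symmetric matrices is compared, and the thresholds `max_mae` and `min_rho` are hypothetical values standing in for the preset conditions.

```python
import math
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def correlation_matrix(series: Dict[str, Sequence[float]]) -> Matrix:
    """Correlation coefficients between all key indicators (insertion order)."""
    names = list(series)
    return [[1.0 if a == b else pearson(series[a], series[b]) for b in names]
            for a in names]


def upper_triangle(m: Matrix) -> List[float]:
    return [m[i][j] for i in range(len(m)) for j in range(i + 1, len(m))]


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for k in order[pos:end + 1]:
            ranks[k] = (pos + end) / 2 + 1
        pos = end + 1
    return ranks


def compare_matrices(enhanced: Matrix, model: Matrix) -> Tuple[float, float]:
    """Return (mean absolute error, Spearman rank correlation coefficient)."""
    a, b = upper_triangle(enhanced), upper_triangle(model)
    mae = sum(abs(p - q) for p, q in zip(a, b)) / len(a)
    rho = pearson(average_ranks(a), average_ranks(b))  # Spearman = Pearson on ranks
    return mae, rho


def passes_validation(enhanced: Matrix, model: Matrix,
                      max_mae: float = 0.1, min_rho: float = 0.8) -> bool:
    mae, rho = compare_matrices(enhanced, model)
    return mae <= max_mae and rho >= min_rho
```

The mean absolute error bounds how far individual coefficients drift, while the Spearman coefficient checks that the ordering of strong and weak couplings between indicators is preserved.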

[0070] To improve the verification speed, a block-based progressive comparison strategy can be adopted based on the enhanced feature matrix and the three types of sub-matrices of the feature matrix model.

[0071] Specifically, the sub-matrices of the enhanced feature matrix and of the feature matrix model are compared in turn: first the sub-matrix between key indicators of slow disk behavior, then the sub-matrix between key indicators of slow disk behavior and key indicators of slow disk causes, and finally the sub-matrix between key indicators of slow disk causes. The enhanced feature matrix passes the verification only if the comparison results of all three sub-matrices meet the preset conditions. If the comparison result of any sub-matrix does not meet the preset conditions, the verification fails and the comparison stops.

[0072] It should be noted that the block-based progressive comparison strategy exploits the causal logic of slow disk failures, namely that abnormal behavior comes first and is the core of the verification. Comparing the sub-matrices in this order speeds up verification and quickly locates illogical augmented data.
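The early-exit structure of the block-based progressive comparison can be sketched as follows. For brevity, this sketch checks only the mean absolute error of each block and leaves out the Spearman check. The function names, the index lists `behavior_idx` and `cause_idx`, and the threshold `max_mae` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[float]]


def block_values(m: Matrix, rows: Sequence[int], cols: Sequence[int],
                 symmetric: bool) -> List[float]:
    """Entries of one sub-matrix; for a symmetric block only pairs above the diagonal."""
    if symmetric:
        return [m[r][c] for k, r in enumerate(rows) for c in rows[k + 1:]]
    return [m[r][c] for r in rows for c in cols]


def progressive_compare(enhanced: Matrix, model: Matrix,
                        behavior_idx: Sequence[int], cause_idx: Sequence[int],
                        max_mae: float = 0.1) -> Tuple[bool, Optional[str]]:
    """Compare the three sub-matrices in order; stop at the first failing block."""
    blocks = [
        ("behavior-behavior", behavior_idx, behavior_idx, True),
        ("behavior-cause", behavior_idx, cause_idx, False),
        ("cause-cause", cause_idx, cause_idx, True),
    ]
    for name, rows, cols, symmetric in blocks:
        a = block_values(enhanced, rows, cols, symmetric)
        b = block_values(model, rows, cols, symmetric)
        if not a:
            continue  # a block with a single indicator has no pairs to compare
        mae = sum(abs(p - q) for p, q in zip(a, b)) / len(a)
        if mae > max_mae:
            return False, name  # report which block failed and skip the rest
    return True, None
```

Returning the name of the failing block reflects the point made above: the block in which the comparison fails indicates where the augmented data is illogical.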

[0073] During implementation, either the overall comparison strategy or the block-based progressive comparison strategy is selected according to the actual requirements for verification speed and level of detail.

[0074] Each time steps S22 and S23 are performed, validated enhanced time-series data for each key indicator under the same business model are generated and used as enhanced time-series data for the next scenario group under that business model. This process is repeated multiple times for each business model to obtain enhanced time-series data for multiple scenario groups under each business model.

[0075] It should be noted that, to facilitate rapid matching during the real-time simulation phase, when the enhanced time-series data of each scenario group is stored in the enhanced time-series dataset, the statistical features of the time-series data of each key indicator are extracted, normalized, and then concatenated into a fixed-dimensional vector, which serves as the feature signature of that scenario group.

[0076] It should be noted that the statistical characteristics include: mean, variance, slope and quantiles. The quantiles include: P50 (median), P95 and P99 quantiles.
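A minimal sketch of building such a feature signature is shown below. The text does not specify the normalization method, the slope estimator, or the quantile interpolation, so the choices here are assumptions made for illustration: per-indicator L2 normalization, a least-squares slope against the sample index, and linearly interpolated percentiles. All names are hypothetical.

```python
import math
from statistics import fmean, pvariance
from typing import Dict, List, Sequence


def percentile(sorted_values: Sequence[float], q: float) -> float:
    """Percentile with linear interpolation between the two closest ranks."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def slope(values: Sequence[float]) -> float:
    """Least-squares slope of the values against their sample index."""
    n = len(values)
    mean_i, mean_v = (n - 1) / 2, fmean(values)
    den = sum((i - mean_i) ** 2 for i in range(n))
    if den == 0:
        return 0.0
    return sum((i - mean_i) * (v - mean_v) for i, v in enumerate(values)) / den


def indicator_stats(values: Sequence[float]) -> List[float]:
    """[mean, variance, slope, P50, P95, P99] of one key indicator."""
    ordered = sorted(values)
    return [fmean(values), pvariance(values), slope(values),
            percentile(ordered, 0.50), percentile(ordered, 0.95),
            percentile(ordered, 0.99)]


def feature_signature(series: Dict[str, Sequence[float]],
                      indicators: Sequence[str]) -> List[float]:
    """Concatenate the normalized statistics of each indicator in a fixed order."""
    signature: List[float] = []
    for name in indicators:
        stats = indicator_stats(series[name])
        norm = math.sqrt(sum(s * s for s in stats)) or 1.0  # assumed L2 normalization
        signature.extend(s / norm for s in stats)
    return signature
```

Fixing the indicator order through the `indicators` argument keeps the vector dimension and layout identical across scenario groups, which is what allows signatures to be compared directly.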

[0077] This step, by introducing the reverse application of the dynamic time warping algorithm and a verification mechanism constrained by the feature matrix model, not only generates massive amounts of simulation data, but also fundamentally ensures the consistency of these data with the real slow disk behavior in terms of the deep feature of multi-index coupling relationship, thereby solving the core defect of insufficient realism in traditional simulation methods.

[0078] S3. Match the real-time time-series data of each key indicator of the target disk with the enhanced time-series dataset and feature matrix model. When the match is successful, inject delay into the I/O requests of the target disk according to the matched enhanced time-series data to simulate slow disk behavior.

[0079] This step involves sensing the operating status of the target disk in real time and quickly and accurately matching it with high-fidelity enhanced time-series data. After confirming that it conforms to the characteristics of a slow disk, it actively and stably injects delays that conform to the laws of real faults, thereby achieving a highly realistic simulation of slow disk behavior.

[0080] S31. Configure the simulation task and load the enhanced time series dataset and feature matrix model.

[0081] Before starting the simulation, configure the key parameters of the simulation task, including: specifying the business mode to be simulated and configuring the simulation duration.

[0082] Load the standard feature matrix model obtained in step S1. Based on the specified business mode to be simulated, load the enhanced time-series data and feature signatures of all scenario groups under that business mode, as well as the baseline service duration of the normal disk.

[0083] S32. Collect real-time time series data of each key indicator in the current time window according to the preset time window, and perform double matching.

[0084] After the simulation begins, real-time time-series data of the key indicators of the target disk is collected in a sliding manner according to the preset time window. The statistical characteristics of each key indicator are calculated to form a real-time feature vector, in the same way as the feature signature of a scenario group is obtained.

[0085] The dual matching process includes statistical feature matching and association matrix verification, and the specific steps are as follows: ① Calculate the statistical characteristics of each key indicator for the current real-time time series data and the enhanced time series data of each scenario group, and compare the similarity. Select the enhanced time series data of the scenario group with the highest similarity as the enhanced candidate data.

[0086] Specifically, the similarity between the current real-time feature vector and the feature signature of each currently loaded scenario group is calculated. The similarity is the cosine similarity or the reciprocal of the Euclidean distance.

[0087] If the highest similarity is less than the similarity threshold, an alarm will be issued, indicating that the actual load on the current disk does not match the specified business mode; the enhanced time-series data of all scenario groups under the new business mode will be reloaded before simulation is performed.

[0088] ② If the highest similarity is greater than or equal to the similarity threshold, association matrix verification is performed, including: calculating the correlation matrix between the key indicators from the real-time time series data collected in the current time window and using it as the real-time correlation matrix; and comparing the real-time correlation matrix with the feature matrix model, again using the mean absolute error and the Spearman rank correlation coefficient as the comparison results. If the comparison results meet the preset conditions, the verification passes, the current disk status is determined to be consistent with the slow disk model, the enhanced candidate data is taken as the matched enhanced time series data, and the simulation session is started. Otherwise, double matching is performed again on the real-time time series data collected in the next time window.
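The statistical feature matching of step ① can be sketched as a nearest-signature search. The function names, the dictionary layout of the loaded signatures, and the small `eps` guard in the reciprocal distance are assumptions made for illustration. A return value of `None` corresponds to the alarm case in which the highest similarity is below the threshold.

```python
import math
from typing import Callable, Dict, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def inverse_euclidean(a: Vector, b: Vector, eps: float = 1e-9) -> float:
    """Reciprocal of the Euclidean distance; eps avoids division by zero."""
    return 1.0 / (math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b))) + eps)


def best_match(
    realtime: Vector,
    signatures: Dict[str, Vector],       # scenario group id -> feature signature
    threshold: float,
    similarity: Callable[[Vector, Vector], float] = cosine_similarity,
) -> Tuple[Optional[str], float]:
    """Return (group id, similarity) of the closest scenario group, or (None, best)."""
    scores = {gid: similarity(realtime, sig) for gid, sig in signatures.items()}
    best_id = max(scores, key=scores.get)
    best_sim = scores[best_id]
    if best_sim < threshold:
        return None, best_sim  # load does not match the specified business mode
    return best_id, best_sim
```

The threshold must be chosen on the scale of the selected measure: cosine similarity is bounded by 1, whereas the reciprocal distance is unbounded.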

[0089] It should be noted that in actual deployment, limitations of the monitoring system may make it impossible to collect real-time data for all key indicators. This embodiment therefore supports matching and verification based on a subset of key indicators, as follows. When the simulation task is configured in step S31, a subset of key indicators is predefined; it must at least cover the behavioral and causal indicators with the highest weights determined in step S1, so that it can still characterize the core features of the slow disk. When statistical feature matching is performed, only the statistical features of the indicators in the defined subset are calculated. At the same time, the dimensions corresponding to the indicator subset are extracted from the full feature signature of each scenario group and reassembled into a sub-feature signature. The similarity between the real-time feature vector and the sub-feature signature is then calculated.

[0090] During model validation, sub-association matrices are calculated based on the time-series data of the collected subset of indicators. Subsequently, sub-matrix models that completely correspond to the row and column headers of the indicator subsets are extracted from the standard full feature matrix model. Model validation is completed by comparing the real-time calculated sub-association matrices with the sub-matrix models, with the comparison method and threshold standard remaining unchanged.
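Extracting the sub-feature signature and the sub-matrix model for an indicator subset amounts to index selection. The sketch below assumes a signature layout of six consecutive statistics per indicator in a fixed indicator order. That layout, the constant `STATS_PER_INDICATOR`, and the indicator names in the usage example are hypothetical.

```python
from typing import List, Sequence

STATS_PER_INDICATOR = 6  # assumed layout: mean, variance, slope, P50, P95, P99


def sub_signature(signature: Sequence[float], all_indicators: Sequence[str],
                  subset: Sequence[str]) -> List[float]:
    """Pick the dimensions of the full signature that belong to the subset."""
    out: List[float] = []
    for name in subset:
        k = all_indicators.index(name)
        out.extend(signature[k * STATS_PER_INDICATOR:(k + 1) * STATS_PER_INDICATOR])
    return out


def sub_matrix(matrix: Sequence[Sequence[float]], all_indicators: Sequence[str],
               subset: Sequence[str]) -> List[List[float]]:
    """Rows and columns of the full feature matrix model that match the subset."""
    idx = [all_indicators.index(name) for name in subset]
    return [[matrix[r][c] for c in idx] for r in idx]


# Usage with hypothetical indicator names.
indicators = ["await", "util", "queue_depth"]
full_signature = list(range(18))
full_model = [[1.0, 0.8, 0.6], [0.8, 1.0, 0.4], [0.6, 0.4, 1.0]]
subset = ["await", "queue_depth"]
sub_sig = sub_signature(full_signature, indicators, subset)
sub_model = sub_matrix(full_model, indicators, subset)
```

Because the same subset order is used for both the signature and the matrix, the comparison method and thresholds can remain unchanged, as stated above.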

[0091] S33. After a successful match, intercept the I/O request and inject a delay.

[0092] It should be noted that a successful "double match" of real-time time-series data only means that the system's overall performance at the current moment conforms to the statistical pattern of a slow disk. This may be a temporary fluctuation caused by instantaneous high load, resource contention, or similar factors, while the disk hardware itself is not damaged. To conduct effective and stable testing, this patterned behavior needs to be reproduced continuously and predictably on a normal disk. Therefore, this embodiment adopts an active intervention approach: based on the matched high-fidelity enhanced time-series data, a calculated delay is injected into each I/O request, forcing the disk response time to follow the slow disk pattern.

[0093] Specifically, the start timestamp is obtained from the start time of the simulation session.

[0094] The time-series data of the "average service duration" indicator in the matched enhanced time-series data is used as the template sequence. The timestamps of the template sequence are evenly spaced and span the total duration of the template sequence.

[0095] Each I/O request to be processed is intercepted and the current timestamp is obtained. The simulated elapsed time since the start of the simulation session is then calculated as the current timestamp minus the start timestamp.

[0096] Furthermore, the simulated elapsed time is mapped onto the timeline of the template sequence to obtain the mapped timestamp, which is the simulated elapsed time modulo the total duration of the template sequence.

[0097] In the template sequence, the two timestamps closest to the mapped timestamp are found, namely the two adjacent timestamps between which the mapped timestamp falls.

[0098] If the mapped timestamp is exactly equal to a timestamp in the template sequence, the average service duration at that timestamp is taken as the target service duration. Otherwise, the target service duration at the mapped timestamp is calculated by linear interpolation between the average service durations at the two adjacent timestamps.

[0099] Finally, the delay time injected for the currently intercepted I/O request is obtained from the target service duration and the baseline service duration of the normal disk.
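The delay calculation of paragraphs [0093] to [0099] can be sketched as follows. Several details are assumptions, because the exact formulas are not reproduced here: the template timestamps are taken to start at 0 with a fixed `step`, the total duration is taken as `step * (len(template) - 1)`, and the injected delay is taken as the target service duration minus the baseline service duration, floored at zero. All function and parameter names are hypothetical.

```python
from typing import Sequence


def target_service_duration(template: Sequence[float], step: float,
                            elapsed: float) -> float:
    """Look up the template at the mapped timestamp, interpolating if needed.

    template[k] is the average service duration at timestamp k * step.
    """
    total = step * (len(template) - 1)   # assumed total duration of the template
    mapped = elapsed % total             # map elapsed time onto the template timeline
    pos = mapped / step
    lo = int(pos)
    if lo >= len(template) - 1:          # guard against floating-point edge cases
        return template[-1]
    frac = pos - lo
    if frac == 0:
        return template[lo]              # mapped timestamp hits a template timestamp
    # Linear interpolation between the two adjacent template points.
    return template[lo] + (template[lo + 1] - template[lo]) * frac


def injected_delay(template: Sequence[float], step: float, start_ts: float,
                   now_ts: float, baseline: float) -> float:
    """Delay to add to one intercepted I/O request (same time unit as the template)."""
    elapsed = now_ts - start_ts
    target = target_service_duration(template, step, elapsed)
    # Assumed formula: never inject a negative delay.
    return max(0.0, target - baseline)
```

The modulo mapping makes the template repeat, so a simulation whose configured duration exceeds the template's total duration replays the same slow disk pattern cyclically.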

[0100] Within the simulation duration, real-time time-series data of each key indicator is collected periodically (e.g., every 10 seconds) according to a preset time window. A real-time correlation matrix is constructed and compared with the feature matrix model. If the real-time correlation matrix and the feature matrix model no longer match for several consecutive periods, the simulation is terminated early and an alarm is issued. If the verification passes, the simulation continues until the simulation duration is reached.
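The early-termination rule can be sketched with a counter of consecutive mismatches. The class and function names are hypothetical, and the limit of three consecutive periods is an assumed value for "several consecutive periods". Each boolean in `period_results` stands for the outcome of one periodic comparison between the real-time correlation matrix and the feature matrix model.

```python
from typing import Iterable


class MatchMonitor:
    """Tracks consecutive periods in which the real-time matrix fails to match."""

    def __init__(self, max_consecutive_mismatches: int = 3) -> None:
        self.max_consecutive_mismatches = max_consecutive_mismatches
        self.mismatches = 0

    def update(self, matched: bool) -> bool:
        """Record one period's result; return True if the simulation should stop."""
        self.mismatches = 0 if matched else self.mismatches + 1
        return self.mismatches >= self.max_consecutive_mismatches


def run_simulation(period_results: Iterable[bool],
                   max_consecutive_mismatches: int = 3) -> str:
    """Consume per-period verification results until done or terminated early."""
    monitor = MatchMonitor(max_consecutive_mismatches)
    for matched in period_results:
        if monitor.update(matched):
            return "terminated_early"  # an alarm would be raised at this point
    return "completed"
```

Resetting the counter on every successful period means that isolated mismatches, such as a brief load spike, do not end the simulation.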

[0101] During implementation, slow disks can be actively and controllably simulated to facilitate the injection of slow disk failures into distributed storage systems. This allows for testing whether the system can correctly identify and isolate slow disks, trigger data reconstruction or replica migration, and assess the impact on overall service performance. Alternatively, slow disk failures can be controlled and introduced in pre-production or testing environments to simulate real-world failure scenarios, verify the robustness of the entire application system, and identify defects in extreme situations in advance.

[0102] Compared with existing technologies, this embodiment provides a data-augmented slow disk simulation method that applies data augmentation technology to disk failure simulation. It generates high-fidelity augmented data through a dynamic time warping algorithm and performs rigorous verification using a multi-indicator feature matrix model. This ensures that the simulated slow disk behavior is highly consistent with real failures in terms of statistical characteristics and intrinsic correlations. By constructing a unified feature matrix model across business models, the essential characteristics of slow disks are characterized from the perspective of multi-indicator coupling relationships. Based on this model, massive amounts of augmented scenario data covering different failure modes and severity levels are generated, providing a comprehensive and rigorous testing environment for storage system detection algorithms and fault tolerance mechanisms, effectively reducing the risk of false positives and false negatives in slow disk detection. During simulation, a dual matching method of statistical feature matching and correlation matrix verification is used to accurately obtain the matched enhanced time-series data as a template sequence. For specific business scenarios, dynamic delays that conform to physical laws are actively and stably injected, enabling the on-demand, repeatable, and highly realistic simulation of slow disk behavior under the condition of no hardware damage. Moreover, the entire simulation process is controllable and observable, and has clear data and model basis, which greatly improves the efficiency and accuracy of storage system reliability testing.

[0103] Embodiment 2: Another embodiment of the present invention discloses a slow disk simulation system based on data augmentation, which implements the slow disk simulation method based on data augmentation of Embodiment 1. For the specific implementation of each module, refer to the corresponding description in Embodiment 1. The system includes: a model building module, used to calculate, based on multiple key indicators of slow disks, the correlation coefficients between the key indicators from the correspondingly collected time-series data, and to construct a feature matrix model; a data augmentation module, used to augment the time-series data of each key indicator based on the dynamic time warping algorithm to generate augmented time-series data, to verify the augmented time-series data based on the feature matrix model, and to store it in the augmented time-series dataset if the verification passes; and a slow disk simulation module, used to match the real-time time-series data of each key indicator of the target disk with the augmented time-series dataset and, when a match is successful, to inject delay into the I/O requests of the target disk based on the matched augmented time-series data to simulate slow disk behavior.

[0104] Since the data-augmented slow disk simulation system of this embodiment and the aforementioned data-augmented slow disk simulation method are based on the same principle, the relevant details can be cross-referenced and are not repeated here. For the same reason, this system embodiment also has the technical effects of the aforementioned method embodiment.

[0105] Those skilled in the art will understand that all or part of the processes of the methods described in the above embodiments can be implemented by a computer program instructing related hardware, and the program can be stored in a computer-readable storage medium. The computer-readable storage medium may be a disk, optical disk, read-only memory, or random access memory, etc.

[0106] The above description is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any changes or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in the present invention should be included within the scope of protection of the present invention.

Claims

1. A slow disk simulation method based on data augmentation, characterized in that it comprises the following steps: based on multiple key indicators of slow disks, calculating the correlation coefficients between the key indicators from the correspondingly collected time-series data, and constructing a feature matrix model; performing data augmentation on the time-series data of each key indicator based on the dynamic time warping algorithm to generate enhanced time-series data; validating the enhanced time-series data based on the feature matrix model and, if the validation passes, storing it in the enhanced time-series dataset; and matching the real-time time-series data of each key indicator of the target disk with the enhanced time-series dataset and, when a match is successful, injecting delay into the I/O requests of the target disk according to the matched enhanced time-series data to simulate slow disk behavior.

2. The data augmentation based slow disk simulation method of claim 1, wherein augmenting the time-series data of each key indicator to generate enhanced time-series data includes: setting deformation rules for each key indicator under each business model; and, for the same business model, obtaining through each data augmentation operation the enhanced time-series data of one scenario group under that business model, wherein each data augmentation operation includes: taking the time-series data of each key indicator as the original sequence; transforming the original sequence according to the corresponding deformation rules to obtain the target sequence; calculating, based on the dynamic time warping algorithm, the optimal path between each original sequence and the corresponding target sequence; and distorting and adjusting the original sequence according to the optimal path to generate the enhanced time-series data corresponding to the original sequence.

3. The data augmentation based slow disk simulation method of claim 2, wherein at least one deformation rule is set for each key indicator under each business model; the deformation rules include noise injection, velocity variation, and outlier insertion; and the velocity variation includes time axis stretching and time axis compression.

4. The data augmentation based slow disk simulation method of claim 2, wherein distorting and adjusting the original sequence according to the optimal path to generate the enhanced time-series data corresponding to the original sequence includes: sorting the point pairs in the optimal path by original sequence index and excluding the initial matching point pairs; calculating the absolute difference of the target sequence indexes of every two adjacent point pairs after sorting; if the absolute difference is greater than 1, determining a subsequence interval to be adjusted based on the original sequence indexes of the two point pairs; merging the subsequence intervals whose original sequence indexes are consecutive to obtain the interval to be distorted; and adjusting the interval to be distorted based on the same deformation rules used to generate the target sequence, to generate the enhanced time-series data.

5. The data augmentation based slow disk simulation method of claim 2, wherein validating the enhanced time-series data based on the feature matrix model includes: calculating the correlation coefficient matrix of the enhanced time-series data of each key indicator for each scenario group under the same business model, and using it as the enhanced feature matrix; and calculating the mean absolute error and the Spearman rank correlation coefficient between each sub-matrix of the enhanced feature matrix and the corresponding sub-matrix of the feature matrix model; if, for all sub-matrices, the mean absolute error is less than or equal to the first threshold and the Spearman rank correlation coefficient is greater than or equal to the second threshold, the verification passes; otherwise, the verification fails.

6. The data augmentation based slow disk simulation method of claim 1, wherein the key indicators include key indicators of slow disk behavior and key indicators of slow disk causes; and the feature matrix model is a symmetric matrix constructed by calculating the correlation coefficients between the key indicators of slow disk behavior, between the key indicators of slow disk behavior and the key indicators of slow disk causes, and between the key indicators of slow disk causes.

7. The data augmentation based slow disk simulation method of claim 1, wherein matching the real-time time-series data of each key indicator of the target disk with the enhanced time-series dataset includes: based on the feature matrix model and the enhanced time-series data of all scenario groups under the specified business model, collecting the real-time time-series data of each key indicator in the current time window according to the preset time window; calculating the statistical characteristics of each key indicator for the current real-time time-series data and for the enhanced time-series data of each scenario group, comparing their similarity, and taking the enhanced time-series data of the scenario group with the highest similarity as the enhanced candidate data; and calculating the correlation coefficients between the key indicators from the current real-time time-series data, constructing a real-time correlation matrix, and comparing it with the feature matrix model; if the comparison result meets the preset conditions, the matching is successful and the enhanced candidate data is the matched enhanced time-series data; otherwise, the real-time time-series data of each key indicator in the next time window is taken and matched again with the enhanced time-series data of each scenario group, until the matching is successful.

8. The data augmentation based slow disk simulation method of claim 1, wherein injecting delay into the I/O requests of the target disk according to the matched enhanced time-series data includes: when a match is successful, starting a simulation session and obtaining the start timestamp; obtaining the current timestamp from the time at which an I/O request to the target disk is intercepted; calculating the simulated elapsed time from the current timestamp and the start timestamp, mapping the simulated elapsed time onto the time axis of the average service duration sequence in the matched enhanced time-series data, and obtaining the target service duration through interpolation; and obtaining the delay time from the target service duration and the baseline service duration of a normal disk, and applying the delay time to the I/O request to simulate slow disk behavior.

9. The data augmentation based slow disk simulation method of claim 8, wherein the simulation session is configured with a simulation duration; within the simulation duration, real-time time-series data of each key indicator is periodically collected according to a preset time window, and a real-time correlation matrix is constructed and compared with the feature matrix model; and if the real-time correlation matrix and the feature matrix model no longer match for several consecutive periods, the simulation is terminated in advance.

10. A data augmentation based slow disk simulation system, comprising: a model building module, used to calculate, based on multiple key indicators of slow disks, the correlation coefficients between the key indicators from the correspondingly collected time-series data, and to construct a feature matrix model; a data augmentation module, used to augment the time-series data of each key indicator based on the dynamic time warping algorithm to generate enhanced time-series data, to validate the enhanced time-series data based on the feature matrix model, and to store it in the enhanced time-series dataset if the validation passes; and a slow disk simulation module, used to match the real-time time-series data of each key indicator of the target disk with the enhanced time-series dataset and, when the match is successful, to inject delay into the I/O requests of the target disk according to the matched enhanced time-series data to simulate slow disk behavior.