Method for training a machine learning model against storage device failures

By using window-based weighted data sampling and anomaly detection techniques, the training dataset of the storage device fault prediction model is optimized, solving the problems of low efficiency and low accuracy caused by frequent retraining, and achieving efficient and accurate fault prediction.

CN113377284B · Active · Publication Date: 2026-06-23 · SAMSUNG ELECTRONICS CO LTD


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: SAMSUNG ELECTRONICS CO LTD
Filing Date: 2021-02-02
Publication Date: 2026-06-23

Application Information

Patent Timeline

02 Feb 2021: Application
23 Jun 2026: Publication (CN113377284B)

IPC: G06N20/00; G06F18/214; G06F18/2433; G06F18/23213; G06F18/2135; G06F3/06; G06F11/07; G06N5/01; G06N3/0455

AI Tagging

Application Domain

Input/output to record carriers; Fault response

Technology Topics

Data set Engineering


AI Technical Summary

Technical Problem

In existing technologies, storage device fault prediction models require frequent retraining due to dynamic parameter changes, resulting in problems such as large data volume, low training efficiency, and low accuracy.

Method used

A window-based weighted data sampling scheme is adopted, which selects recent data for training by assigning time period windows with different weights, and combines anomaly detection and oversampling techniques to optimize the training dataset and improve model accuracy.

Benefits of technology

It enables efficient management of training data volume, improves the accuracy of storage device failure prediction models while allowing them to be retrained more frequently, and reduces the processing cost and time of retraining.

✦ Generated by Eureka AI based on patent content.


Abstract

A method for training a machine learning model against storage device failures is disclosed. The method includes segmenting, by a processor, a data set from a database into one or more sub-data sets based on a time period window, assigning, by the processor, one or more weighting values to the one or more sub-data sets according to the time period window of the one or more sub-data sets, respectively, generating, by the processor, a training data set from the one or more sub-data sets according to the one or more weighting values, and training, by the processor, a machine learning model using the training data set.


Description

[0001] This application claims priority to and the benefit of U.S. Provisional Patent Application No. 62/981,348, filed February 25, 2020, entitled “DATA MANAGEMENT, REDUCTION AND SAMPLING SCHEMES FOR ONLINE RE-TRAINING OF SSD FAILURE PREDICTION MODEL,” the entire contents of which are incorporated herein by reference.

Technical Field

[0002] One or more aspects of embodiments of this disclosure relate to data management and, more specifically, to systems and methods for data management, reduction, and sampling schemes related to storage device failures.

Background

[0003] To store and retrieve data, data center systems can use a relatively large number of storage devices, such as solid-state drives (SSDs). Over time, the components of a data center system may need to be monitored for performance and functionality, and SSDs may periodically be replaced when they fail or are expected to fail. Such replacement allows the data center system to continue operating with minimal data loss or service interruption.

[0004] The information disclosed in this Background section is intended only to enhance understanding of the background of this disclosure and therefore may contain information that does not form prior art.

Summary of the Invention

[0005] Embodiments of this disclosure relate to systems and methods for data management, reduction, and sampling schemes related to storage device failures.

[0006] According to some example embodiments of this disclosure, a method for training a machine learning model includes: segmenting, by a processor, a dataset from a database into one or more sub-datasets based on a time period window; assigning, by the processor, one or more weighting values to the one or more sub-datasets according to the time period windows of the one or more sub-datasets; generating, by the processor, a training dataset from the one or more sub-datasets according to the one or more weighting values; and training, by the processor, a machine learning model using the training dataset.

[0007] According to some example embodiments, the machine learning model includes a solid-state drive (SSD) failure prediction model.

[0008] According to some example embodiments, the most recent sub-dataset of the one or more sub-datasets is assigned a first weighting value, and the least recent sub-dataset is assigned a second weighting value, the first weighting value being greater than the second weighting value.

[0009] According to some example embodiments, the one or more weighting values decrease by a set amount from the first weighting value to the second weighting value.

[0010] According to some example embodiments, the method further includes: identifying, by the processor, anomalous data in the dataset; retrieving, by the processor, the anomalous data from the dataset; and adding, by the processor, the anomalous data to the training dataset.

[0011] According to some example embodiments, the anomalous data includes SSD failure data.

[0012] According to some example embodiments, a rule-based method is used to identify the anomalous data.

[0013] According to some example embodiments, a clustering-based method is used to identify the anomalous data.

[0014] According to some example embodiments, the method further includes: generating, by the processor, anomalous data; and adding, by the processor, the generated anomalous data to the training dataset.

[0015] According to some example embodiments of this disclosure, a data system includes: a database; a processor coupled to the database; and a memory coupled to the processor, wherein the memory stores instructions that, when executed by the processor, cause the processor to: segment a dataset from the database into one or more sub-datasets based on a time period window; assign one or more weighting values to the one or more sub-datasets according to the time period windows of the one or more sub-datasets; generate a training dataset from the one or more sub-datasets according to the one or more weighting values; and train a machine learning model using the training dataset.

[0016] According to some example embodiments, the machine learning model includes a solid-state drive (SSD) failure prediction model.

[0017] According to some example embodiments, the most recent sub-dataset of the one or more sub-datasets is assigned a first weighting value, and the least recent sub-dataset is assigned a second weighting value, the first weighting value being greater than the second weighting value.

[0018] According to some example embodiments, the one or more weighting values decrease by a set amount from the first weighting value to the second weighting value.

[0019] According to some example embodiments, the processor is further configured to: identify anomalous data in the dataset; retrieve the anomalous data from the dataset; and add the anomalous data to the training dataset.

[0020] According to some example embodiments, the anomalous data includes SSD failure data.

[0021] According to some example embodiments, a rule-based method is used to identify the anomalous data.

[0022] According to some example embodiments, a clustering-based method is used to identify the anomalous data.

[0023] According to some example embodiments, the processor is further configured to: generate anomalous data; and add the generated anomalous data to the training dataset.

[0024] According to some example embodiments of this disclosure, a method for training a machine learning model includes: identifying, by a processor, anomalous data in a dataset from a database; generating, by the processor, anomalous data; adding, by the processor, the generated anomalous data to the dataset; identifying, by the processor, a training dataset from the dataset; obtaining, by the processor, the training dataset from the dataset; and training, by the processor, a machine learning model using the training dataset.

[0025] According to some example embodiments, the machine learning model includes a solid-state drive (SSD) failure prediction model.

Brief Description of the Drawings

[0026] Figure 1 shows an overview of a data center system according to an example embodiment of the disclosure.

[0027] Figure 2 shows a data center system, according to an example embodiment of the disclosure, that can be used to augment a dataset used to train a storage device failure prediction model.

[0028] Figure 3 is a diagram illustrating example operations for collecting a dataset for training a storage device failure prediction model using a window-based data sampling method, according to an example embodiment of the disclosure.

[0029] Figure 4 shows an overview of example techniques for classifying anomalous data, according to example embodiments.

[0030] Figure 5 is a flowchart illustrating example operations for training a storage device failure prediction model using a modified dataset, according to an example embodiment of the disclosure.

Detailed Description

[0031] Accurately predicting failures of storage devices (e.g., solid-state drives (SSDs)) is valuable to data center administrators because it enables more efficient operation through proactive planning and replacement before a failure actually occurs. Machine learning models (e.g., storage device failure prediction models) can be used to predict whether an SSD is likely to fail in the future. However, one challenge in developing such models is incorporating the information continually gathered from SSDs into the prediction model to achieve higher prediction accuracy. Relying on static models (including models trained only once) and deploying them in the data center leads to inefficient device failure prediction.

[0032] For example, as an SSD (and its sub-components) ages, its performance characteristics change: individual blocks within it become more prone to failure as read and write operations accumulate. If a failure prediction model is trained only once, on the dataset available at training time, later changes in the characteristics of the system's components are not reflected in the model, which reduces its effectiveness. Similarly, the firmware of individual drives may be updated over time, causing devices to behave differently. Retraining the failure prediction model is therefore useful for capturing such operational changes and their impact on SSD lifespan.

[0033] Furthermore, when additional devices are added to the system over time, whether or not they have the same feature sets as the existing devices, it would be useful to include their log information in the training dataset.

[0034] Furthermore, newer SSDs can offer greater robustness in terms of lifespan and functionality, so failure prediction models can benefit from updates to the training dataset as these technologically advanced components are incorporated into data center systems.

[0035] Furthermore, the workload applied to SSDs in a data center system may change over time, affecting, for example, the rate of degradation of individual SSDs within the system. Therefore, retraining the failure prediction model would be beneficial.

[0036] Given the shortcomings, described above, of models trained only once, aspects of some example embodiments disclosed herein capture workload and device dynamics through retraining and can thus provide better storage device failure prediction. However, online retraining raises at least two challenges: 1) how frequently retraining should occur; and 2) the relatively large amount of data to be used for training (retraining). Because the amount of data collected from SSDs in a data center grows over time, retraining the model on all or most of the historical data every time would be inefficient, both because of data storage requirements and because of the time retraining would take.

[0037] Embodiments of this disclosure can allow the system to efficiently manage the amount of data used to train a storage device failure prediction model, while providing relatively high accuracy in predicting SSD failures.

[0038] According to various embodiments, this disclosure relates to systems and methods for optimizing the datasets used to retrain storage device failure prediction models. A storage device failure prediction model is a machine learning model that can be used to predict whether an SSD is likely to fail in the future. However, training such a model may require a relatively large amount of data and may involve frequent retraining, for example, because of dynamically changing parameters. In some embodiments, various methods can be used to augment the dataset used to train the storage device failure prediction model. In other embodiments, the amount of training data can be reduced by using a window-based weighted data sampling scheme that selects a smaller amount of data based on its recency. In some embodiments, the data used to train the storage device failure prediction model may include instances of storage device failures, anomalies, and outliers (e.g., relevant data points from the dataset). In some embodiments, anomalies and outliers may be identified using anomaly detection algorithms, rule-based methods, clustering-based methods, and/or combinations thereof. In some embodiments, the dataset may be imbalanced because relevant data points may make up only a small percentage of it; in some cases, approximately 0.01% to approximately 0.02% of the dataset. An imbalanced dataset can be improved by generating synthetic instances of the relevant data points using rule-based or instance-based methods. In some embodiments, relevant data points can be identified and indexed for efficient retrieval. In some embodiments, data processing and preparation can be performed on a nearby storage device, such as a near-attached SSD and/or an SSD with an embedded processor.

[0039] Figure 1 shows an overview of a data center system according to an example embodiment of the disclosure.

[0040] Referring to Figure 1, the data center system 100 may include a database 110 and a machine learning model 120. Data from the database 110 can be used to train the machine learning model 120, so the accuracy and reliability of the machine learning model 120 may depend on the quality of the data in the database 110. In one example, the data in the database 110 may be data collected from storage devices. Higher-quality datasets can produce more reliable and accurate machine learning models 120. Furthermore, training a machine learning model can consume relatively large amounts of data and processing power.

[0041] In some embodiments, the training of the machine learning model and the processing of the training data may be performed by a processor, such as a central processing unit (CPU). In other embodiments, they may be offloaded to a nearby storage device, such as a nearby SSD or a nearby SSD with an embedded processor. Here, a nearby storage device may be, for example, a storage device within a predetermined physical distance of a given storage device, or a storage device that shares virtual resources or categories with it.

[0042] In some embodiments, the machine learning model 120 (e.g., a storage device failure prediction model) can be used to predict whether a storage device will fail within a predetermined time window. According to this embodiment, the relevant data used to train the storage device failure prediction model may include anomalous data points (e.g., anomalies or anomalous data). Anomalies may include SSD failure data (e.g., instances of previous SSD failures) and other data with a relatively low probability of occurrence.

[0043] However, storage device failure prediction models may require frequent retraining because of dynamically changing parameters. In some embodiments, storage device failure prediction models may need to be retrained periodically (e.g., weekly or monthly). Non-limiting examples of non-static parameters include changes in SSD aging and wear-level characteristics. Firmware updates can also affect SSD behavior and lifespan. Additionally, technological modifications may be introduced into devices over time, altering parameters that are not reflected in the initially trained model. Moreover, a device's workload may change over time; for example, a model may be trained for a specific type of workload, and within a month the workload may become more write-intensive.

[0044] In some cases, retraining a storage device failure prediction model can be challenging. For example, retraining might involve collecting a relatively large amount of data from devices over a relatively long period. In some cases, data could be collected from millions of devices over a period of several years. This can result in data volumes of several gigabytes or terabytes. Furthermore, the dataset may be imbalanced because it may contain only a small percentage of outliers. Outliers can include instances of SSD failures or can be user-defined. In some cases, outliers may account for approximately 0.01% to approximately 0.02% of the entire dataset, which can adversely affect the accuracy of the storage device failure prediction model.

[0045] Figure 2 shows a data center system, according to an example embodiment of the disclosure, that can be used to augment a dataset used to train a storage device failure prediction model.

[0046] Referring to Figure 2, client 210 (e.g., a telemetry client or telemetry agent) can interact with data in database 230 (e.g., a telemetry database) via application programming interface (API) 220. API 220 can allow the client to extract datasets with specific attributes from database 230. For example, client 210 can use API 220 to request data points for which the temperature of the storage device is greater than approximately 100 degrees Fahrenheit, and API 220 can return the data points having that attribute. In some embodiments, API 220 can also allow the client to store data in the database or to offload computation on the data to a nearby device, such as a nearby SSD or a nearby SSD with an embedded processor (e.g., a storage device within a predetermined physical distance of a given storage device, or one that shares virtual resources or categories with it).

[0047] According to various embodiments, the training data for storage device failure prediction models can be augmented using various methods. In some embodiments, a large amount of data can be reduced by selecting a smaller amount of it based on recency (e.g., a window-based approach). In this method, larger weights are placed on more recent data, because recent data more accurately represents drive state and workload characteristics. Weighting the training data in this way can (i) reduce the processing requirements for retraining, thereby reducing associated costs, (ii) reduce the amount of time spent on retraining, and (iii) allow more frequent retraining, which can in turn improve the performance and accuracy of the prediction model.

[0048] Figure 3 is a diagram illustrating example operations for collecting a dataset for training a storage device failure prediction model using a window-based data sampling method, according to an example embodiment of the disclosure.

[0049] Referring to Figure 3, log data can be segmented into multiple time periods (e.g., multiple time period windows), such as multiple six-month windows. However, embodiments according to this disclosure are not limited to this, and the duration of the time period can vary depending on the design of the storage device failure prediction model. Window 310 may include data recorded during the most recent six-month period (between $T_{n-1}$ and $T_n$). Window 320 may include data recorded during the preceding six-month period (between $T_{n-2}$ and $T_{n-1}$). The data can be segmented into six-month windows extending back to the start of data collection ($T_0$). For example, window 340 may include data recorded during the first six-month period (between $T_0$ and $T_1$), and window 330 may include data recorded during the second six-month period (between $T_1$ and $T_2$). According to some example embodiments, the data may instead be segmented into windows of another length (e.g., one-month periods). The window length may be chosen based on the dataset size and/or the overall time span of the data.

[0050] In some embodiments, data can be weighted according to the window in which it was collected. In some embodiments, window 310 may be assigned the largest weight (e.g., 1) because it contains the most recent data, while window 340 may be assigned the smallest weight (e.g., 0) because it contains the least recent data. As previously mentioned, recent data is more valuable because it more accurately represents drive state and workload characteristics. The weights can decrease from the window containing the most recent data to the window containing the least recent data. In some embodiments, the weight can decrease by a predetermined amount from each window to the next older one. In some embodiments, the weight can be halved for each successive window, moving from the most recent data to the least recent data. In other embodiments, the weight can decrease by one-third for each successive window.
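The segmentation and weighting schemes above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the `Record` type, its `day` field (days since $T_0$), and the 182.5-day approximation of a six-month window are hypothetical. Window index 0 is the oldest window (window 340) and index n is the newest (window 310). "Decrease by a predetermined amount" maps to `linear_weights`, and "halved" maps to `geometric_weights` with `ratio=0.5`; the text's "decrease by one-third" could be read either as a ratio of 2/3 or as a linear step of 1/3, and both readings fit these helpers.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Record:
    """Hypothetical telemetry record; `day` counts days since collection start (T0)."""
    day: float
    drive_id: str
    failed: bool = False


def segment_by_window(records: List[Record], window_days: float) -> Dict[int, List[Record]]:
    """Group records into fixed-length windows: index 0 covers [T0, T1), index 1 covers [T1, T2), ..."""
    windows: Dict[int, List[Record]] = {}
    for rec in records:
        windows.setdefault(int(rec.day // window_days), []).append(rec)
    return windows


def linear_weights(n: int, step: float) -> List[float]:
    """Weight of window k (0 = oldest, n = newest) drops by a fixed `step` per window, floored at 0."""
    return [max(0.0, 1.0 - step * (n - k)) for k in range(n + 1)]


def geometric_weights(n: int, ratio: float) -> List[float]:
    """Weight of window k is ratio ** (n - k); ratio=0.5 halves the weight per window."""
    return [ratio ** (n - k) for k in range(n + 1)]


# Roughly six-month windows (182.5 days is an assumed approximation).
SIX_MONTHS = 182.5
```

The linear scheme can reach the zero weight mentioned for window 340, while the halving scheme never quite reaches zero but keeps the total sampled volume bounded.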

[0051] For example, as shown in Figure 3, data collected from window 310 can be assigned a weighting value $W_n$, data collected from window 320 a weighting value $W_{n-1}$, data collected from window 330 a weighting value $W_1$, and data collected from window 340 a weighting value $W_0$.

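[0052] Various schemes for assigning weighting values can be implemented according to various example embodiments. In some embodiments, the weighting values can be set as follows:

[0053] $W_k = (1/2)^{n-k}$ for $k = 0, 1, \ldots, n$; that is, $W_n = 1$, $W_{n-1} = 1/2$, $W_{n-2} = 1/4$, ..., $W_0 = (1/2)^n$.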

[0054] In this embodiment, the weighting value is halved for each window. For example, the most recent six-month period can be assigned a weighting value of 1, indicating that all data from the past six months is included in the training data for the storage device failure prediction model 350 (or failure prediction model). The next six-month period can be assigned a weighting value of 1/2, indicating that half of the data from that six-month window is included. In one example, half of the data from that window can be selected randomly or sequentially as retraining data, but the method of selecting, from each window, the amount of data corresponding to that window's weighting value is not limited to this. The following six-month period can be assigned a weighting value of 1/4, indicating that one-quarter of the data from that six-month window is included. In this embodiment, the total amount of data used for each retraining, measured in six-month windows' worth, is $1 + 1/2 + 1/4 + \cdots + (1/2)^n = 2 - (1/2)^n$. As time passes and n increases, this sum converges to 2, which is equivalent to the data collected over one year (2 × 6 months). Therefore, assuming each window holds a similar amount of data, the amount of data used for retraining stays below the amount collected in one year, regardless of how many years of data have been collected. For example, twenty years of data can be reduced, or sampled, to about the amount of data collected in one year.

[0055] Furthermore, according to this embodiment, SSD failure data from the database can be collected and retained, regardless of its weighting. The dataset associated with the failure prediction model may be inherently imbalanced and may include a small amount of SSD failure data (e.g., 0.01% to 0.02%). Collecting SSD failure data can improve the accuracy of the storage device failure prediction model.
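The sampling rule of paragraphs [0054] and [0055] can be sketched as below. This is a minimal illustration under assumptions: the `Record` type and the function names are hypothetical, random selection is used (the text also permits sequential selection), and the weight fraction is applied only to non-failure records while every failure record is kept regardless of its window's weight. `halving_total` evaluates the geometric sum $2 - (1/2)^n$ to show that 40 six-month windows (twenty years) still stay under two windows' worth of data.

```python
import random
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Record:
    """Hypothetical telemetry record; `failed` marks an SSD failure instance."""
    day: float
    drive_id: str
    failed: bool = False


def halving_total(n: int) -> float:
    """Total sampled volume, in windows' worth of data, for weights 1, 1/2, ..., (1/2)**n."""
    return sum(0.5 ** k for k in range(n + 1))


def build_training_set(windows: Dict[int, List[Record]],
                       weights: Dict[int, float],
                       seed: int = 0) -> List[Record]:
    """Randomly keep a `weight` fraction of non-failure records in each window,
    and keep every failure record regardless of its window's weight."""
    rng = random.Random(seed)  # fixed seed so retraining runs are reproducible
    selected: List[Record] = []
    for idx, records in sorted(windows.items()):
        weight = weights.get(idx, 0.0)
        healthy = [r for r in records if not r.failed]
        selected.extend(rng.sample(healthy, int(weight * len(healthy))))
        selected.extend(r for r in records if r.failed)
    return selected
```

Keeping failures outside the weighted sampling protects the already tiny minority class (about 0.01% to 0.02% of the data) from being thinned out further.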

[0056] The training dataset can also be augmented by preserving relevant data (e.g., during dataset pruning). In some embodiments, relevant data points may include anomalies. Anomalies may include not only instances of SSD failures but also other data points with a low probability of occurrence (e.g., outliers). Anomalies (or anomalous data) can be identified using anomaly detection algorithms, such as autoencoder and isolation forest techniques, but are not limited to these. Rule-based methods can also be used to identify anomalies. In a rule-based method, system users (e.g., system administrators) can use the API of Figure 2, which provides a public interface for defining rules that determine whether a measurement is anomalous. For example, a user can use the API to define a data point as anomalous when the storage device's temperature exceeds a given threshold (e.g., 100 degrees Fahrenheit).
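A rule-based definition like the temperature example can be sketched as a small rule registry. This is an assumption-laden illustration: `RuleRegistry`, `define_threshold_rule`, and the `temperature_f` attribute are hypothetical names that stand in for whatever interface the API actually exposes. The sketch does not cover the autoencoder or isolation forest detectors, which would need a machine learning library.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Measurement = Dict[str, float]


@dataclass
class RuleRegistry:
    """Hypothetical stand-in for the rule-definition interface exposed through the API."""
    rules: Dict[str, Callable[[Measurement], bool]] = field(default_factory=dict)

    def define_threshold_rule(self, name: str, attribute: str, threshold: float) -> None:
        """Register a rule: a measurement is anomalous if `attribute` exceeds `threshold`."""
        self.rules[name] = lambda m: m.get(attribute, float("-inf")) > threshold

    def matching_rules(self, m: Measurement) -> List[str]:
        """Names of all rules that flag this measurement as anomalous."""
        return [name for name, rule in self.rules.items() if rule(m)]

    def is_anomalous(self, m: Measurement) -> bool:
        return bool(self.matching_rules(m))


registry = RuleRegistry()
registry.define_threshold_rule("overheat", "temperature_f", 100.0)
```

A measurement that lacks the attribute is treated as not anomalous. This is a design choice of the sketch, not something the source specifies.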

[0057] In some examples, the disclosed system may use clustering methods to identify anomalies. For instance, when a database system performs data cleanup operations (e.g., garbage collection) on storage devices, it may retain data classified as anomalous. If the database system must delete some of the data classified as anomalous, it may further subdivide the anomalous data into anomalous clusters using a clustering algorithm, including but not limited to K-means, principal component analysis (PCA), and/or any other suitable algorithm. Data may then be removed or filtered such that at least one data point remains in each anomalous cluster, with data points preferentially removed from the clusters containing the most data points.
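The cluster-preserving pruning rule can be sketched with a plain K-means. This sketch rests on several assumptions: anomalous points are numeric tuples, K-means is seeded by deterministic farthest-first initialization (a choice made here, not stated in the source), PCA is not shown, and `prune_anomalies` with its `budget` parameter is a hypothetical helper. Points are always removed from the currently largest cluster, and no cluster loses its last point.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def _nearest(p: Point, centroids: List[Point]) -> int:
    return min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))


def kmeans(points: Sequence[Point], k: int, iters: int = 20) -> List[int]:
    """Plain Lloyd's K-means with deterministic farthest-first initialization."""
    centroids: List[Point] = [tuple(points[0])]
    while len(centroids) < k:
        # next seed: the point farthest from every centroid chosen so far
        centroids.append(tuple(max(points, key=lambda p: min(math.dist(p, c) for c in centroids))))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [_nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster goes empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


def prune_anomalies(points: Sequence[Point], labels: Sequence[int], budget: int) -> List[Point]:
    """Shrink the anomaly set toward `budget` points, always removing from the
    currently largest cluster and never removing a cluster's last point."""
    clusters: Dict[int, List[Point]] = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = len(points)
    while total > budget:
        largest = max(clusters, key=lambda c: len(clusters[c]))
        if len(clusters[largest]) <= 1:
            break  # every cluster is down to one representative
        clusters[largest].pop()
        total -= 1
    return [p for members in clusters.values() for p in members]
```

Under this rule, a small, distinct cluster of rare anomalies survives pruning while the dense, repetitive cluster absorbs most of the deletions.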

[0058] In some embodiments, the disclosed system can augment the dataset by oversampling data points with specific characteristics. As previously mentioned, the dataset used to train the storage device failure prediction model may be inherently imbalanced because it may include only a limited amount of SSD failure data and other anomalous data points (e.g., approximately 0.01% to approximately 0.02% of the dataset). This can reduce the accuracy of the storage device failure prediction model, because the model may effectively ignore the class of data that includes anomalies, failure instances, or other rare events (e.g., the data points of the minority class in the machine learning training dataset). Therefore, the disclosed system can generate additional samples of anomalies, failure instances, or other rare events (e.g., minority data or minority class data). These additional minority class data points can help train the storage device failure prediction model to predict more accurately whether an SSD is likely to fail.

[0059] In some embodiments, additional samples of minority data can be generated using different methods for defining anomalous data. According to an embodiment, minority data can be generated using a rule-based method. According to the rule-based method, a user can define anomalous data as any data exceeding a predetermined threshold. One or more rules can be defined corresponding to various failure conditions. Based on the criteria that cause data to be classified as anomalous, additional samples of anomalous data can be generated. For example, a user can use an API to define device-associated temperature measurements exceeding approximately 100 degrees Fahrenheit as anomalous data. Based on this definition, the disclosed system can generate additional samples of temperature measurements exceeding approximately 100 degrees Fahrenheit.
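Rule-based generation can be sketched by sampling values that satisfy a "measurement > threshold" rule. This is a minimal sketch, assuming uniform sampling over a hypothetical `spread` above the threshold; the source does not specify how values are distributed, and a real generator would also need plausible values for the record's other attributes.

```python
import random
from typing import List


def generate_above_threshold(threshold: float, count: int, spread: float, seed: int = 0) -> List[float]:
    """Synthesize `count` values in (threshold, threshold + spread], i.e., values that
    satisfy a 'measurement > threshold' anomaly rule. `spread` is a hypothetical knob."""
    rng = random.Random(seed)
    # 1.0 - rng.random() lies in (0, 1], so each sample lands above the threshold
    return [threshold + spread * (1.0 - rng.random()) for _ in range(count)]


synthetic_temps_f = generate_above_threshold(100.0, count=5, spread=20.0, seed=42)
```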

[0060] In other embodiments, minority class data can be generated using an instance-based method. In the instance-based method, a user can specify sample data points from which additional data points with similar values can be generated. For example, a user can use an API to specify storage device temperature measurements of approximately 100 degrees Fahrenheit as anomalous data. Based on this definition, additional samples of temperature measurements close to 100 degrees Fahrenheit (such as 101 degrees Fahrenheit and 99 degrees Fahrenheit) can be generated.
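Instance-based generation can be sketched as jittering user-specified seed points. This is a minimal sketch, assuming a uniform perturbation within a per-attribute `jitter` bound (hypothetical) and using illustrative attribute names. The source only says that generated values are similar to the seed, so a different perturbation scheme (for example, SMOTE-style interpolation between minority samples) would also fit the description.

```python
import random
from typing import Dict, List

Sample = Dict[str, float]


def generate_near_instances(seeds: List[Sample], per_seed: int,
                            jitter: Dict[str, float], seed: int = 0) -> List[Sample]:
    """For each user-specified seed data point, create `per_seed` copies whose
    attributes are perturbed uniformly within +/- jitter[attr]; attributes with
    no jitter entry are copied unchanged."""
    rng = random.Random(seed)
    out: List[Sample] = []
    for s in seeds:
        for _ in range(per_seed):
            out.append({
                attr: value + rng.uniform(-jitter.get(attr, 0.0), jitter.get(attr, 0.0))
                for attr, value in s.items()
            })
    return out


seed_points = [{"temperature_f": 100.0, "power_on_hours": 5000.0}]
synthetic = generate_near_instances(seed_points, per_seed=4, jitter={"temperature_f": 1.0})
```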

[0061] In some embodiments, the computationally intensive processing for generating samples can be offloaded to a nearby storage device, such as a nearby SSD or a nearby SSD with an embedded processor (e.g., a storage device within a predetermined physical distance of a given storage device, or one that shares virtual resources or categories with it).

[0062] To further improve the efficiency of data sampling, the data generated for training the model can be categorized according to anomaly definitions set by the user. When data is generated, the data points that satisfy the user-provided anomaly definitions can be categorized and stored in a separate table, so that they can be retrieved efficiently without scanning the entire database. In some embodiments, the categorized data can be stored on a nearby storage device, such as a nearby SSD or a nearby SSD with an embedded processor (e.g., a storage device within a predetermined physical distance of a given storage device, or one that shares virtual resources or categorization with it).

[0063] Figure 4 shows an overview of example techniques for classifying anomalous data, according to example embodiments.

[0064] Referring to Figure 4, the dataset within database 430 (e.g., a telemetry database) can be used to train a storage device (e.g., SSD) failure prediction model. Client 410 can use API 420 to set rules defining anomalous data in database 430. For example, client 410 can set a rule that selects data with temperatures greater than 100 degrees Fahrenheit; data satisfying this rule is then identified as anomalous data. Table 440 can include the dataset from database 430, which can contain data that does and does not satisfy the rules set by the user. For example, data point SN1 does not satisfy the rules set by the user, while data point SN2 does. Because database 430 may contain a relatively large amount of data, identifying the data that satisfies the user's rules can be computationally intensive. In some embodiments, data in database 430 that satisfies the rules set by the user can therefore be stored in a separate table 450. For example, both data point SN1 and data point SN2 satisfy the rules set by the user and can be identified as anomalous data and stored in table 450. In some embodiments, generated anomalous data can also be stored in table 450. Storing anomalous data in table 450 allows it to be retrieved efficiently without inspecting the entire database 430.
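The separate anomaly table can be sketched as a write-time index. In this minimal sketch, each row is checked against the user rules when it is inserted, and matching rows are also appended to a per-rule list, so retrieval is a dictionary lookup rather than a full scan. `TelemetryStore`, its fields, and the rule of keeping generated rows out of the main table are all hypothetical. The lists correspond only loosely to tables 440 and 450.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass
class TelemetryStore:
    """Hypothetical store: every row goes to `main_table` (cf. table 440); rows that
    match a user rule are also indexed in `anomaly_table` (cf. table 450) at write time."""
    rules: Dict[str, Callable[[Row], bool]]
    main_table: List[Row] = field(default_factory=list)
    anomaly_table: Dict[str, List[Row]] = field(default_factory=dict)

    def insert(self, row: Row, generated: bool = False) -> None:
        # Assumption: synthetic (generated) anomalies are kept out of the raw telemetry table.
        if not generated:
            self.main_table.append(row)
        for name, rule in self.rules.items():
            if rule(row):
                self.anomaly_table.setdefault(name, []).append(row)

    def anomalies(self, rule_name: str) -> List[Row]:
        """Lookup by rule name; no scan of main_table is needed."""
        return self.anomaly_table.get(rule_name, [])


store = TelemetryStore(rules={"overheat": lambda r: r["temperature_f"] > 100.0})
store.insert({"serial": "SN1", "temperature_f": 95.0})
store.insert({"serial": "SN2", "temperature_f": 104.0})
store.insert({"serial": "GEN1", "temperature_f": 103.0}, generated=True)
```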

[0065] Figure 5 is a flowchart illustrating example operations for training a storage device failure prediction model using a modified dataset, according to disclosed example embodiments.

[0066] Referring to Figure 5, the dataset within database 510 can be used to train a storage device failure prediction model. In step 520, the dataset within database 510 can be segmented into multiple time period intervals (e.g., multiple time period windows); that is, the dataset can be divided into one or more subsets, each corresponding to a time period window. In some embodiments, a window can be a six-month interval. In step 530, a weight can be assigned to each window. In some embodiments, larger weights can be assigned to more recent data (in other words, to windows that include more recent data). As previously mentioned, more recent data can be assigned a larger weight at least because it can more accurately represent the state and/or workload characteristics of the storage device. In step 540, data can be selected based on the assigned weights. For example, a window can be assigned a weight of 1, and all data from that window can be selected. In another example, a second window can be assigned a weight of 0.5, and half of the data from the second window can be selected. In step 570, the selected data can be used to train a storage device failure prediction model.
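Steps 520 to 540 can be sketched in plain Python as follows. The sketch assumes that records are `(date, payload)` pairs, that windows are calendar-aligned six-month intervals, and that window k, counted back from the newest window (k = 0), receives weight 0.5**k, which matches the halving recited in claim 1. The helper names and the choice to keep floor(weight × window size) randomly chosen records per window are illustrative assumptions.

```python
import random
from datetime import date


def window_index(d: date, newest: date, months: int = 6) -> int:
    """0 for the calendar window holding `newest`, 1 for the window before it, etc."""
    newest_bucket = (newest.year * 12 + newest.month - 1) // months
    return newest_bucket - (d.year * 12 + d.month - 1) // months


def split_into_windows(records, months=6):
    """Step 520: group (date, payload) records into subsets keyed by window index."""
    newest = max(d for d, _ in records)
    windows = {}
    for d, payload in records:
        windows.setdefault(window_index(d, newest, months), []).append((d, payload))
    return windows


def assign_weights(windows):
    """Step 530: weight 1 for the most recent window, halved for each older window."""
    return {k: 0.5 ** k for k in windows}


def select_by_weight(windows, weights, seed=0):
    """Step 540: keep floor(weight * size) randomly chosen records from each window."""
    rng = random.Random(seed)
    selected = []
    for k in sorted(windows):
        subset = windows[k]
        selected.extend(rng.sample(subset, int(weights[k] * len(subset))))
    return selected


# Three six-month windows with six monthly records each.
records = [(date(2020, m, 1), f"a{m}") for m in range(1, 7)]    # window 2 (oldest)
records += [(date(2020, m, 1), f"b{m}") for m in range(7, 13)]  # window 1
records += [(date(2021, m, 1), f"c{m}") for m in range(1, 7)]   # window 0 (newest)

windows = split_into_windows(records)
weights = assign_weights(windows)                 # {2: 0.25, 1: 0.5, 0: 1.0}
training_set = select_by_weight(windows, weights)  # step 570 would train on this
print(len(training_set))                          # 10
```

With three windows of six records each, the weights are 1, 0.5 and 0.25, so 6, 3 and 1 records are kept. Sampling without replacement keeps each record at most once; a different rounding rule would be an equally reasonable choice.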

[0067] In some embodiments, in step 550, anomalous data can be selected from the database. Anomalous data may include, but is not limited to, data points associated with instances of SSD failure and/or other data points that have a relatively low probability of occurrence (e.g., outliers). Anomalies can be identified using anomaly detection algorithms such as autoencoder techniques or isolation forest techniques, among others, and can also be identified using rule-based methods or clustering methods. In step 570, the storage device failure prediction model can be trained using the selected anomalous data.
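As one concrete clustering-based option for step 550, the sketch below flags the points that DBSCAN would label as noise: points that are neither core points (at least `min_pts` neighbors within distance `eps`, counting themselves) nor within `eps` of a core point. The function name, the two-feature telemetry tuples and the parameter values are hypothetical; an autoencoder or an isolation forest, which the paragraph also mentions, could be used instead.

```python
import math


def dbscan_noise(points, eps, min_pts):
    """Return the indices of points that DBSCAN would label as noise (anomalies)."""
    # Neighborhood of each point: every point within eps, including the point itself.
    neighborhoods = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps] for p in points
    ]
    # Core points have a dense neighborhood.
    core = {i for i, nbrs in enumerate(neighborhoods) if len(nbrs) >= min_pts}
    # Noise = not a core point and not a border point (no core point within eps).
    return [
        i
        for i, nbrs in enumerate(neighborhoods)
        if i not in core and not any(j in core for j in nbrs)
    ]


# Hypothetical two-feature telemetry rows, e.g. (temperature, error count).
telemetry = [(40, 0), (41, 1), (42, 0), (40, 1), (41, 0), (95, 30)]
print(dbscan_noise(telemetry, eps=3.0, min_pts=3))  # [5]
```

Since `eps` is a single distance across all features, the features would normally be standardized first in practice; the flagged rows could then be stored alongside the rule-based anomalies and added to the training dataset.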

[0068] In some embodiments, at step 560, additional anomalous data may be generated. In some embodiments, the anomalous data may be generated using a rule-based approach, in which a user can define anomalous data as any data exceeding a predetermined threshold. One or more rules may be defined, corresponding to various failure conditions, and additional samples of anomalous data may be generated that satisfy the criteria that cause data to be classified as anomalous. For example, a user may use an API to define device-associated temperature measurements exceeding approximately 100 degrees Fahrenheit as anomalous data; based on this definition, the disclosed system may generate additional samples of temperature measurements exceeding approximately 100 degrees Fahrenheit. In other embodiments, the anomalous data may be generated using an instance-based approach, in which a user may specify sample data points from which additional data points with similar values can be generated. For example, a user may use an API to define device-associated temperature measurements of approximately 100 degrees Fahrenheit as anomalous data; based on this definition, additional samples of temperature measurements near 100 degrees Fahrenheit (such as 101 degrees Fahrenheit and 99 degrees Fahrenheit) may be generated. At step 570, the generated data may be used to train the storage device failure prediction model.
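The two generation strategies of step 560 can be sketched as follows, assuming scalar temperature readings in °F. The function names, the `min_excess`/`max_excess` range above the threshold, and the ±`spread` jitter around user-supplied samples are illustrative assumptions rather than values from the disclosure.

```python
import random


def generate_rule_based(threshold, n, min_excess=0.1, max_excess=20.0, seed=0):
    """Rule-based: draw n synthetic readings that exceed the user-defined threshold."""
    rng = random.Random(seed)
    # Each value lies between threshold + min_excess and threshold + max_excess,
    # so it satisfies a "greater than threshold" rule.
    return [threshold + rng.uniform(min_excess, max_excess) for _ in range(n)]


def generate_instance_based(samples, n_per_sample, spread=1.0, seed=0):
    """Instance-based: create readings similar to each user-specified sample."""
    rng = random.Random(seed)
    # Each new value is a sample jittered uniformly by at most +/- spread.
    return [s + rng.uniform(-spread, spread) for s in samples for _ in range(n_per_sample)]


# Rule: temperatures above 100 °F are anomalous.
over_threshold = generate_rule_based(threshold=100.0, n=5)
# Instance: readings similar to a 100 °F sample, i.e. roughly 99 to 101 °F.
near_sample = generate_instance_based([100.0], n_per_sample=4)
extra_anomalies = over_threshold + near_sample  # appended to the training set (step 570)
```

Seeding each generator makes the synthetic samples reproducible across retraining runs; dropping the seed would give fresh samples each time.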

[0069] According to various embodiments described herein, machine learning models can be deployed to predict, based on the attributes of a storage device, whether that storage device is likely to fail in the future. In some embodiments, the machine learning model can be deployed on a processor, such as a general-purpose central processing unit (CPU). In other embodiments, the machine learning model can be offloaded to a nearby storage device, such as a nearby SSD (e.g., an SSD within a predetermined physical distance) or a nearby SSD with an embedded processor.

[0070] Electronic or electrical devices and/or any other relevant devices or components according to the embodiments described herein can be implemented using any suitable hardware, firmware (e.g., an application-specific integrated circuit), software, or a combination of software, firmware, and hardware. For example, the various components of these devices can be formed on one integrated circuit (IC) chip or on separate IC chips. Furthermore, the various components of these devices can be implemented on a flexible printed circuit film, a tape carrier package (TCP), or a printed circuit board (PCB), or formed on one substrate. Additionally, the various components of these devices can be processes or threads that run on one or more processors in one or more computing devices, execute computer program instructions, and interact with other system components to perform the various functions described herein. The computer program instructions are stored in memory, which can be implemented in a computing device using standard memory devices such as random access memory (RAM). The computer program instructions can also be stored in other non-transitory computer-readable media, such as a CD-ROM or a flash drive.

[0071] The features of the inventive concept and the methods for implementing the inventive concept can be more readily understood by referring to the foregoing detailed description and accompanying drawings of the embodiments. The foregoing embodiments have been described in more detail with reference to the accompanying drawings, in which the same reference numerals consistently denote the same elements. However, this disclosure can be implemented in various different forms and should not be construed as being limited to the embodiments shown herein. Rather, these embodiments are provided as examples so that this disclosure will be thorough and complete, and will fully convey the aspects and features of this disclosure to those skilled in the art. Therefore, processes, elements, and techniques that are not essential for those skilled in the art to fully understand the aspects and features of the embodiments of this disclosure may not be described. Unless otherwise stated, the same reference numerals denote the same elements throughout the drawings and written description, and therefore their descriptions will not be repeated. In the drawings, the relative dimensions of elements, layers, and regions may be exaggerated for clarity.

[0072] In the preceding description, numerous specific details are set forth for purposes of illustration to provide a thorough understanding of the various embodiments. It is apparent, however, that the various embodiments may be practiced without these specific details or with one or more equivalent arrangements. Furthermore, those skilled in the art will understand that the various features of two or more embodiments described herein can be combined in any suitable manner without departing from the spirit or scope of this disclosure. In other instances, well-known structures and apparatuses are shown in block diagram form to avoid unnecessarily obscuring the various embodiments.

[0073] It is understood that when an element, layer, region, or component is referred to as being "on," "connected to," or "bonded to" another element, layer, region, or component, it may be directly on, directly connected to, or directly bonded to the other element, layer, region, or component, or one or more intervening elements, layers, regions, or components may be present. However, "directly connected to" or "directly bonded to" means that one component is connected or bonded to another component without any intervening component. Other expressions describing relationships between components (such as "between" and "immediately between," or "adjacent to" and "immediately adjacent to") can be interpreted similarly. Furthermore, it is understood that when an element or layer is referred to as being "between" two elements or layers, it may be the only element or layer between the two elements or layers, or one or more intervening elements or layers may be present.

[0074] The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit this disclosure. As used herein, the singular form is intended to include the plural form as well, unless the context clearly indicates otherwise. It will also be understood that the terms “comprising,” “having,” and “including,” as used in this specification, indicate the presence of the stated features, integers, steps, operations, elements, and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items.

[0075] As used herein, the terms “approximately,” “about,” and similar terms are used as terms of approximation rather than terms of degree, and are intended to account for the inherent deviations in measured or calculated values that would be recognized by those skilled in the art. As used herein, “about” or “approximately” includes the stated value and means within an acceptable range of deviation from the stated value as determined by those skilled in the art, taking into account the measurement in question and the errors associated with measuring the particular quantity (e.g., the limitations of the measurement system). For example, “about” may mean within one or more standard deviations, or within ±30%, 20%, 10%, or 5% of the stated value. Furthermore, the word “may,” when used in describing embodiments of this disclosure, refers to “one or more embodiments of this disclosure.” As used herein, the terms “use,” “using,” and “used” may be considered synonymous with the terms “utilize,” “utilizing,” and “utilized,” respectively. Additionally, the term “exemplary” is intended to refer to an example or illustration.

[0076] When an embodiment can be implemented differently, a specific processing order can differ from the described order. For example, two consecutively described processes can be performed substantially simultaneously or in the reverse of the described order.

[0077] The foregoing is illustrative of exemplary embodiments and is not to be construed as limiting the exemplary embodiments. Although some exemplary embodiments have been described, it will be readily understood by those skilled in the art that many modifications are possible in the exemplary embodiments without substantially departing from the inventive teachings and advantages of the exemplary embodiments. Therefore, all such modifications are intended to be included within the scope of the exemplary embodiments as defined in the claims. In the claims, the means-plus-function clause is intended to cover structures described herein as performing the described functions, and not only structural equivalents but also equivalent structures. Therefore, it will be understood that the foregoing is illustrative of exemplary embodiments and is not to be construed as limiting to the specific embodiments disclosed, and modifications to the disclosed exemplary embodiments and other exemplary embodiments are intended to be included within the scope of the appended claims. The inventive concept is defined by the claims together with the equivalents of the claims to be included therein.

Claims

1. A method for training a machine learning model, the method comprising: dividing, by a processor, a dataset from a database into one or more subsets based on a time period window, wherein the dataset from the database is data collected from a storage device; assigning, by the processor, one or more weighting values to the one or more subsets of data according to the time period windows of the one or more subsets of data, wherein the one or more weighting values are reduced by half for each time period window, from the time period window including the most recent data to the time period window including the least recent data; generating, by the processor, a training dataset from the one or more subsets of the dataset based on the one or more weighting values; and training, by the processor, the machine learning model using the training dataset, wherein the machine learning model comprises a storage device failure prediction model, and wherein the method further comprises: identifying, by the processor, anomalous data in the dataset from the database, the anomalous data including storage device failure data; retrieving, by the processor, the anomalous data from the dataset from the database; and adding, by the processor, the anomalous data to the training dataset.

2. The method according to claim 1, wherein the anomalous data is identified using a rule-based method.

3. The method according to claim 1, wherein the anomalous data is identified using a clustering-based method.

4. The method according to any one of claims 1 to 3, further comprising: generating, by the processor, anomalous data; and adding, by the processor, the generated anomalous data to the training dataset.

5. A data system, comprising: a database; a processor connected to the database; and a memory connected to the processor, wherein the memory stores instructions that, when executed by the processor, cause the processor to: divide a dataset from the database into one or more subsets based on a time period window, wherein the dataset from the database is data collected from a storage device; assign one or more weighting values to the one or more subsets of data according to the time period windows of the one or more subsets of data, wherein the one or more weighting values are reduced by half for each time period window, from the time period window including the most recent data to the time period window including the least recent data; generate a training dataset from the one or more subsets based on the one or more weighting values; and train a machine learning model using the training dataset, wherein the machine learning model comprises a storage device failure prediction model, and wherein the processor is further configured to: identify anomalous data in the dataset from the database, the anomalous data including storage device failure data; retrieve the anomalous data from the dataset from the database; and add the anomalous data to the training dataset.

6. The data system according to claim 5, wherein the anomalous data is identified using a rule-based method.

7. The data system according to claim 5, wherein the anomalous data is identified using a clustering-based method.

8. The data system according to any one of claims 5 to 7, wherein the processor is further configured to: generate anomalous data; and add the generated anomalous data to the training dataset.