Model training method, fault determination method, electronic device, and program product
By training a fault determination model and utilizing background media scan logs and machine learning methods, the lack of precision in traditional disk fault determination techniques has been addressed, enabling accurate prediction of disk faulty sectors and improving system reliability.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- EMC IP HLDG CO LLC
- Filing Date
- 2021-07-23
- Publication Date
- 2026-06-26
AI Technical Summary
Traditional disk fault diagnosis techniques lack granularity and cannot support fine-grained disk processing, thus failing to meet the needs of users and administrators.
By acquiring a fault dataset associated with disk faulty sectors, a fault determination model is trained to predict the set of faulty sectors and their types that will appear on the disk. By utilizing background media scan logs and machine learning methods such as random forest models, the accuracy of disk fault prediction is improved.
It enables fine-grained monitoring and prediction of disk failure sectors, improving system reliability, reducing storage costs, and increasing access speed.
Figure CN115700549B_ABST
Abstract
Description
Technical Field
[0001] The embodiments disclosed herein generally relate to computer technology, and more specifically to model training methods, fault determination methods, electronic devices, and computer program products, which can be used in the fields of disk management and data protection.
Background Technology
[0002] Many techniques have been proposed to prevent data loss due to disk failure. However, these techniques typically focus on overall disk failures and rarely address fine-grained disk health. Furthermore, because traditional disk failure detection techniques focus on the entire disk, they tend to treat the entire disk uniformly when a failure is detected. In reality, when a disk fails, the fault often exists only in a subset of sectors. Therefore, traditional disk failure detection techniques lack granularity, failing to support fine-grained disk processing and failing to meet the needs of disk users and administrators.
Summary of the Invention
[0003] Embodiments of this disclosure provide model training methods, fault determination methods, electronic devices, and computer program products.
[0004] In a first aspect of this disclosure, a model training method is provided. The method includes: acquiring multiple disk failure datasets associated with at least one faulty sector of a disk and collected within a first time period; acquiring another disk failure dataset associated with the at least one faulty sector and collected at a predetermined time point after the first time period, the other disk failure dataset indicating fault information regarding at least one set of faulty sectors to which the at least one faulty sector belongs; and training a fault determination model based on the multiple disk failure datasets and the fault information, such that the probability of the predicted fault information at the predetermined time point, determined by the trained fault determination model based on the multiple disk failure datasets, matching the fault information is greater than a first threshold probability.
[0005] In a second aspect of this disclosure, a fault determination method is provided. The method includes: acquiring multiple disk fault datasets collected within a first time period and associated with at least one faulty sector of a disk; and determining, based on a trained fault determination model obtained according to the first aspect of this disclosure and the multiple disk fault datasets, fault information regarding at least one set of faulty sectors to which the at least one faulty sector belongs, at a predetermined time point after the first time period.
[0006] In a third aspect of this disclosure, an electronic device is provided. The electronic device includes: at least one processing unit; and at least one memory coupled to the at least one processing unit and storing instructions for execution by the at least one processing unit. When executed by the at least one processing unit, the instructions cause the device to perform an action, the action including: acquiring a plurality of disk fault datasets associated with at least one faulty sector of a disk and collected within a first time period; acquiring another disk fault dataset associated with the at least one faulty sector and collected at a predetermined time point after the first time period, the other disk fault dataset indicating fault information regarding at least one set of faulty sectors to which the at least one faulty sector belongs; and training a fault determination model based on the plurality of disk fault datasets and the fault information, such that the probability of the predicted fault information at the predetermined time point, determined by the trained fault determination model based on the plurality of disk fault datasets, matching the fault information is greater than a first threshold probability.
[0007] In a fourth aspect of this disclosure, an electronic device is provided. The electronic device includes: at least one processing unit; and at least one memory coupled to the at least one processing unit and storing instructions for execution by the at least one processing unit. When executed by the at least one processing unit, the instructions cause the device to perform actions including: acquiring a plurality of disk fault datasets collected within a first time period and associated with at least one faulty sector of a disk; and determining, based on a trained fault determination model obtained according to a third aspect of this disclosure and the plurality of disk fault datasets, fault information regarding at least one set of faulty sectors to which the at least one faulty sector belongs at a predetermined time point after the first time period.
[0008] In a fifth aspect of this disclosure, a computer program product is provided. This computer program product is tangibly stored on a non-transitory computer-readable medium and includes machine-executable instructions that, when executed, cause a machine to perform any step of the method described in the first aspect of this disclosure.
[0009] In a sixth aspect of this disclosure, a computer program product is provided. This computer program product is tangibly stored on a non-transitory computer-readable medium and includes machine-executable instructions that, when executed, cause a machine to perform any step of the method described in the second aspect of this disclosure.
[0010] The summary section is provided to introduce a selection of concepts in a simplified form, which are further described in the detailed description below. The summary section is not intended to identify key or essential features of the embodiments of this disclosure, nor is it intended to limit the scope of the embodiments of this disclosure.
Attached Figure Description
[0011] The above and other objects, features and advantages of this disclosure will become more apparent from the accompanying drawings, in which like reference numerals generally denote like parts.
[0012] Figure 1 shows a schematic diagram of a model training environment 100 in which devices and/or methods according to embodiments of the present disclosure may be implemented;
[0013] Figure 2 shows a flowchart of a model training method 200 according to an embodiment of the present disclosure;
[0014] Figure 3 shows a flowchart of a model training method 300 according to an embodiment of the present disclosure;
[0015] Figure 4 shows a flowchart of a fault determination method 400 according to an embodiment of the present disclosure; and
[0016] Figure 5 shows a schematic block diagram of an example device 500 that can be used to implement embodiments of the present disclosure.
[0017] In the various figures, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Implementation
[0018] Preferred embodiments of the present disclosure will now be described in more detail with reference to the accompanying drawings. While preferred embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure will be thorough and complete, and will fully convey the scope of the present disclosure to those skilled in the art.
[0019] The term "comprising" and its variations as used herein indicate an open-ended inclusion, for example, "including but not limited to". Unless otherwise stated, the term "or" means "and / or". The term "based on" means "at least partially based on". The terms "an example embodiment" and "an embodiment" mean "at least one embodiment". The term "another embodiment" means "at least one additional embodiment". The terms "first", "second", etc., may refer to different or the same objects. Other explicit and implicit definitions may also be included below.
[0020] As described above in the background section, traditional disk fault diagnosis techniques are insufficient in terms of granularity, thus failing to support fine-grained disk processing and failing to meet the needs of disk users and administrators.
[0021] To at least partially address one or more of the aforementioned problems and other potential issues, embodiments of this disclosure propose a method for fine-grained monitoring of disk health and subsequent disk fault determination. Generally, in embodiments of this disclosure, a disk comprising a large number of sectors can be divided into, for example, sets of sectors of the same size, which may be referred to as storage blocks and can be used as virtual disks. Furthermore, in embodiments of this disclosure, focus can be placed on the health of sectors or sets of sectors within the disk, rather than the overall health of the disk.
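The division of a disk into equally sized sector sets can be illustrated with a minimal sketch. The set size of 2048 sectors and the function names are hypothetical and chosen only for illustration; the disclosure does not fix a particular size.

```python
SECTORS_PER_SET = 2048  # hypothetical sector-set (storage block) size, in sectors


def sector_set_index(lba: int, sectors_per_set: int = SECTORS_PER_SET) -> int:
    """Return the index of the sector set that contains the given logical block address."""
    if lba < 0:
        raise ValueError("lba must be non-negative")
    return lba // sectors_per_set


def sector_set_range(index: int, sectors_per_set: int = SECTORS_PER_SET) -> range:
    """Return the logical block addresses covered by the sector set with the given index."""
    start = index * sectors_per_set
    return range(start, start + sectors_per_set)
```

With such a mapping, health statistics can be accumulated per sector set rather than per disk.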
[0022] Sector failures in a disk can include silent failures and input / output access failures.
[0023] Silent sector failures on a disk are sector failures that occur during idle periods, without the sector being accessed; they are common in disk backup/archiving scenarios. Silent failures are difficult to identify because they remain hidden until the sector is accessed. As the number of such failed sectors grows, the reliability of the entire storage system is severely affected. For example, if a disk read operation on a RAID array encounters too many unrecoverable failures, it can lead to a data link failure. Current solutions periodically scrub the entire disk, but this can incur unexpected additional central processing unit (CPU) or I/O costs.
[0024] Disk input/output access failure refers to slow input/output access on the disk caused by faulty sectors; the slowdown results from the disk's internal command retries and recovery.
[0025] To address the above-mentioned problems and other potential issues, this disclosure focuses on how to determine the set of faulty sectors in a disk, and further on how to determine the fault type of the set of faulty sectors. In embodiments of this disclosure, a disk fault dataset, including, for example, background media scan logs, is used to determine faulty sectors in the disk, and a fault determination model is built and trained to predict the set of sectors in the disk that will fail and the fault type of that set. In this way, the set of sectors in the disk that will fail can be predicted, allowing for proactive measures to be taken, thereby improving system reliability, reducing storage costs, and increasing access speed.
[0026] Figure 1 A schematic block diagram of a model training environment 100 in which model training methods of some embodiments of the present disclosure may be implemented is shown. According to embodiments of the present disclosure, the model training environment 100 may be a cloud environment.
[0027] As shown in Figure 1, the model training environment 100 includes a computing device 110. In the model training environment 100, training-related data 120 is provided as input to the computing device 110. The training-related data 120 may include, for example, multiple disk fault datasets associated with at least one faulty sector of a disk and collected within a first time period, and another disk fault dataset collected at a predetermined time point after the first time period and indicating fault information regarding at least one set of faulty sectors to which the at least one faulty sector belongs. According to embodiments of this disclosure, the training-related data 120 may further include other relevant data and parameters required for training the fault determination model 130.
[0028] The computing device 110 can interact with the fault determination model 130. For example, the computing device 110 can provide at least a portion of the training-related data 120 to the fault determination model 130, receive from the fault determination model 130 the predicted fault information that the model determines based on the training-related data 120, and, depending on whether that predicted fault information matches the fault information indicated by the other disk fault dataset, issue a stop-training instruction to the fault determination model 130.
[0029] It should be understood that the model training environment 100 is merely exemplary and not restrictive, and that it is extensible or scalable. For example, the model training environment 100 may include more computing devices 110, and more training-related data 120 may be provided to the computing devices 110 as input. The computing devices 110 may also interact with more fault determination models 130, thereby meeting the needs of more users to simultaneously or asynchronously train the fault determination models 130 using more computing devices 110 and even more training-related data 120.
[0030] In the model training environment 100 shown in Figure 1, the input of training-related data 120 to the computing device 110 and the interaction between the computing device 110 and the fault determination model 130 can be carried out over a network.
[0031] The model training method 200, model training method 300, and fault determination method 400 shown in Figure 2, Figure 3, and Figure 4 are described below, taking the computing device 110, training-related data 120, and fault determination model 130 included in Figure 1 as examples.
[0032] Figure 2 shows a flowchart of a model training method 200 according to an embodiment of the present disclosure. Method 200 can be implemented by the computing device 110 shown in Figure 1, but it can also be implemented by other suitable devices. It should be understood that the model training method 200 may also include additional steps not shown and/or may omit steps that are shown, and the scope of the embodiments of this disclosure is not limited in this respect.
[0033] In block 202, computing device 110 acquires multiple disk failure datasets associated with at least one faulty sector of a disk, collected within a first time period. According to embodiments of this disclosure, acquiring the multiple disk failure datasets may include acquiring at least one of the following parameters for each faulty sector among the at least one faulty sector, collected at a first time point within the first time period: background media scan logs; log counts indicating the number of background media scan logs associated with the faulty sector in the background media scan logs; a fault count of the sector set to which the faulty sector belongs; and a fault count of the sector set adjacent to the sector set to which the faulty sector belongs. According to embodiments of this disclosure, the first time point within the first time period may refer to the time point at which any one of the multiple disk failure datasets is collected within the first time period, and the sector set adjacent to the sector set to which the faulty sector belongs may be a predetermined number of adjacent sector sets on the disk, ordered by address before and / or after the sector set to which the faulty sector belongs.
[0034] Background Media Scan (BMS) is a background scanning mechanism within the disk firmware. BMS identifies faulty sectors on the disk. Faulty sectors include sectors that are difficult to read or recover, sectors that cannot be read or recovered, and sectors with associated log problems. BMS generates a background media scan log for each sector that fails. In other words, if a sector on a disk fails three times, a background media scan will generate three background media scan logs for that sector. The aforementioned log count reflects the number of background media scan logs generated for the same sector.
[0035] The background media scan log may include at least one of the following: power-on minutes (POM), indicating the disk's accumulated power-on time when the sector fault occurred; the identifier of the faulty sector; and the fault type of the faulty sector. The identifier of the faulty sector may, for example, include the logical block address (LBA) of the faulty sector. The fault type of the faulty sector may, for example, include sector media failure or sector recovery failure, which may be indicated by the SENSE KEY field in the background media scan log. A sector media failure occurs when a sector on the disk cannot be read or written due to a media defect. A sector recovery failure occurs when a command completes successfully but requires retries or error correction within the disk firmware to retrieve the data.
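The per-sector parameters described for block 202 can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the `BmsLogEntry` type, the field names, the set size, and the choice to count distinct faulty sectors as a sector set's fault count are all assumptions. The numeric sense key values follow the usual SCSI convention (0x1 for a recovered error, 0x3 for a medium error); mapping them onto the two fault types named above is likewise an assumption.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class BmsLogEntry:
    """One background media scan log entry (hypothetical layout)."""
    power_on_minutes: int  # disk power-on time when the fault was logged
    lba: int               # logical block address of the faulty sector
    sense_key: int         # assumed: 0x1 recovery failure, 0x3 media failure


def sector_features(logs: Sequence[BmsLogEntry], lba: int,
                    sectors_per_set: int = 2048,
                    neighbors: int = 1) -> Dict[str, int]:
    """Compute log count, sector-set fault count, and adjacent-set fault count."""
    # Number of log entries generated for each sector.
    log_counts = Counter(entry.lba for entry in logs)
    # Assumption: a set's fault count is its number of distinct faulty sectors.
    set_faults = Counter(sector // sectors_per_set for sector in log_counts)
    own = lba // sectors_per_set
    # Sum the fault counts of the sets located before and after the sector's own set.
    adjacent = sum(set_faults[own + d]
                   for d in range(-neighbors, neighbors + 1) if d != 0)
    return {
        "log_count": log_counts[lba],
        "set_fault_count": set_faults[own],
        "adjacent_set_fault_count": adjacent,
    }
```

A sector that failed three times yields three log entries and therefore a log count of 3, consistent with the description of the log count above.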
[0036] In block 204, computing device 110 acquires another disk failure dataset associated with at least one faulty sector, collected at a predetermined time point after the first time period. According to embodiments of this disclosure, the other disk failure dataset indicates failure information regarding at least one set of faulty sectors to which the at least one faulty sector belongs.
[0037] According to embodiments of this disclosure, the method of collecting multiple disk failure datasets within a first time period and collecting another disk failure dataset at a predetermined time point after the first time period can be the same. Furthermore, the interval between collecting two adjacent disk failure datasets among the multiple disk failure datasets can be the same as the interval between the end of the first time period and the aforementioned predetermined time point. In other words, collecting multiple disk failure datasets can refer to the first N collections, and collecting another disk failure dataset can refer to the (N+1)th collection.
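The relationship between the first N collections and the (N+1)th collection can be sketched as follows. Each collection is represented here, purely as an assumption, by the set of faulty sector-set indices observed at that collection; the sliding-window helper is an additional illustration that the disclosure does not describe.

```python
from typing import List, Sequence, Set, Tuple

Collection = Set[int]  # hypothetical: faulty sector-set indices seen at one collection


def make_training_pair(collections: Sequence[Collection],
                       n: int) -> Tuple[List[Collection], Collection]:
    """Split periodic collections into the first n (inputs) and the (n+1)th (reference)."""
    if n < 1 or len(collections) < n + 1:
        raise ValueError("need at least n + 1 collections")
    return list(collections[:n]), collections[n]


def sliding_pairs(collections: Sequence[Collection],
                  n: int) -> List[Tuple[List[Collection], Collection]]:
    """Build one (inputs, reference) pair for every window of n + 1 collections."""
    return [make_training_pair(collections[i:i + n + 1], n)
            for i in range(len(collections) - n)]
```

Because the collection interval is constant, every window of N + 1 consecutive collections has the same timing relationship between its inputs and its reference.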
[0038] According to embodiments of this disclosure, faulty sectors on a disk can have high spatial locality, and faulty sectors can be highly correlated with each other. This means that faulty sectors or sets of faulty sectors adjacent to a faulty sector or set of faulty sectors are more likely to be detected as faulty in the next background media scan. In one example, all faulty sectors can be concentrated in a small area, which accounts for only 0.0014% of the total disk capacity.
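One way to quantify such spatial locality is to measure how much of the disk's address space the faulty sectors span. The sketch below is an assumption about how a figure of this kind could be computed; the disclosure does not state how the quoted percentage was obtained, and the function name is hypothetical.

```python
from typing import Sequence


def fault_span_fraction(faulty_lbas: Sequence[int], total_sectors: int) -> float:
    """Return the fraction of the disk's LBA space spanned by the faulty sectors."""
    if total_sectors <= 0:
        raise ValueError("total_sectors must be positive")
    if not faulty_lbas:
        return 0.0
    # Span from the lowest to the highest faulty address, inclusive.
    span = max(faulty_lbas) - min(faulty_lbas) + 1
    return span / total_sectors
```

A small fraction indicates that the faults are concentrated in a narrow address range, which is what makes the fault counts of adjacent sector sets informative.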
[0039] In block 206, computing device 110 trains a fault determination model based on the multiple disk fault datasets and the fault information, such that the probability that the predicted fault information at the predetermined time point, determined by the trained fault determination model based on the multiple disk fault datasets, matches the fault information is greater than a first threshold probability.
[0040] According to embodiments of this disclosure, the fault determination model can be a machine learning model built based on the random forest method for predicting sets of faulty sectors.
[0041] According to embodiments of this disclosure, blocks 202 and 204 relate to acquiring samples for training the fault determination model: block 202 relates to acquiring the data required for determining fault information, and block 204 relates to acquiring a ground-truth reference for verifying whether the predicted fault information determined by the trained fault determination model is correct. Therefore, in block 206, the computing device 110 can continuously adjust the parameters of the fault determination model during training, such that the probability that the predicted fault information at the predetermined time point, determined by the trained fault determination model based on the multiple disk fault datasets, matches the fault information is greater than the first threshold probability; that is, the predicted fault information converges to the fault information indicated by the other disk fault dataset acquired in block 204.
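The stopping condition of block 206 can be sketched as a loop that adjusts the model until the match rate exceeds the first threshold probability. The `CutoffModel` class below is a deliberately simple, hypothetical stand-in for the fault determination model (it is not a random forest), and all names are assumptions; only the control flow is meant to mirror the description.

```python
from typing import Sequence, Tuple


def match_rate(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Return the fraction of predictions that match the reference labels."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


class CutoffModel:
    """Toy model: predicts a fault when a set's fault count reaches a cutoff."""

    def __init__(self, cutoff: int) -> None:
        self.cutoff = cutoff

    def predict(self, fault_count: int) -> int:
        return int(fault_count >= self.cutoff)

    def adjust(self) -> None:
        self.cutoff -= 1  # one crude parameter update per training round


def train_until_match(model: CutoffModel, inputs: Sequence[int],
                      labels: Sequence[int], first_threshold: float,
                      max_rounds: int = 100) -> Tuple[int, float]:
    """Adjust the model until the match rate exceeds first_threshold."""
    rate = match_rate([model.predict(x) for x in inputs], labels)
    rounds = 0
    while rate <= first_threshold and rounds < max_rounds:
        model.adjust()
        rounds += 1
        rate = match_rate([model.predict(x) for x in inputs], labels)
    return rounds, rate
```

In practice the `adjust` step would be replaced by fitting the chosen machine learning model; the loop exit corresponds to the stop-training instruction issued by the computing device 110.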
[0042] Figure 3 shows a flowchart of a model training method 300 according to an embodiment of the present disclosure. Method 300 can be implemented by the computing device 110 shown in Figure 1, but it can also be implemented by other suitable devices. It should be understood that the model training method 300 may also include additional steps not shown and/or may omit steps that are shown, and the scope of the embodiments of this disclosure is not limited in this respect.
[0043] In block 302, computing device 110 acquires multiple disk failure datasets associated with at least one faulty sector of the disk, collected within a first time period. The content of block 302 is the same as that of block 202, and will not be repeated here.
[0044] In block 304, computing device 110 acquires another disk failure dataset associated with at least one faulty sector, collected at a predetermined time point after the first time period. The content of block 304 is the same as that of block 204, and will not be repeated here.
[0045] In block 306, computing device 110 acquires the sector set fault type associated with at least one set of faulty sectors. According to embodiments of this disclosure, block 302 also relates to acquiring data needed to determine fault information, and block 304 also relates to acquiring a reference standard answer to verify whether the predicted fault information determined by the trained fault determination model is correct; therefore, the sector set fault type associated with at least one set of faulty sectors can be a manually labeled sector set fault type.
[0046] In block 308, computing device 110 trains a fault determination model based on the multiple disk fault datasets and the fault information, such that the probability that the predicted fault information at the predetermined time point, determined by the trained fault determination model based on the multiple disk fault datasets, matches the actual fault information is greater than a first threshold probability, and the probability that the predicted sector set fault type, determined by the trained fault determination model based on the multiple disk fault datasets, matches the acquired sector set fault type is greater than a second threshold probability. The training with respect to the first threshold probability is the same as that described for block 206, and will not be repeated here.
[0047] According to embodiments of this disclosure, blocks 302, 304, and 306 relate to acquiring samples for training a fault determination model, wherein block 306 relates to acquiring a reference standard answer for verifying whether the predicted sector set fault type determined by the trained fault determination model is correct. Therefore, in block 308, the computing device 110 can continuously adjust the parameters of the fault determination model during training, such that the probability that the predicted sector set fault type determined by the trained fault determination model based on multiple disk fault datasets matches the acquired sector set fault type is greater than a second threshold probability, i.e., the predicted sector set fault type converges to the sector set fault type acquired in block 306.
[0048] In block 310, computing device 110 determines, based on the multiple disk failure datasets, whether the number of faulty sectors in the at least one set of faulty sectors is greater than a first threshold number. If so, method 300 proceeds to block 312; otherwise, method 300 proceeds to block 314.
[0049] According to embodiments of this disclosure, the first threshold number can be a preset number set based on the number of sectors included in the disk. The larger the number of sectors included in the disk, the larger the first threshold number can be.
[0050] In block 312, computing device 110 determines the predicted sector set failure type as a first sector set failure type. According to embodiments of this disclosure, the first sector set failure type indicates that the number of faulty sector sets in the disk is large.
[0051] In block 314, computing device 110 determines, for each sector set in the at least one faulty sector set whose number of faulty sectors is greater than a second threshold number, the sector set fault type to be a second sector set fault type.
[0052] According to embodiments of this disclosure, the second threshold number can be a preset number set based on the number of sectors in the sector set included in the disk. The larger the number of sectors included in the sector set, the larger the second threshold number can be.
[0053] According to embodiments of this disclosure, the second sector set fault type indicates that the number of faulty sectors in a certain sector set is large.
[0054] According to embodiments of this disclosure, when a set of faulty sectors is of neither the first sector set fault type nor the second sector set fault type, the computing device 110 can determine that this set of faulty sectors is of a third sector set fault type, the third sector set fault type indicating that the number of faulty sectors in a certain sector set is small.
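The decision logic of blocks 310 to 314, together with the third fault type, can be sketched as follows. The text leaves open whether the first threshold number is compared against a count of faulty sectors or of faulty sector sets; this sketch assumes the total number of faulty sectors across all faulty sets, and the function name and integer type codes are hypothetical.

```python
from typing import Dict


def classify_sector_sets(faulty_per_set: Dict[int, int],
                         first_threshold: int,
                         second_threshold: int) -> Dict[int, int]:
    """Map each faulty sector-set index to fault type 1, 2, or 3."""
    # Block 310 (assumed reading): compare the disk-wide faulty sector total.
    total_faulty = sum(faulty_per_set.values())
    if total_faulty > first_threshold:
        # Block 312: the disk as a whole has many faults.
        return {index: 1 for index in faulty_per_set}
    # Block 314: type 2 for sets with many faulty sectors, type 3 otherwise.
    return {index: 2 if count > second_threshold else 3
            for index, count in faulty_per_set.items()}
```

Separating a disk-level type from set-level types lets an administrator choose between acting on the whole disk and acting only on the affected sector sets.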
[0055] It should be understood that method 300 includes more steps than method 200 and can be considered an extension of method 200.
[0056] Figure 4 shows a flowchart of a fault determination method 400 according to an embodiment of the present disclosure. Method 400 can be implemented by the computing device 110 shown in Figure 1, but it can also be implemented by other suitable devices. It should be understood that the fault determination method 400 may also include additional steps not shown and/or may omit steps that are shown, and the scope of the embodiments of this disclosure is not limited in this respect.
[0057] In block 402, computing device 110 acquires multiple disk failure datasets collected within a first time period and associated with at least one faulty sector of the disk. The content of block 402 is the same as that of blocks 202 and 302, and will not be repeated here.
[0058] In block 404, computing device 110 determines, based on a trained fault determination model obtained according to model training method 200 or model training method 300 and on the multiple disk fault datasets obtained in block 402, fault information, at a predetermined time point after the first time period, regarding at least one faulty sector set to which the at least one faulty sector belongs.
[0059] According to embodiments of this disclosure, determining the fault information, at the predetermined time point after the first time period, regarding the at least one set of faulty sectors to which the at least one faulty sector belongs may include determining both the aforementioned fault information and the sector set fault type associated with the at least one set of faulty sectors.
[0060] According to some embodiments of this disclosure, determining the sector set fault type may include determining the sector set fault type as a first sector set fault type if it is determined that the number of faulty sector sets in at least one faulty sector set is greater than a first threshold number.
[0061] According to some other embodiments of this disclosure, determining the sector set fault type may include determining the sector set fault type for the sector set in the at least one fault sector set whose number of fault sectors is less than or equal to a first threshold number as a second sector set fault type if it is determined that the number of fault sectors in at least one fault sector set is less than or equal to a first threshold number.
[0062] The above describes, with reference to Figures 1 to 4, a model training environment 100 in which the devices and/or methods according to embodiments of this disclosure may be implemented, a model training method 200, a model training method 300, and a fault determination method 400. It should be understood that the above description is intended to better illustrate the content described in the embodiments of this disclosure and is not intended to limit the scope in any way.
[0063] It should be understood that the number and size of various elements used in the embodiments and accompanying drawings of this disclosure are merely examples and are not intended to limit the scope of protection of the embodiments of this disclosure. The aforementioned numbers and sizes can be arbitrarily set as needed without affecting the normal implementation of the embodiments of this disclosure.
[0064] As described above with reference to Figures 1 to 4, the technical solutions according to embodiments of the present disclosure provide a model training method and a fault determination method. These methods can predict fault information for a set of sectors on a disk based on disk fault datasets associated with faulty sectors, allowing disk users or administrators to anticipate the potential fault conditions of the disk's sector sets. Furthermore, with the model training method and fault determination method proposed in the embodiments of the present disclosure, the fault type of a faulty sector set can also be determined, enabling disk users or administrators to anticipate the type of fault that the disk's sector sets will exhibit.
[0065] The technical effectiveness of the fault determination method according to embodiments of this disclosure is described below using two examples and the true positive rate (TPR) and false positive rate (FPR). In the examples of this disclosure, the true positive rate refers to the percentage of a set of sectors that will fail in the future that are correctly identified, and the false positive rate refers to the percentage of a set of sectors that will not fail in the future that are incorrectly identified. In both examples, for a 4TB disk with firmware "GS1F", the true positive rate for fault determination based on the collected disk fault dataset reaches 98.21%, while the false positive rate is only 0.913%; for an 8TB disk with firmware "UM02", the true positive rate for fault determination based on the collected disk fault dataset reaches 98.17%, while the false positive rate is only 0.4%. Therefore, the fault determination method according to this disclosure has a very high reliability.
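The two rates defined above can be computed from sets of sector-set indices as in the following minimal sketch. The function name and the representation of predictions as index sets are assumptions, and the numbers in the usage note are illustrative rather than taken from the reported experiments.

```python
from typing import Set, Tuple


def tpr_fpr(predicted: Set[int], actual: Set[int],
            all_sets: Set[int]) -> Tuple[float, float]:
    """Return (true positive rate, false positive rate) over sector sets."""
    positives = actual               # sector sets that do fail in the future
    negatives = all_sets - actual    # sector sets that do not fail
    true_positives = len(predicted & positives)
    false_positives = len(predicted & negatives)
    tpr = true_positives / len(positives) if positives else 0.0
    fpr = false_positives / len(negatives) if negatives else 0.0
    return tpr, fpr
```

For example, with ten sector sets of which four fail, a prediction that flags three of the four failing sets and one healthy set gives a true positive rate of 0.75 and a false positive rate of 1/6.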
[0066] Figure 5 shows a schematic block diagram of an example device 500 that can be used to implement embodiments of the present disclosure. According to embodiments of the present disclosure, the computing device 110 in Figure 1 can be implemented by device 500. As shown, device 500 includes a central processing unit (CPU) 501, which can perform various appropriate actions and processes according to computer program instructions stored in read-only memory (ROM) 502 or loaded from storage unit 508 into random access memory (RAM) 503. RAM 503 can also store various programs and data required for the operation of device 500. CPU 501, ROM 502, and RAM 503 are interconnected via bus 504. Input/output (I/O) interface 505 is also connected to bus 504.
[0067] Multiple components in device 500 are connected to I / O interface 505, including: input unit 506, such as keyboard, mouse, etc.; output unit 507, such as various types of monitors, speakers, etc.; storage unit 508, such as disk, optical disk, etc.; and communication unit 509, such as network card, modem, wireless transceiver, etc. Communication unit 509 allows device 500 to exchange information / data with other devices through computer networks such as the Internet and / or various telecommunications networks.
[0068] The various processes and procedures of methods 200, 300, and 400 described above may be executed by processing unit 501. For example, in some embodiments, methods 200, 300, and 400 may be implemented as computer software programs tangibly contained in a machine-readable medium, such as storage unit 508. In some embodiments, part or all of the computer program may be loaded and / or installed on device 500 via ROM 502 and / or communication unit 509. When the computer program is loaded into RAM 503 and executed by CPU 501, one or more actions of methods 200, 300, and 400 described above may be performed.
[0069] Embodiments of this disclosure may relate to methods, apparatus, systems, and / or computer program products. A computer program product may include a computer-readable storage medium having computer-readable program instructions loaded thereon for performing various aspects of embodiments of this disclosure.
[0070] A computer-readable storage medium can be a tangible device capable of holding and storing instructions for use by an instruction execution device. It can be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination thereof. More specific examples of computer-readable storage media, as a non-exhaustive list, include: a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch cards or raised structures in a groove having instructions recorded thereon, and any suitable combination thereof. A computer-readable storage medium as used herein is not to be construed as a transient signal per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (for example, light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
[0071] The computer-readable program instructions described herein can be downloaded from computer-readable storage media to various computing / processing devices, or downloaded via a network, such as the Internet, local area network, wide area network, and / or wireless network, to an external computer or external storage device. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and / or edge servers. A network adapter card or network interface in each computing / processing device receives the computer-readable program instructions from the network and forwards them to the computer-readable storage media in the respective computing / processing device.
[0072] Computer program instructions used to perform the operations of embodiments of this disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk, C++, etc., and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on a user's computer, partially on a user's computer, as a standalone software package, partially on a user's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving remote computers, the remote computer may be connected to the user's computer via any type of network—including a local area network (LAN) or a wide area network (WAN)—or may be connected to an external computer via an Internet connection, for example, using an Internet service provider. In some embodiments, electronic circuitry, such as programmable logic circuitry, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), is personalized by utilizing state information from the computer-readable program instructions. This electronic circuitry can execute the computer-readable program instructions to implement various aspects of embodiments of this disclosure.
[0073] Various aspects of embodiments of this disclosure are described herein with reference to flowchart illustrations and / or block diagrams of methods, apparatus / systems, and computer program products according to embodiments of this disclosure. It should be understood that each block of the flowchart illustrations and / or block diagrams, and combinations of blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer-readable program instructions.
[0074] These computer-readable program instructions can be provided to a processing unit of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine such that, when executed by the processing unit of the computer or other programmable data processing apparatus, they create means for implementing the functions / actions specified in one or more blocks of the flowchart and / or block diagram. These computer-readable program instructions can also be stored in a computer-readable storage medium that causes a computer, programmable data processing apparatus, and / or other device to operate in a particular manner. Thus, the computer-readable medium storing the instructions comprises an article of manufacture that includes instructions for implementing aspects of the functions / actions specified in one or more blocks of the flowchart and / or block diagram.
[0075] Computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process, such that the instructions executed on the computer, other programmable data processing apparatus, or other device implement the functions/actions specified in one or more blocks of the flowchart and/or block diagram.
[0076] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of an instruction containing one or more executable instructions for implementing a specified logical function. In some alternative implementations, the functions marked in the blocks may occur in a different order than those shown in the drawings. For example, two consecutive blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, may be implemented using a dedicated hardware-based system that performs the specified function or action, or using a combination of dedicated hardware and computer instructions.
[0077] Various embodiments of this disclosure have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those skilled in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical applications, or their technical improvements over technologies found in the marketplace, or to enable others skilled in the art to understand the embodiments disclosed herein.
Claims
1. A model training method, comprising: acquiring multiple disk failure datasets collected within a first time period and associated with at least one faulty sector of a disk; acquiring another disk failure dataset associated with the at least one faulty sector and collected at a predetermined time point after the first time period, the other disk failure dataset indicating fault information about at least one faulty sector set to which the at least one faulty sector belongs; and training a fault determination model based on the multiple disk failure datasets and the fault information, such that the probability that predicted fault information, determined for the predetermined time point by the trained fault determination model based on the multiple disk failure datasets, matches the fault information is greater than a first threshold probability.
2. The method according to claim 1, wherein acquiring the multiple disk failure datasets comprises: acquiring, for each faulty sector of the at least one faulty sector, at least one of the following parameters collected at a first time point within the first time period: a background media scan log; a log count indicating the number of background media scan logs associated with the faulty sector in the background media scan log; a fault count of the sector set to which the faulty sector belongs; and a fault count of a sector set adjacent to the sector set to which the faulty sector belongs.
3. The method according to claim 2, wherein the background media scan log includes at least one of the following: a power-on time indicating the total power-on time of the disk when the faulty sector failed; an identifier of the faulty sector; and a fault type of the faulty sector.
4. The method according to claim 3, wherein the fault type of the faulty sector includes: sector media failure; or sector recovery failure.
5. The method according to claim 1, further comprising: acquiring a sector set fault type associated with the at least one faulty sector set; wherein training the fault determination model comprises: training the fault determination model such that the probability that a predicted sector set fault type, determined by the trained fault determination model based on the multiple disk failure datasets, matches the acquired sector set fault type is greater than a second threshold probability.
6. The method according to claim 5, further comprising: If, based on the multiple disk failure datasets, it is determined that the number of faulty sector sets in the at least one faulty sector set is greater than a first threshold number, then the predicted sector set failure type is determined as the first sector set failure type.
7. The method according to claim 5, further comprising: If, based on the plurality of disk failure datasets, it is determined that the number of faulty sector sets in the at least one faulty sector set is less than or equal to a first threshold number, then the sector set failure type for the faulty sector set in the at least one faulty sector set whose number of faulty sectors is greater than a second threshold number is determined as the second sector set failure type.
8. A fault determination method, comprising: acquiring multiple disk failure datasets collected within a first time period and associated with at least one faulty sector of a disk; and determining, based on the multiple disk failure datasets and the trained fault determination model obtained according to the method of any one of claims 1 to 7, fault information at a predetermined time point after the first time period regarding at least one faulty sector set to which the at least one faulty sector belongs.
9. The method of claim 8, wherein determining the fault information comprises: determining the fault information and a sector set fault type associated with the at least one faulty sector set.
10. The method of claim 9, wherein determining the fault type of the sector set comprises: If it is determined that the number of faulty sector sets in the at least one faulty sector set is greater than a first threshold number, then the sector set fault type is determined as the first sector set fault type.
11. The method of claim 9, wherein determining the fault type of the sector set comprises: If it is determined that the number of faulty sector sets in the at least one faulty sector set is less than or equal to a first threshold number, then the sector set fault type for the faulty sector set in the at least one faulty sector set whose number of faulty sectors is greater than a second threshold number is determined as the second sector set fault type.
12. An electronic device, comprising: at least one processing unit; and at least one memory coupled to the at least one processing unit and storing instructions for execution by the at least one processing unit, the instructions, when executed by the at least one processing unit, causing the device to perform an action, the action comprising: acquiring multiple disk failure datasets collected within a first time period and associated with at least one faulty sector of a disk; acquiring another disk failure dataset associated with the at least one faulty sector and collected at a predetermined time point after the first time period, the other disk failure dataset indicating fault information about at least one faulty sector set to which the at least one faulty sector belongs; and training a fault determination model based on the multiple disk failure datasets and the fault information, such that the probability that predicted fault information, determined for the predetermined time point by the trained fault determination model based on the multiple disk failure datasets, matches the fault information is greater than a first threshold probability.
13. The electronic device of claim 12, wherein acquiring the multiple disk failure datasets comprises: acquiring, for each faulty sector of the at least one faulty sector, at least one of the following parameters collected at a first time point within the first time period: a background media scan log; a log count indicating the number of background media scan logs associated with the faulty sector in the background media scan log; a fault count of the sector set to which the faulty sector belongs; and a fault count of a sector set adjacent to the sector set to which the faulty sector belongs.
14. The electronic device of claim 13, wherein the background media scan log comprises at least one of the following: a power-on time indicating the total power-on time of the disk when the faulty sector failed; an identifier of the faulty sector; and a fault type of the faulty sector.
15. The electronic device of claim 14, wherein the fault type of the faulty sector includes: sector media failure; or sector recovery failure.
16. The electronic device of claim 12, wherein the action further comprises: acquiring a sector set fault type associated with the at least one faulty sector set; wherein training the fault determination model comprises: training the fault determination model such that the probability that a predicted sector set fault type, determined by the trained fault determination model based on the multiple disk failure datasets, matches the acquired sector set fault type is greater than a second threshold probability.
17. The electronic device of claim 16, wherein the action further comprises: If, based on the multiple disk failure datasets, it is determined that the number of faulty sector sets in the at least one faulty sector set is greater than a first threshold number, then the predicted sector set failure type is determined as the first sector set failure type.
18. The electronic device of claim 16, wherein the action further comprises: If, based on the plurality of disk failure datasets, it is determined that the number of faulty sector sets in the at least one faulty sector set is less than or equal to a first threshold number, then the sector set failure type for the faulty sector set in the at least one faulty sector set whose number of faulty sectors is greater than a second threshold number is determined as the second sector set failure type.
19. An electronic device, comprising: at least one processing unit; and at least one memory coupled to the at least one processing unit and storing instructions for execution by the at least one processing unit, the instructions, when executed by the at least one processing unit, causing the device to perform an action, the action comprising: acquiring multiple disk failure datasets collected within a first time period and associated with at least one faulty sector of a disk; and determining, based on the multiple disk failure datasets and the trained fault determination model obtained according to any one of claims 12 to 18, fault information at a predetermined time point after the first time period regarding at least one faulty sector set to which the at least one faulty sector belongs.
20. The electronic device of claim 19, wherein determining the fault information comprises: determining the fault information and a sector set fault type associated with the at least one faulty sector set.
21. The electronic device of claim 20, wherein determining the fault type of the sector set comprises: If it is determined that the number of faulty sector sets in the at least one faulty sector set is greater than a first threshold number, then the sector set fault type is determined as the first sector set fault type.
22. The electronic device of claim 20, wherein determining the fault type of the sector set comprises: If it is determined that the number of faulty sector sets in the at least one faulty sector set is less than or equal to a first threshold number, then the sector set fault type for the faulty sector set in the at least one faulty sector set whose number of faulty sectors is greater than a second threshold number is determined as the second sector set fault type.
23. A computer program product tangibly stored on a non-transitory computer-readable medium and comprising machine-executable instructions that, when executed, cause a machine to perform the steps of the method according to any one of claims 1 to 7.
24. A computer program product tangibly stored on a non-transitory computer-readable medium and comprising machine-executable instructions that, when executed, cause a machine to perform the steps of the method according to any one of claims 8 to 11.
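The threshold rule for sector set fault types recited in claims 6, 7, 10, and 11 can be sketched in Python as follows. This is a hedged illustration, not the claimed implementation: the function name, the dictionary input, and the string labels are hypothetical. The sketch also assumes that, when the first threshold is exceeded, the first type applies to every faulty sector set, and it returns `None` for sector sets the claims leave unclassified.

```python
from typing import Optional

# Placeholder labels; the claims name the types only as "first" and "second".
FIRST_TYPE = "first sector set fault type"
SECOND_TYPE = "second sector set fault type"


def classify_sector_sets(
    faulty_sector_counts: dict[int, int],
    first_threshold: int,
    second_threshold: int,
) -> dict[int, Optional[str]]:
    """Assign a fault type to each faulty sector set.

    faulty_sector_counts maps a sector set identifier to the number of
    faulty sectors it contains (only faulty sector sets are listed).
    """
    # Case 1: the number of faulty sector sets exceeds the first threshold.
    if len(faulty_sector_counts) > first_threshold:
        return {s: FIRST_TYPE for s in faulty_sector_counts}
    # Case 2: otherwise, only sector sets whose faulty sector count exceeds
    # the second threshold receive the second type.
    return {
        s: SECOND_TYPE if count > second_threshold else None
        for s, count in faulty_sector_counts.items()
    }


# Made-up counts and thresholds, for illustration only.
few_sets = classify_sector_sets({0: 3, 4: 12}, first_threshold=5, second_threshold=10)
many_sets = classify_sector_sets({0: 1, 1: 1, 2: 1}, first_threshold=2, second_threshold=10)
```

With these made-up inputs, `few_sets` labels only sector set 4 with the second type, while `many_sets` labels all three sector sets with the first type, because three faulty sector sets exceed the first threshold of two.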