A hard disk garbage collection method and device, electronic equipment and storage medium

Dividing storage objects into fixed-size intervals and applying differentiated read strategies addresses the performance degradation that traditional hard disk garbage collection suffers when read IOPS and bandwidth are limited, improving both garbage collection efficiency and the stability of storage performance.

CN121996568B · Active · Publication Date: 2026-06-23 · CHINA ELECTRONICS CLOUD DIGITAL INTELLIGENCE TECH CO LTD

Cites: 2 · Cited by: 0

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: CHINA ELECTRONICS CLOUD DIGITAL INTELLIGENCE TECH CO LTD
Filing Date: 2026-04-07
Publication Date: 2026-06-23

Application Information


IPC: G06F12/02; G06F3/06

AI Tagging

Application Domain

Input/output to record carriers; Memory addressing/allocation/relocation

Technology Topics

Computer hardware; Parallel computing

Technical Efficacy Phrases

Reduce the frequency of read requests; relieve pressure


AI Technical Summary

Technical Problem

Traditional hard drive garbage collection methods struggle to achieve optimal efficiency when read IOPS and bandwidth are limited, leading to performance degradation and RPC blocking. In particular, read efficiency in mechanical hard drives is affected by head seek time.

Method used

The method performs disk garbage collection based on intelligent pre-reading: it divides each storage object into contiguous fixed-size intervals, classifies each read as a single-interval, cross-double-interval, or cross-multi-interval scenario according to where the data lies, and applies a differentiated read strategy to each scenario, combining interval caching with IOPS concurrency control.

Benefits of technology

It significantly reduces the frequency of disk read requests, relieves seek pressure on the disk heads, improves read performance, reduces I/O flow control conflicts, improves GC reclamation efficiency, and keeps cluster storage capacity stable.

✦ Generated by Eureka AI based on patent content.


Abstract

The application discloses a hard disk garbage collection method and device. The method comprises the following steps: dividing a storage object to be collected into a plurality of fixed-size continuous intervals; determining the starting and ending intervals of the data to be migrated according to its offset and length; classifying read scenarios into three categories, i.e., single-interval, cross-double-interval, and cross-multiple-interval, based on the positional relationship of those intervals; in the single-interval scenario, if the interval has been cached, the data is copied directly, otherwise the complete interval data is read into the cache once the read IOPS is below the upper limit; the same logic is executed for each interval in the cross-double-interval scenario; in the cross-multiple-interval scenario, the data is read directly and the IOPS concurrency is computed from the actual data length; finally, the valid data is migrated to a new storage object and the original object's space is released. The application effectively reduces the number of hard disk read requests and the seek overhead of mechanical disk heads, reduces IOPS flow control conflicts through cache hits, and significantly improves the garbage collection efficiency of the storage system and the stability of cluster capacity.


Description

Technical Field

[0001] This application belongs to the field of hard disk space reclamation technology, and specifically relates to a hard disk garbage collection method and apparatus, a computer-readable storage medium, and an electronic device.

Background

[0002] In storage systems, the industry commonly employs Redirect-on-Write (ROW) technology to make full use of disk write performance. ROW aggregates scattered write requests into large sequential I/O operations and writes them to objects pre-allocated in the disk pool. In distributed storage architectures, append-only writes can significantly improve write throughput, so ROW also adopts this mechanism. However, append-only writes inevitably leave stale data versions when the same data block is updated multiple times, so garbage collection (GC) is needed to release the invalid space. Because ROW typically manages large objects (e.g., 128MB per object), conventional GC scans the amount of garbage in each object and migrates the data of objects whose garbage percentage exceeds a preset threshold: valid data is read from the source object and written to a newly allocated object, and the source object is then deleted to reclaim its storage space.
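The threshold-based candidate selection described above can be sketched in Python. This is a minimal illustration, not the patent's code: the `StorageObject` and `select_gc_candidates` names are hypothetical, the 80% threshold is borrowed from the example in the embodiment, and ordering candidates by garbage share is an added assumption.

```python
from dataclasses import dataclass
from typing import List

MB = 1 << 20  # assumed: 1MB = 2**20 bytes

@dataclass
class StorageObject:
    obj_id: str
    size: int         # allocated bytes, e.g. 128MB for a ROW object
    valid_bytes: int  # bytes still referenced by live index metadata

    @property
    def garbage_ratio(self) -> float:
        # Share of the object occupied by stale (overwritten) versions.
        return 1.0 - self.valid_bytes / self.size

def select_gc_candidates(objects: List[StorageObject], threshold: float) -> List[StorageObject]:
    """Return objects whose garbage share exceeds the threshold, most garbage first."""
    candidates = [o for o in objects if o.garbage_ratio > threshold]
    # Ordering by garbage share is an illustrative choice, not stated in the patent.
    return sorted(candidates, key=lambda o: o.garbage_ratio, reverse=True)

pool = [
    StorageObject("obj0", 128 * MB, 20 * MB),  # ~84% garbage
    StorageObject("obj1", 128 * MB, 64 * MB),  # 50% garbage
    StorageObject("obj2", 128 * MB, 10 * MB),  # ~92% garbage
]
picked = [o.obj_id for o in select_gc_candidates(pool, threshold=0.8)]
print(picked)  # ['obj2', 'obj0']
```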

[0003] However, GC migration efficiency is constrained by read I/O performance: excessively high read IOPS can easily overload the disk, degrading performance or even blocking RPCs, while excessively low read IOPS fails to exploit the benefits of concurrency. Furthermore, for HDDs, head seek time significantly affects read efficiency; for the same data volume, a single large sequential I/O performs significantly better than multiple small random I/Os. Therefore, when both read IOPS and bandwidth are limited, traditional GC methods struggle to achieve optimal efficiency.

Summary of the Invention

[0004] To address the aforementioned problems in the existing technology, this application proposes a novel hard disk garbage collection method based on intelligent pre-reading, aiming to improve the efficiency of hard disk garbage collection.

[0005] Specifically, this application provides the following technical solutions:

[0006] A first aspect of this application provides a hard disk garbage collection method, the method comprising:

[0007] S1. Divide the storage objects to be reclaimed into multiple consecutive intervals of fixed size;

[0008] S2. Determine the start and end intervals occupied by the data based on the offset and length of the data to be migrated in the storage object.

[0009] S3. Based on the positional relationship between the starting and ending intervals, the data reading scenarios are divided into three types: single-interval scenarios, cross-double-interval scenarios, and cross-multi-interval scenarios, and differentiated reading strategies are implemented for each type:

[0010] (1) For the single-interval scenario: if the interval has already been read into the cache, the corresponding data is copied directly from the cache. If it has not been read and the read IOPS concurrency has not reached the upper limit, the complete data of the interval is read from the hard disk into the cache and the IOPS concurrency is incremented; after the read completes, the IOPS concurrency is decremented and the interval is marked as read. If the read IOPS concurrency has reached the upper limit, the read-and-cache logic is executed after IOPS concurrency is released;

[0011] (2) For the scenario spanning two intervals, the same reading and caching processing logic as the single interval scenario is executed for each interval until the data reading of both intervals is completed;

[0012] (3) For scenarios spanning multiple intervals, data is read directly from the hard drive, and flow control is performed by calculating the IOPS concurrency based on the actual length of the data, without enabling the interval caching mechanism;

[0013] S4. Based on the valid data read, migrate it to a new storage object and release the space occupied by the original storage object.
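Step S4 can be pictured with a toy in-memory sketch. It assumes storage objects are plain byte strings keyed by ID and that the valid extents are already known from the validity check; `migrate` and the returned offset map are hypothetical, and using that map to update the index pool is an assumption about what a caller would do, not something the step spells out.

```python
from typing import Dict, List, Tuple

Extent = Tuple[int, int]  # (offset, length) of one valid record in the source object

def migrate(source: bytes, valid_extents: List[Extent]) -> Tuple[bytes, Dict[Extent, int]]:
    """Copy valid extents into a fresh object, appended back to back.

    Returns the new object's bytes and a map from each old extent to its new
    offset; in a real system that map would drive the index-pool metadata update.
    """
    new_obj = bytearray()
    remap: Dict[Extent, int] = {}
    for offset, length in sorted(valid_extents):
        remap[(offset, length)] = len(new_obj)  # new offset = current append position
        new_obj += source[offset:offset + length]
    return bytes(new_obj), remap

# Toy "disk": a 64-byte source object with two valid records.
disk: Dict[str, bytes] = {"obj0": bytes(range(16)) * 4}
new_bytes, remap = migrate(disk["obj0"], [(40, 8), (8, 4)])
disk["obj1"] = new_bytes   # write valid data to the new storage object
del disk["obj0"]           # release the space held by the source object
print(remap)               # {(8, 4): 0, (40, 8): 4}
```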

[0014] Furthermore, in the method of this application, the size of the continuous interval in step S1 is fixed at 1MB per interval, and the size of the storage object is 128MB.

[0015] Furthermore, in the method of this application, the starting interval in step S2 is calculated with the floor function, $\text{starting interval} = \lfloor \text{Offset} / \text{Interval size} \rfloor$, and the ending interval is likewise calculated with the floor function, $\text{ending interval} = \lfloor (\text{Offset} + \text{Data length} - 1) / \text{Interval size} \rfloor$.
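The two floor formulas translate directly into integer division. A minimal Python sketch, assuming 1MB means 2^20 bytes and that offsets and lengths are byte counts; `locate_intervals` is an illustrative name, not taken from the patent.

```python
from typing import Tuple

INTERVAL_SIZE = 1 << 20  # assumed: 1MB = 2**20 bytes

def locate_intervals(offset: int, length: int, interval_size: int = INTERVAL_SIZE) -> Tuple[int, int]:
    """Return the (start, end) interval indices covered by bytes [offset, offset + length)."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length > 0")
    start = offset // interval_size                # floor(Offset / Interval size)
    end = (offset + length - 1) // interval_size   # floor((Offset + Data length - 1) / Interval size)
    return start, end

print(locate_intervals(0, INTERVAL_SIZE))                          # (0, 0): exactly fills interval 0
print(locate_intervals(INTERVAL_SIZE - 2048, 4096))                # (0, 1): straddles the first boundary
print(locate_intervals(5 * INTERVAL_SIZE, 3 * INTERVAL_SIZE + 1))  # (5, 8)
```

The `- 1` keeps a record that ends exactly on a boundary inside its last interval; without it, a full 1MB record at offset 0 would be reported as spanning intervals 0 and 1.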

[0016] Furthermore, in the method of this application, the judgment condition for the single interval scenario in step S3 is that the starting interval equals the ending interval, the judgment condition for the cross-double interval scenario is that the ending interval minus the starting interval equals 1, and the judgment condition for the cross-multi-interval scenario is that the ending interval minus the starting interval is greater than 1.
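The three judgment conditions above map onto a small classifier. The enum and function names below are hypothetical; the comments restate which read strategy each scenario selects.

```python
from enum import Enum

class ReadScenario(Enum):
    SINGLE = "single-interval"
    CROSS_DOUBLE = "cross-double-interval"
    CROSS_MULTI = "cross-multi-interval"

def classify(start: int, end: int) -> ReadScenario:
    """Choose the read strategy from the start/end interval indices."""
    if end < start:
        raise ValueError("end interval cannot precede start interval")
    if end == start:
        return ReadScenario.SINGLE        # serve from / load into the interval cache
    if end - start == 1:
        return ReadScenario.CROSS_DOUBLE  # same cache logic, once per interval
    return ReadScenario.CROSS_MULTI       # direct read, no interval caching

print(classify(7, 7).value, classify(7, 8).value, classify(7, 10).value)
```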

[0017] Furthermore, in the method of this application, the IOPS concurrency for the cross-multi-interval scenario in step S3 is calculated with the floor function: $\text{increased concurrency} = \lfloor \text{Actual data length} / \text{Interval size} \rfloor$.

[0018] Furthermore, in the method of this application, before performing step S1, it further includes: determining the storage object to be garbage collected, wherein the storage object stores user data and corresponding metadata, and the metadata includes the offset of the user data in the storage object and the data length;

[0019] Before executing step S3, the method further includes: using the metadata of the user data stored in the storage object to perform a reverse lookup in the index pool, and comparing the lookup result with the metadata in the storage object to determine whether the data is valid.
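The validity check can be sketched as a dictionary lookup. Here the index pool is modelled as a hypothetical Python `dict` from an index key to the location of the latest copy; the key shape borrows the `cont`/`dkey`/`akey`/`recx` fields named in the embodiment, and everything else (names, key format, `Location`) is an assumption.

```python
from typing import Dict, NamedTuple, Tuple

class Location(NamedTuple):
    obj_id: str
    offset: int
    length: int

# Hypothetical index key, modelled on the (cont, dkey, akey, recx) fields in the embodiment.
IndexKey = Tuple[str, str, str, str]

def is_valid(key: IndexKey, stored_at: Location, index_pool: Dict[IndexKey, Location]) -> bool:
    """Reverse lookup: the copy is valid only if the index still points at it.

    A key missing from the index pool yields None, so deleted data counts as garbage.
    """
    return index_pool.get(key) == stored_at

key = ("cont1", "dkey1", "akey1", "[0,3]")
index_pool = {key: Location("obj7", 0, 2048)}   # latest version lives in obj7

print(is_valid(key, Location("obj0", 4096, 2048), index_pool))  # False: overwritten, garbage
print(is_valid(key, Location("obj7", 0, 2048), index_pool))     # True: still referenced
```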

[0020] Furthermore, the method of this application is applied to garbage collection in disk pools that use redirect-on-write (ROW) technology in distributed storage systems.

[0021] A second aspect of this application provides a hard disk garbage collection apparatus, wherein the apparatus, when operating, implements the steps of the aforementioned hard disk garbage collection method, the apparatus comprising:

[0022] The interval segmentation module is used to divide the storage objects to be reclaimed into multiple continuous intervals of fixed size;

[0023] The interval positioning module is used to determine the start and end intervals occupied by the data based on the offset and length of the data to be migrated in the storage object.

[0024] The read strategy execution module is used to classify the data read scenario into three types based on the positional relationship between the start interval and the end interval: single interval scenario, cross-double interval scenario, and cross-multiple interval scenario, and to execute differentiated read strategies for different types:

[0025] (1) For the single-interval scenario: if the interval has already been read into the cache, the corresponding data is copied directly from the cache. If it has not been read and the read IOPS concurrency has not reached the upper limit, the complete data of the interval is read from the hard disk into the cache and the IOPS concurrency is incremented; after the read completes, the IOPS concurrency is decremented and the interval is marked as read. If the read IOPS concurrency has reached the upper limit, the read-and-cache logic is executed after IOPS concurrency is released;

[0026] (2) For the scenario spanning two intervals, the same reading and caching processing logic as the single interval scenario is executed for each interval until the data reading of both intervals is completed;

[0027] (3) For scenarios spanning multiple intervals, data is read directly from the hard drive, and flow control is performed by calculating the IOPS concurrency based on the actual length of the data, without enabling the interval caching mechanism;

[0028] The data migration module is used to migrate valid data to a new storage object and release the space occupied by the original storage object.

[0029] Furthermore, in the device of this application, the size of the continuous interval is fixed at 1MB per interval, and the size of the storage object is 128MB.

[0030] Furthermore, the device of this application also includes a metadata verification module, which uses the metadata stored in the storage object to perform a reverse lookup in the index pool and thereby verify data validity.

[0031] A third aspect of this application provides an electronic device, including: a memory and a processor;

[0032] Memory: Used to store computer programs;

[0033] Processor: Used to execute the computer program to implement the steps of the aforementioned hard disk garbage collection method.

[0034] A fourth aspect of this application provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of the aforementioned hard disk garbage collection method.

[0035] In summary, by introducing a 1MB prefetch mechanism, this invention significantly reduces the frequency of disk read requests and the read IOPS pressure, relieves seek pressure on mechanical hard disk heads, and reduces the overhead of back-and-forth head seeking, thereby improving read performance. At the same time, the 1MB cache-hit strategy effectively reduces I/O flow control conflicts, improves GC reclamation efficiency, and keeps cluster storage capacity stable.
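To make the claimed reduction in read requests concrete, the following sketch counts disk requests for a batch of records with and without interval prefetching. It assumes 1MB = 2^20 bytes, an unbounded cache that never evicts, and one read per record when prefetching is off; `count_reads` is a hypothetical helper, not a measurement of the invention.

```python
from typing import Iterable, Tuple

INTERVAL_SIZE = 1 << 20  # assumed: 1MB = 2**20 bytes

def count_reads(records: Iterable[Tuple[int, int]],
                interval_size: int = INTERVAL_SIZE) -> Tuple[int, int]:
    """Return (reads without prefetch, reads with 1MB prefetch) for (offset, length) records."""
    recs = list(records)
    cached = set()      # intervals fetched whole; assumes the cache never evicts
    direct_reads = 0    # cross-multi-interval records bypass the cache
    for offset, length in recs:
        start = offset // interval_size
        end = (offset + length - 1) // interval_size
        if end - start <= 1:
            cached.update(range(start, end + 1))
        else:
            direct_reads += 1
    return len(recs), len(cached) + direct_reads

# 256 valid 4KB records packed into the first 1MB of an object.
small = [(i * 4096, 4096) for i in range(256)]
print(count_reads(small))  # (256, 1)
```

Each prefetch read transfers a whole 1MB interval even when only a few records in it remain valid, so the sketch counts requests rather than bytes; the saving is largest when valid records are densely packed.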

[0036] Other features and advantages of this application will be set forth in the following description, or will become apparent through implementation of the technical solutions of this application. The objectives and other advantages of this application can be achieved and obtained through the technical features and means particularly pointed out in the description, claims, and drawings.

Description of the Drawings

[0037] To more clearly illustrate the technical solution of this application, the accompanying drawings involved in the description of this invention will be briefly introduced below. It should be noted that the drawings only show some embodiments of the invention. For those skilled in the art, other related drawings can be derived from these drawings without creative effort.

[0038] Figure 1 is a flowchart of the overall implementation process of the hard disk garbage collection method of this application.

[0039] Figure 2 is a diagram of the cluster topology used in the embodiments of this application (a 3-node cluster with 4 disks per node).

[0040] Figure 3 is a flowchart of the GC space reclamation process for storage objects in an embodiment of this application.

[0041] Figure 4 is a structural diagram of the hard disk garbage collection apparatus of this application.

[0042] Figure 5 is a schematic structural diagram of an electronic device provided in an embodiment of this application.

Detailed Description

[0043] To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. It should be noted that the described embodiments are only some embodiments of this application, and not all embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of this application without creative effort are within the protection scope of this application.

[0044] In this document, the term "comprising" and its variants (such as "include" and "includes") are open-ended expressions and should be understood as "including but not limited to," meaning that the listed content is not exhaustive and may include other content not explicitly mentioned. The term "based on" should be understood as "at least partially based on," meaning that the basis or condition referred to may not be the only factor and may involve other relevant factors. The term "one embodiment" should be understood as "at least one embodiment," meaning that the described embodiment is not the only possible implementation, and other similar embodiments may exist.

[0045] In this application, the terms "a" and "a plurality of" are used to modify related elements or features, and their expression is illustrative rather than restrictive. Unless otherwise expressly stated in the context, "a" should be understood as "at least one," and "a plurality of" should be understood as "at least two." Those skilled in the art should reasonably interpret these terms based on the semantic and logical relationships of the context to ensure that they cover the possibility of "one or more."

[0046] Example: A method for hard drive garbage collection

[0047] Figure 1 shows the overall implementation process of the hard disk garbage collection method provided in this application, which includes the following steps:

[0048] S1. Divide the storage objects to be reclaimed into multiple consecutive intervals of fixed size;

[0049] S2. Determine the start and end intervals occupied by the data based on the offset and length of the data to be migrated in the storage object.

[0050] S3. Based on the positional relationship between the starting and ending intervals, the data reading scenarios are divided into three types: single-interval scenarios, cross-double-interval scenarios, and cross-multi-interval scenarios, and differentiated reading strategies are implemented for each type:

[0051] (1) For the single-interval scenario: if the interval has already been read into the cache, the corresponding data is copied directly from the cache. If it has not been read and the read IOPS concurrency has not reached the upper limit, the complete data of the interval is read from the hard disk into the cache and the IOPS concurrency is incremented; after the read completes, the IOPS concurrency is decremented and the interval is marked as read. If the read IOPS concurrency has reached the upper limit, the read-and-cache logic is executed after IOPS concurrency is released;

[0052] (2) For the scenario spanning two intervals, the same reading and caching processing logic as the single interval scenario is executed for each interval until the data reading of both intervals is completed;

[0053] (3) For scenarios spanning multiple intervals, data is read directly from the hard drive, and flow control is performed by calculating the IOPS concurrency based on the actual length of the data, without enabling the interval caching mechanism;

[0054] S4. Based on the valid data read, migrate it to a new storage object and release the space occupied by the original storage object.

[0055] To more clearly illustrate the technical solution of this application, the following will provide further explanation through specific scenario embodiments.

[0056] Assume a 3-node cluster with 4 disks per node, with the cluster topology shown in Figure 2. The index pool mainly stores the metadata corresponding to user data, while the data pool stores the user data itself. User data is either a single value (sv) or an array value; array values are generally described in fixed-size records, such as 512 bytes, so the extent [0,3] denotes 4 records × 512 bytes = 2KB. When user data is written, it is first mapped to an index object, which is then mapped to its master node: for example, the master node for index_objA1/B1/C1 is node0, the master node for index_objA2/B2/C2 is node1, and the master node for index_objA3/B3/C3 is node2. On the master node, writes are aggregated and storage objects are allocated for them. Assuming 3-replica redundancy, the allocated storage objects are obj0 (tgt0, tgt4, tgt8), obj1 (tgt1, tgt5, tgt9), and obj2 (tgt2, tgt5, tgt9); that is, the write requests of A1/B1/C1 are aggregated through obj0, those of A2/B2/C2 through obj1, and those of A3/B3/C3 through obj2. After aggregation, the metadata of each index object (such as the index container `cont`, the index object key `/dkey/akey/recx`, and the backend storage address) is recorded in the metadata space of the index pool, and is also recorded in the storage object (e.g., `obj0`) so that its storage location can be traced.

When an index object is overwritten repeatedly, each new version is written to a new `obj` storage object, and the data in the old `obj` storage object becomes garbage. When the amount of garbage in a storage object reaches the GC reclamation threshold (e.g., a static threshold of 80%), the space can be reclaimed with the GC algorithm. Specifically, the index metadata recorded in the `obj` storage object is read and used for a reverse lookup in the index pool. The reverse lookup result, i.e., the metadata of the index object (such as the index container `cont`, the index object key `/dkey/akey/recx`, and the storage object ID), is compared with the information stored on the storage object to determine whether the data is garbage or valid. Valid data must be moved to a new storage object. Because this migration reads from disk and then writes to disk, it is limited by IOPS concurrency: if disk reads are too frequent, the disk becomes overloaded, read and write requests back up, and disk read/write efficiency drops; if IOPS is too low, GC space reclamation is slow.

[0057] To address the aforementioned problems, this invention proposes a method for improving garbage collection efficiency based on a pre-read cache (e.g., 1MB). Figure 3 shows the GC space reclamation process for storage objects in this solution. Its core steps are:

[0058] Step 1: Divide the object into 1MB intervals. For example, a 128MB object is divided into 128 consecutive intervals.

[0059] Step 2: Using the offset recorded for the data in the metadata, compute the 1MB interval in which the data starts (OFFSET / 1MB, rounded down) and denote it start; using the offset and the length recorded in the metadata, compute the 1MB interval in which the data ends ((OFFSET + LENGTH - 1) / 1MB, rounded down) and denote it end.

[0060] Step 3: If start = end, the data falls within a single 1MB interval. Check the read flag of that 1MB interval to determine whether it has already been read successfully:

[0061] (1) If the interval is marked as not read, check whether the current read IOPS concurrency has reached the upper limit;

[0062] If the upper limit has not been reached, read the whole 1MB interval directly from disk and increment the IOPS concurrency by 1. After the read completes, decrement the IOPS concurrency by 1 and set the interval's read status to read.

[0063] If the upper limit has been reached, wait for IOPS concurrency to be released before issuing the disk read for the 1MB interval;

[0064] (2) If the interval is marked as successfully read, copy the corresponding data directly from the cached 1MB interval.
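The single-interval read path of Step 3 can be sketched with a per-interval cache and a counting IOPS limiter built on Python's `threading.Condition`. Assumptions: `read_disk` is a hypothetical callable standing in for the disk, the demo uses 16-byte intervals instead of 1MB to stay small, and presence in the cache doubles as the read flag. There is no per-interval lock, so two threads missing on the same interval could both read it; a concurrent implementation would likely need one.

```python
import threading
from typing import Callable, Dict

class IopsLimiter:
    """Counts in-flight disk reads and blocks new ones at the upper limit."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.in_flight = 0
        self._cond = threading.Condition()

    def acquire(self) -> None:
        with self._cond:
            while self.in_flight >= self.limit:   # wait for IOPS concurrency to be released
                self._cond.wait()
            self.in_flight += 1

    def release(self) -> None:
        with self._cond:
            self.in_flight -= 1
            self._cond.notify_all()

class IntervalReader:
    """Serve from cache if the interval was read, else load the whole interval."""

    def __init__(self, read_disk: Callable[[int, int], bytes],
                 limiter: IopsLimiter, interval_size: int) -> None:
        self.read_disk = read_disk            # hypothetical: read_disk(offset, length) -> bytes
        self.limiter = limiter
        self.interval_size = interval_size
        self.cache: Dict[int, bytes] = {}     # presence in the dict doubles as the "read" flag
        self.disk_reads = 0

    def get_interval(self, idx: int) -> bytes:
        if idx in self.cache:                 # marked as read: no disk I/O
            return self.cache[idx]
        self.limiter.acquire()                # concurrency + 1 (may wait at the limit)
        try:
            data = self.read_disk(idx * self.interval_size, self.interval_size)
            self.disk_reads += 1
        finally:
            self.limiter.release()            # concurrency - 1 once the read finishes
        self.cache[idx] = data                # mark the interval as read
        return data

    def read_single(self, offset: int, length: int) -> bytes:
        idx = offset // self.interval_size
        rel = offset - idx * self.interval_size   # offset inside the interval
        return self.get_interval(idx)[rel:rel + length]

# Toy disk with 16-byte "intervals" so the example stays small.
disk = bytes(range(64))
reader = IntervalReader(lambda off, n: disk[off:off + n], IopsLimiter(limit=4), interval_size=16)
a = reader.read_single(2, 4)    # loads interval 0 from "disk"
b = reader.read_single(8, 8)    # same interval: copied from the cache
print(a, b, reader.disk_reads)
```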

[0065] Step 4: If end - start = 1, the data crosses a 1MB boundary and falls within exactly two consecutive 1MB intervals. In this case:

[0066] (1) Read the data in the first 1MB interval: check whether the first interval has been read. If it has, copy the portion of the data that falls in that interval; if it has not, execute the read process described in Step 3.

[0067] (2) Read the data in the second 1MB interval in the same way as in (1);

[0068] (3) Once the data in both intervals has been read, the read is complete.
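Step 4 amounts to fetching two intervals through the Step 3 logic and stitching together the two slices on either side of the boundary. A self-contained sketch, assuming 16-byte toy intervals, a `dict` as the cache, and no IOPS limiting; `read_cross_double` and `get_interval` are hypothetical names.

```python
from typing import Callable, Dict, List

def read_cross_double(offset: int, length: int,
                      get_interval: Callable[[int], bytes], interval_size: int) -> bytes:
    """Assemble a record that straddles exactly one interval boundary."""
    start = offset // interval_size
    end = (offset + length - 1) // interval_size
    if end - start != 1:
        raise ValueError("record is not a cross-double-interval read")
    boundary = end * interval_size                               # first byte of the second interval
    head = get_interval(start)[offset - start * interval_size:]  # offset .. boundary - 1
    tail = get_interval(end)[:offset + length - boundary]        # boundary .. offset + length - 1
    return head + tail

# Toy setup: 16-byte intervals and a dict cache standing in for the Step 3 logic.
INTERVAL = 16
disk = bytes(range(64))
cache: Dict[int, bytes] = {}
disk_reads: List[int] = []

def get_interval(idx: int) -> bytes:
    # Step 3 read-flag check (IOPS limiting omitted for brevity).
    if idx not in cache:
        disk_reads.append(idx)
        cache[idx] = disk[idx * INTERVAL:(idx + 1) * INTERVAL]
    return cache[idx]

rec1 = read_cross_double(12, 8, get_interval, INTERVAL)   # bytes 12..19: intervals 0 and 1
rec2 = read_cross_double(14, 4, get_interval, INTERVAL)   # bytes 14..17: both intervals cached
print(rec1 == disk[12:20], rec2 == disk[14:18], disk_reads)
```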

[0069] Step 5: If end - start > 1, the data spans multiple 1MB intervals and is a relatively large read. In this case it is read directly from disk and is not split into 1MB cached intervals. The IOPS concurrency charged to the read is the actual data length divided by 1MB, rounded down. For example, to read 3.5MB of data, the concurrency is increased by $\lfloor 3.5\text{MB} / 1\text{MB} \rfloor = 3$ before the read and decreased by 3 after the read completes.
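Step 5's weighted flow control can be sketched with a limiter whose acquire charges floor(length / 1MB) units. It assumes 1MB = 2^20 bytes; `WeightedIopsLimiter`, `read_direct`, and `fake_read` are hypothetical, and capping an oversized charge at the limit is an added assumption to avoid waiting forever, not part of the described method. The demo reproduces the 3.5MB example.

```python
import threading
from typing import Callable

INTERVAL_SIZE = 1 << 20  # assumed: 1MB = 2**20 bytes

def multi_interval_weight(length: int, interval_size: int = INTERVAL_SIZE) -> int:
    """IOPS concurrency charged to a direct read: floor(length / interval size)."""
    return length // interval_size

class WeightedIopsLimiter:
    """Concurrency counter where one large read can consume several units."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.in_flight = 0
        self._cond = threading.Condition()

    def acquire(self, weight: int) -> int:
        # Assumption (not from the patent): cap the charge at the limit so that an
        # oversized read cannot wait forever.
        weight = min(weight, self.limit)
        with self._cond:
            while self.in_flight + weight > self.limit:
                self._cond.wait()
            self.in_flight += weight
        return weight

    def release(self, weight: int) -> None:
        with self._cond:
            self.in_flight -= weight
            self._cond.notify_all()

def read_direct(offset: int, length: int,
                read_disk: Callable[[int, int], bytes], limiter: WeightedIopsLimiter) -> bytes:
    """Cross-multi-interval path: one direct read, no interval caching."""
    charged = limiter.acquire(multi_interval_weight(length))
    try:
        return read_disk(offset, length)
    finally:
        limiter.release(charged)

limiter = WeightedIopsLimiter(limit=8)
observed = []

def fake_read(offset: int, length: int) -> bytes:
    observed.append(limiter.in_flight)   # concurrency held while the read is in flight
    return bytes(length)

data = read_direct(0, int(3.5 * INTERVAL_SIZE), fake_read, limiter)
print(len(data), observed, limiter.in_flight)
```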

[0070] Figure 4 shows the hard disk garbage collection apparatus of this application, which comprises:

[0071] The interval segmentation module is used to divide the storage objects to be reclaimed into multiple continuous intervals of fixed size;

[0072] The interval positioning module is used to determine the start and end intervals occupied by the data based on the offset and length of the data to be migrated in the storage object.

[0073] The read strategy execution module is used to classify the data read scenario into three types based on the positional relationship between the start interval and the end interval: single interval scenario, cross-double interval scenario, and cross-multiple interval scenario, and to execute differentiated read strategies for different types:

[0074] (1) For the single-interval scenario: if the interval has already been read into the cache, the corresponding data is copied directly from the cache. If it has not been read and the read IOPS concurrency has not reached the upper limit, the complete data of the interval is read from the hard disk into the cache and the IOPS concurrency is incremented; after the read completes, the IOPS concurrency is decremented and the interval is marked as read. If the read IOPS concurrency has reached the upper limit, the read-and-cache logic is executed after IOPS concurrency is released;

[0075] (2) For the scenario spanning two intervals, the same reading and caching processing logic as the single interval scenario is executed for each interval until the data reading of both intervals is completed;

[0076] (3) For scenarios spanning multiple intervals, data is read directly from the hard drive, and flow control is performed by calculating the IOPS concurrency based on the actual length of the data, without enabling the interval caching mechanism;

[0077] The data migration module is used to migrate valid data to a new storage object and release the space occupied by the original storage object.

[0078] The above-mentioned device implements the steps of the hard disk garbage collection method disclosed in this application when it is in operation.

[0079] The flowcharts and block diagrams in the accompanying drawings illustrate possible implementations of apparatus, methods, and computer program products according to various embodiments of this application, including architecture, functionality, and operation. In these figures, each block may represent a module, program segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should be noted that each block in the block diagrams and/or flowcharts, and combinations thereof, can be implemented using either a dedicated hardware-based system or a combination of dedicated hardware and computer instructions to achieve the specified function or operation.

[0080] As shown in Figure 5, an embodiment of this application also discloses an electronic device, including: a processor 310, a communication interface 320, a memory 330 for storing a processor-executable computer program, and a communication bus 340. The processor 310, communication interface 320, and memory 330 communicate with each other via the communication bus 340. The processor 310 executes the executable computer program to implement the steps of the aforementioned hard disk garbage collection method.

[0081] It is understood that, in addition to the memory and processor, this electronic device may also include input devices (such as a keyboard), output devices (such as a display), and other communication modules, all of which communicate with the processor through I/O (input/output) interfaces.

[0082] The operations described in this application can be implemented by writing computer program code using one or more programming languages or a combination thereof. The programming languages include, but are not limited to, the following types:

[0083] Object-oriented programming languages, such as Java, Smalltalk, C++, etc.

[0084] Conventional procedural programming languages, such as "C" or similar programming languages.

[0085] The program code may be executed in ways including, but not limited to:

[0086] Entirely on the user's computer;

[0087] Partly on the user's computer and partly on a remote computer;

[0088] As a standalone software package;

[0089] Entirely on a remote computer or server.

[0090] In scenarios involving remote computers, the remote computer can connect to the user's computer via any type of network, including but not limited to local area networks (LANs) or wide area networks (WANs). Furthermore, the remote computer can also connect to external computers through an internet service provider, for example, by utilizing the internet for connection.

[0091] Furthermore, this application also discloses a computer-readable storage medium, wherein when the instructions in the computer-readable storage medium are executed by a processor of an electronic device, the electronic device is able to perform the various steps of the hard disk garbage collection method disclosed in this application.

[0092] In the context of this application, a computer-readable storage medium refers to a tangible medium capable of storing computer program code and related data. Specific examples include, but are not limited to, the following:

[0093] (1) Portable computer disk: such as floppy disks and other removable magnetic storage media.

[0094] (2) Hard disk: including mechanical hard disks and solid-state hard disks and other fixed storage devices.

[0095] (3) Random Access Memory (RAM): A volatile storage medium used for temporary storage of data and program code.

[0096] (4) Read-only memory (ROM): a non-volatile storage medium used to store fixed programs and data.

[0097] (5) Erasable programmable read-only memory (EPROM) or flash memory: non-volatile storage media that supports multiple erasures and reprogrammings.

[0098] (6) Fiber optic storage devices: storage media based on fiber optic technology.

[0099] (7) Portable compact disc read-only memory (CD-ROM): a read-only medium that stores data in the form of an optical disc.

[0100] (8) Optical storage devices: such as DVDs, Blu-ray discs and other storage media based on optical principles.

[0101] (9) Magnetic storage devices: such as magnetic tapes, disks and other storage media based on magnetic principles.

[0102] (10) Any suitable combination of the above: for example, combining multiple storage media to meet different storage needs.

[0103] These computer-readable storage media can be used to store the program code and related data described in this application to support program execution and persistent data storage.

[0104] Specifically, according to embodiments of this application, the processes described in the flowcharts can be implemented as computer software programs. For example, embodiments of this application relate to a computer program product comprising a computer program carried on a non-transitory computer-readable medium. This computer program contains program code for executing the hard disk garbage collection method disclosed in this application. When the computer program is executed by a processing device, it can achieve the functions defined in the embodiments of this application.

[0105] While the foregoing discussion contains several specific implementation details, these details should not be construed as limiting the scope of this application. The above description is merely a preferred embodiment of this application and an explanation of the technical principles employed. Those skilled in the art should understand that the scope of this application is not limited to technical solutions formed by specific combinations of the above-described technical features. Furthermore, this application should also cover other technical solutions formed by any combination of the above-described technical features or their equivalents without departing from the foregoing disclosed concept.

[0106] Those skilled in the art should also understand that modifications can be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features, without departing from the spirit and scope of the technical solutions of the embodiments of this application. These modifications or substitutions will not cause the essence of the corresponding technical solutions to deviate from the core spirit and scope of the technical solutions of the embodiments of this application.

Claims

1. A hard disk garbage collection method, characterized in that the method comprises:
S1. dividing the storage object to be reclaimed into multiple consecutive intervals of fixed size;
S2. determining, based on the offset and length of the data to be migrated within the storage object, the starting interval and the ending interval occupied by the data, both calculated with the floor function: starting interval $= \lfloor \text{Offset} / \text{Interval Size} \rfloor$ and ending interval $= \lfloor (\text{Offset} + \text{Data Length} - 1) / \text{Interval Size} \rfloor$;
S3. classifying, based on the positional relationship between the starting and ending intervals, the data read into one of three scenarios: a single-interval scenario, in which the starting interval equals the ending interval; a cross-double-interval scenario, in which the ending interval minus the starting interval equals 1; and a cross-multi-interval scenario, in which the ending interval minus the starting interval is greater than 1; and executing a differentiated read strategy for each scenario:
(1) in the single-interval scenario, if the interval has already been read into the cache, the data corresponding to the interval is copied directly from the cache; if it has not been read and the read IOPS concurrency is below the upper limit, the IOPS concurrency is incremented and the complete data of the interval is read from the hard disk and loaded into the cache, after which the IOPS concurrency is decremented and the interval is marked as read; if the read IOPS concurrency has reached the upper limit, this read-and-cache logic is executed once read IOPS concurrency has been released;
(2) in the cross-double-interval scenario, the read-and-cache logic of the single-interval scenario is executed for each of the two intervals until the data of both intervals has been read;
(3) in the cross-multi-interval scenario, the data is read directly from the hard disk without using the interval cache, and flow control is performed by calculating the IOPS concurrency from the actual length of the data;
S4. migrating the valid data that has been read to a new storage object and releasing the space occupied by the original storage object.
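The read path of claim 1 can be sketched in Python as follows. This is a minimal, non-authoritative sketch under stated assumptions: the names `locate`, `classify` and `IntervalReader` are hypothetical; a `bytes` object stands in for the storage object on disk; a counter guarded by a `threading.Condition` models the read IOPS concurrency limit; the interval cache is an unbounded dictionary; and capping the multi-interval charge at the limit is an addition not stated in the claims, made so that one large read cannot wait forever.

```python
from __future__ import annotations

import threading

MIB = 1 << 20  # assumption: "1 MB" means 2**20 bytes (claim 2)


def locate(offset: int, length: int, interval_size: int = MIB) -> tuple[int, int]:
    """S2: start/end interval indices via floor (integer) division."""
    start = offset // interval_size
    end = (offset + length - 1) // interval_size
    return start, end


def classify(start: int, end: int) -> str:
    """S3: map the start/end relationship to one of the three scenarios."""
    if start == end:
        return "single"
    if end - start == 1:
        return "double"
    return "multi"


class IntervalReader:
    """Hypothetical per-object reader used while garbage collecting one object."""

    def __init__(self, disk: bytes, max_iops: int, interval_size: int = MIB):
        self.disk = disk                   # stand-in for the storage object on disk
        self.interval_size = interval_size
        self.max_iops = max_iops           # upper limit on read IOPS concurrency
        self.in_flight = 0                 # current read IOPS concurrency
        self.cond = threading.Condition()
        self.cache: dict[int, bytes] = {}  # interval index -> full interval data
        self.disk_reads = 0                # illustrative counter, not thread-safe

    def _acquire(self, n: int) -> None:
        with self.cond:
            # Block until n more concurrent reads fit under the limit.
            while self.in_flight + n > self.max_iops:
                self.cond.wait()
            self.in_flight += n

    def _release(self, n: int) -> None:
        with self.cond:
            self.in_flight -= n
            self.cond.notify_all()  # wake readers waiting for concurrency

    def _disk_read(self, offset: int, length: int, cost: int) -> bytes:
        self._acquire(cost)
        try:
            self.disk_reads += 1
            return self.disk[offset:offset + length]
        finally:
            self._release(cost)

    def _load_interval(self, idx: int) -> bytes:
        # On a miss, read the complete interval once; storing it marks it as read.
        # On a hit, the cached copy is reused without touching the disk.
        if idx not in self.cache:
            base = idx * self.interval_size
            self.cache[idx] = self._disk_read(base, self.interval_size, 1)
        return self.cache[idx]

    def read(self, offset: int, length: int) -> bytes:
        start, end = locate(offset, length, self.interval_size)
        if classify(start, end) == "multi":
            # Claim 3: charge floor(length / interval size); always >= 1 here.
            # Assumption: cap at the limit so the wait can always be satisfied.
            cost = min(length // self.interval_size, self.max_iops)
            return self._disk_read(offset, length, cost)
        # Single or double interval: serve through the interval cache.
        data = b"".join(self._load_interval(i) for i in range(start, end + 1))
        rel = offset - start * self.interval_size
        return data[rel:rel + length]
```

With a 16-byte interval, for example, two small reads inside interval 0 cost a single disk read, a read that straddles intervals 0 and 1 only fetches interval 1, and a 40-byte read starting at offset 10 touches intervals 0 to 3 and bypasses the cache.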

2. The method according to claim 1, characterized in that, in step S1, each consecutive interval has a fixed size of 1 MB and the storage object is 128 MB in size.
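As a worked example of the claim 2 sizing, assuming 1 MB $= 2^{20}$ bytes and using an illustrative record (offset 1.5 MB, length 1 MB) that is not taken from the source, a 128 MB object holds 128 intervals, and the record falls into intervals 1 and 2, i.e. the cross-double-interval scenario:

```latex
\begin{aligned}
N &= \frac{128 \times 2^{20}}{2^{20}} = 128 \quad (\text{intervals } 0, \dots, 127)\\
\text{starting interval} &= \left\lfloor \frac{1\,572\,864}{1\,048\,576} \right\rfloor = \lfloor 1.5 \rfloor = 1\\
\text{ending interval} &= \left\lfloor \frac{1\,572\,864 + 1\,048\,576 - 1}{1\,048\,576} \right\rfloor
  = \left\lfloor \frac{2\,621\,439}{1\,048\,576} \right\rfloor = 2\\
\text{ending interval} - \text{starting interval} &= 1 \;\Rightarrow\; \text{cross-double-interval scenario}
\end{aligned}
```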

3. The method according to claim 1, characterized in that, in the cross-multi-interval scenario of step S3, the IOPS concurrency increment is calculated with the floor function: increase in concurrency $= \lfloor \text{Actual Data Length} / \text{Interval Size} \rfloor$.
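The claim 3 charge can be illustrated with a small sketch; the function name is hypothetical, and raising on non-multi-interval input is only an illustrative guard. A cross-multi-interval read must span at least three intervals, so for an offset $r$ bytes into its starting interval its length is at least $2 \times \text{Interval Size} - r + 1 \geq \text{Interval Size} + 2$ bytes, which means the floor is always at least 1.

```python
def multi_interval_iops_increment(offset: int, length: int, interval_size: int) -> int:
    """Hypothetical helper: IOPS concurrency charged for a cross-multi-interval read."""
    start = offset // interval_size
    end = (offset + length - 1) // interval_size
    if end - start <= 1:
        # Single- and double-interval reads go through the interval cache instead.
        raise ValueError("not a cross-multi-interval read")
    # Claim 3: floor(actual data length / interval size).
    return length // interval_size
```

With 1 MiB intervals, a 3 MiB read at offset 0 is charged 3, while a read of 1 MiB + 2 bytes that straddles three intervals is charged only 1, so the charge follows the bytes transferred rather than the number of intervals touched.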

4. The method according to claim 1, characterized in that, before step S1, the method further comprises: determining the storage object to be garbage collected, wherein the storage object stores user data and the corresponding metadata, and the metadata includes the offset and data length of the user data within the storage object; and, before step S3, the method further comprises: looking up in advance, in the index pool, the metadata corresponding to the user data stored in the storage object, and comparing it with the metadata in the storage object to determine whether the data is valid.
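A minimal sketch of the claim 4 validity check, assuming the index pool maps each user-data key to its current location (object ID, offset, length). Under redirect-on-write (claim 5), an overwrite points the index at a new location, so a record whose location in the old object no longer matches its index entry is treated as garbage. `Extent`, `valid_records` and the string key format are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """Hypothetical location record: where a piece of user data currently lives."""
    object_id: int
    offset: int
    length: int


def valid_records(
    object_id: int,
    records: list[tuple[str, int, int]],
    index_pool: dict[str, Extent],
) -> list[tuple[str, int, int]]:
    """Keep records whose index-pool entry still points at this object.

    records: (key, offset, length) tuples read from the object's own metadata.
    """
    valid = []
    for key, offset, length in records:
        # Look the key up in the index pool and compare with the object's metadata.
        if index_pool.get(key) == Extent(object_id, offset, length):
            valid.append((key, offset, length))
    return valid
```

Only the records returned here would then be read in step S3 and migrated in step S4.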

5. The method according to claim 1, characterized in that the method is applied to garbage collection in disk pools of a distributed storage system that use redirect-on-write technology.

6. A hard disk garbage collection device, characterized in that the device comprises:
an interval segmentation module, configured to divide the storage object to be reclaimed into multiple consecutive intervals of fixed size;
an interval positioning module, configured to determine, based on the offset and length of the data to be migrated within the storage object, the starting interval and the ending interval occupied by the data, both calculated with the floor function: starting interval $= \lfloor \text{Offset} / \text{Interval Size} \rfloor$ and ending interval $= \lfloor (\text{Offset} + \text{Data Length} - 1) / \text{Interval Size} \rfloor$;
a read strategy execution module, configured to classify, based on the positional relationship between the starting and ending intervals, the data read into one of three scenarios: a single-interval scenario, in which the starting interval equals the ending interval; a cross-double-interval scenario, in which the ending interval minus the starting interval equals 1; and a cross-multi-interval scenario, in which the ending interval minus the starting interval is greater than 1; and to execute a differentiated read strategy for each scenario:
(1) in the single-interval scenario, if the interval has already been read into the cache, the data corresponding to the interval is copied directly from the cache; if it has not been read and the read IOPS concurrency is below the upper limit, the IOPS concurrency is incremented and the complete data of the interval is read from the hard disk and loaded into the cache, after which the IOPS concurrency is decremented and the interval is marked as read; if the read IOPS concurrency has reached the upper limit, this read-and-cache logic is executed once read IOPS concurrency has been released;
(2) in the cross-double-interval scenario, the read-and-cache logic of the single-interval scenario is executed for each of the two intervals until the data of both intervals has been read;
(3) in the cross-multi-interval scenario, the data is read directly from the hard disk without using the interval cache, and flow control is performed by calculating the IOPS concurrency from the actual length of the data;
and a data migration module, configured to migrate the valid data to a new storage object and release the space occupied by the original storage object.

7. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the steps of the hard disk garbage collection method according to any one of claims 1-5.

8. An electronic device, characterized in that it comprises a memory and a processor, wherein the memory is configured to store a computer program, and the processor is configured to execute the computer program to implement the steps of the hard disk garbage collection method according to any one of claims 1-5.