Efficient clean-up / defragmentation mechanism for expired retention lock (compliance and governance) segments in deduplicated cloud objects

By separating and organizing expired and active segments in cloud objects based on file retention time and segment state, this approach solves the fragmentation problem caused by mixing expired retention-locked segments with active retention-locked segments. This reduces storage costs and improves space utilization.

CN114372022B · Active · Publication Date: 2026-06-19 · EMC IP HLDG CO LLC

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
EMC IP HLDG CO LLC
Filing Date
2021-10-14
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

In existing deduplicated cloud objects, a mixture of expired retention-locked segments and active retention-locked segments causes fragmentation that the garbage collector cannot clean up. This increases storage costs and space usage.

Method used

By separating and organizing expired and active segments in cloud objects based on file retention time and segment locking status, new objects are created and expired segments are deleted, thus freeing up storage space.

Benefits of technology

It achieves efficient defragmentation of objects in cloud storage, reducing storage costs and improving storage space utilization.

Generated by Eureka AI based on patent content.

Figure CN114372022B_ABST

Abstract

This invention relates to the cleaning and defragmentation of deduplicated and locked data. An example method includes the following steps:
- identifying a cloud object as a potential candidate for defragmentation;
- evaluating the cloud object to determine what portion of its segments has expired;
- when that expired portion meets or exceeds a threshold, separating the expired segments of the cloud object from its non-expired segments;
- creating a first new cloud object comprising only the non-expired segments;
- creating a second new cloud object comprising only the expired segments; and
- deleting the original cloud object from a storage device.

Description

Technical Field

[0001] Embodiments of the present invention generally relate to the cleaning and defragmentation of deduplicated and retention-locked data. More specifically, at least some embodiments of the present invention relate to systems, hardware, software, computer-readable media, and methods for cleaning and defragmenting expired retention-locked segments in storage environments, including cloud storage.

Background

[0002] Many enterprises use deduplication applications for backup and archiving. These applications allow backup files to be retention-locked locally for protection and / or compliance. They also allow backup files to be moved to cloud storage environments for long-term retention. Deduplication applications can also provide retention lock protection for these moved deduplicated objects in cloud storage. In that case, objects are locked for a specific duration using the cloud provider's retention lock API. Note that these deduplicated cloud objects contain a set of data segments shared by one or more backup files.

Over time, after numerous lock, revert, and delete operations, cloud space becomes fragmented with deduplicated objects that contain a mixture of expired RL (retention lock) segments and active RL segments. Garbage collectors and cleanup processes cannot delete or clean up such objects. These objects still contain one or more active RL segments shared by one or more locked files, so they remain locked in the cloud.

Description of the Drawings

[0003] To describe how at least some of the advantages and features of the invention can be obtained, a more particular description of embodiments of the invention is given with reference to specific embodiments illustrated in the accompanying drawings. These drawings depict only typical embodiments of the invention and therefore should not be considered as limiting its scope. The drawings are used to describe and explain embodiments of the invention with additional specificity and detail.

[0004] Figure 1 discloses aspects of an example operating environment.

[0005] Figure 2 discloses some example cloud objects and aspects of their segments, including segment retention durations.

[0006] Figure 3 discloses an example method involving RLG locking.

[0007] Figure 3A discloses an example distribution of segments across newly created objects after the defragmentation process, along with their RLG lock durations.

[0008] Figure 4A discloses an example method involving RLC locking.

[0009] Figure 4B discloses an example garbage collection method.

[0010] Figure 4C discloses an example distribution of segments across newly created objects after the defragmentation process, along with their RLC lock durations.

[0011] Figure 5 discloses example retention lock information in the (uploaded) object and segment metadata.

[0012] Figure 6 discloses aspects of an example computing entity.

Detailed Description

[0013] Embodiments of the present invention generally relate to the cleaning and defragmentation of deduplicated data. More specifically, at least some embodiments of the present invention relate to systems, hardware, software, computer-readable media, and methods for cleaning and defragmenting expired retention-locked segments in storage environments such as cloud storage.

[0014] In one example implementation, deduplicated objects are defragmented based on the retention time of files, and therefore of the files' segments. As a result, an entire object expires at once rather than becoming fragmented into active and expired segments. Such an implementation can be applied to RLG (Retention Lock, Governance) objects, RLC (Retention Lock, Compliance) objects, or both.

[0015] More specifically, the example implementation can selectively handle fragmented deduplicated objects in the cloud, that is, objects containing a mixture of expired RL segments and active RL segments. These objects can be defragmented by separating their constituent segments into different objects based on various criteria specific to retention locking (RL). Such criteria may include, for example:
- the object's lock status;
- the minimum / maximum retention period across all files that reference the segment; and
- the type of lock applied to the object.

For the last criterion, the example implementation can employ at least two different locks, namely RLC and RLG. Notably, an RLC lock, once set, cannot be reverted or removed.

[0016] After defragmenting an object, a new defragmented and partitioned object can be created, and the duration or retention time of the new object can be determined based on the locking type used by that object. Any expired segments from a defragmented object can be combined to form an expired, unlocked object, which can then be deleted by the garbage collector, thereby freeing up the storage space previously occupied by the expired segments.

[0017] Embodiments of the present invention, such as those disclosed herein, may be advantageous in several respects. For example, and as will be apparent from this disclosure, one or more embodiments of the invention can provide one or more advantageous and unexpected effects in any combination, some examples of which are set forth below. It should be noted that such effects are neither intended nor should be construed as limiting the scope of the claimed invention in any way. It should also be noted that nothing herein should be construed as constituting a necessary or indispensable element of any invention or embodiment. Rather, aspects of the disclosed embodiments can be combined in various ways to define further embodiments. Such further embodiments are considered to be within the scope of this disclosure. Similarly, embodiments included within the scope of this disclosure should not be construed as solving or limited to solving any particular problem. Nor should any such embodiments be construed as achieving or limited to achieving any particular technical effect or solution. Finally, no embodiment is required to achieve any of the advantageous and unexpected effects disclosed herein.

[0018] In particular, one advantageous aspect of at least some embodiments of the present invention is the ability to perform cleanup / defragmentation on objects that contain a mixture of expired and active segments. In one embodiment, deduplicated objects are defragmented based on the retention times of the deduplicated segments in each object, so that an entire object expires at once. In another embodiment, objects are processed according to RLC locking, RLG locking, or both.

[0019] A. Overview

[0020] Deduplication systems that extend retention capabilities from on-premises storage to the cloud can protect data in two places: on-premises, and off-premises in a cloud storage environment. In this way, data can be protected both at the file system (FS) end and in the cloud. Such systems can provide retention capabilities for data deduplicated across files at the RLG (governance) and RLC (compliance) levels. They do this in two ways. First, they lock cloud objects through cloud provider APIs (application programming interfaces). Second, they intelligently manage the deduplicated segments within objects, which may carry different retention times.

[0021] An efficient deduplication system may not store segments individually, either on-premises or in the cloud. Instead, it may package a group of one or more segments into a container object. Several considerations support this approach:
- The segment size used in some deduplication systems (for example, approximately 2 KB to 12 KB) directly affects the system's overall deduplication ratio.
- Storing such small segments as individual objects can increase TCO (total cost of ownership). Managing a large number of small objects requires more transactions, whereas processing packaged container objects, each holding multiple segments, requires relatively fewer transactions.
- Segment-level processing and transactions generate relatively more metadata to manage in the cloud. This can cause problems in the cloud backend, such as slower object lookups. A single container object holding multiple segments generates less metadata and requires less metadata processing.
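To make the transaction argument concrete, the sketch below counts upload requests for the same amount of data stored two ways: as individual segments, or packed into container objects. The 8 KiB average segment size sits inside the roughly 2K-12K range cited above. The 4 MiB container size and the one-request-per-object cost model are illustrative assumptions, not values from the source.

```python
import math

def put_requests(data_bytes: int, object_bytes: int) -> int:
    """Upload requests needed if every stored object costs one PUT (assumed cost model)."""
    return math.ceil(data_bytes / object_bytes)

GIB = 1024 ** 3
AVG_SEGMENT = 8 * 1024        # assumed average segment size: 8 KiB
CONTAINER = 4 * 1024 * 1024   # hypothetical container object size: 4 MiB

per_segment = put_requests(GIB, AVG_SEGMENT)    # each segment stored as its own object
per_container = put_requests(GIB, CONTAINER)    # segments packed into containers
print(per_segment, per_container)  # 131072 256
```

Under these assumptions, packing reduces the request count for 1 GiB by a factor of 512. The per-object metadata that the cloud backend must track shrinks by the same factor.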

[0022] Finally, cloud segments can expire in two situations: when the files or objects that reference them are deleted after their locks expire or are reverted, or when those files are recalled to local storage. However, typical garbage collectors and garbage collection processes cannot delete cloud objects that contain a mixture of active (valid) segments and inactive (expired) segments, because the valid segments keep the objects locked. As a result, the number of objects with such a mixture of segments can grow over time. The space needlessly occupied by expired segments within locked objects also increases the data owner's cloud storage costs.

[0023] This problem, which involves objects containing both expired and non-expired segments, can occur with and without RL locking. A typical approach to this situation is to simply wait for all segments in the object to expire. However, using this approach can increase TCO because it may retain objects with a mix of valid and expired segments for a relatively long time, requiring more storage and resulting in longer storage periods.

[0024] Accordingly, example embodiments of the present invention may include, among other things, a method for defragmenting an object based on the retention time of files, and thus on the retention time of the object's constituent segments. The goal is for the entire object to expire at once: all segments of the object expire simultaneously, which avoids mixing valid and expired segments within the object.

[0025] B. Aspects of the Example Architecture and Environment

[0026] The following is a discussion of various aspects of example operating environments for different embodiments of the present invention. This discussion is not intended to limit the scope of the invention or the applicability of the embodiments in any way.

[0027] Generally, embodiments of the present invention can be implemented in combination with systems, software, and components that individually and / or collectively implement and / or cause cleanup and defragmentation operations, which may also be collectively referred to as garbage collection operations. More generally, the scope of the present invention includes any operating environment in which the disclosed concepts can be used.

[0028] Embodiments of this invention can be used in conjunction with cloud storage environments and / or cloud computing environments. Example cloud storage environments include data protection environments that can take the form of public or private cloud storage environments, local storage environments, and hybrid storage environments that include both public and private elements. Any of these storage environments can store, for example, new and / or modified data collected and / or generated by one or more clients or other settings within an enterprise.

[0029] Similarly, any of these example storage environments can be partially or fully virtualized. A storage environment may include or consist of data centers operable to serve read, write, delete, backup, restore, and / or cloning operations initiated by one or more clients or other elements of the operating environment. Where backups comprise data groups with distinct characteristics, the data can be assigned and stored to distinct targets within the storage environment, each target corresponding to a data group with one or more specific characteristics.

[0030] Example cloud computing environments, which may or may not be public, include cloud environments in which processing, data protection, and / or other services can be performed on behalf of one or more clients. Some example cloud computing environments in which embodiments of the present invention may be used include, but are not limited to, Dell EMC Data Domain, Microsoft Azure, Amazon AWS, Dell EMC Cloud Storage Services, and Google Cloud. More generally, however, the scope of the present invention is not limited to any particular type or implementation of cloud computing environment.

[0031] In addition to a cloud environment, the operating environment may also include one or more clients capable of collecting, modifying, and creating data. Therefore, a particular client may be associated with one or more instances of each of one or more applications that perform such operations on the data. Such clients may include physical machines or virtual machines (VMs).

[0032] Specifically, devices in an operating environment (including cloud storage and / or cloud computing environments) can take the form of software, physical machines, or VMs, or any combination thereof, but no particular device implementation or configuration is required in any implementation. Similarly, data protection system components such as databases, storage servers, storage volumes (LUNs), storage disks, replication services, backup servers, restore servers, backup clients, and restore clients can also take the form of software, physical machines, or virtual machines (VMs), but no particular component implementation is required in any implementation. When using VMs, VMs can be created and controlled using hypervisors or other virtual machine monitors (VMMs). The term VM includes, but is not limited to, any virtualization, emulation, or other representation of one or more computing system elements (such as computing system hardware). VMs can be based on one or more computer architectures and provide the functionality of a physical computer. VM implementations can include hardware and / or software or at least involve the use of hardware and / or software. For example, a VM image can take the form of a .VMX file and one or more .VMDK files (VM hard disks).

[0033] As used herein, the term "data" is intended to be broad. Therefore, the term is used by way of example and not limitation to cover data segments, data chunks, data blocks, atomic data, emails, any type of object, any type of file (including media files, word processing files, spreadsheet files, and database files), as well as contacts, directories, subdirectories, volumes, and any group of one or more of the foregoing.

[0034] The exemplary embodiments of the present invention are applicable to any system capable of storing and processing various types of objects in analog, digital, or other forms. While terms such as document, file, segment, block, or object may be used by way of example, the principles of this disclosure are not limited to any particular form of representing and storing data or other information. Rather, these principles are equally applicable to any object capable of representing information.

[0035] With particular attention now to Figure 1, an example operating environment for embodiments of the present invention is generally designated 100. In general, operating environment 100 may include any number "n" of clients, such as clients 102, 104, and 106. Each client may include one or more applications that generate new and / or modified data. The data generated by clients 102, 104, and 106 may, for example, be backed up by backup / restore server 200 to a storage site, such as cloud storage site 300.

[0036] Cloud storage site 300 may include a data protection system 301. The data protection system 301 may include the following:
- a cleanup platform 302, whose operation can be triggered by the data protection system 301;
- a deduplication system 304, which can run locally at an enterprise data center or, as shown, in the cloud storage site 300; and
- a storage device 306.

Any of the various cleanup, defragmentation, and garbage collection processes and functions disclosed herein can be implemented by the cleanup platform 302, but the scope of the invention is not limited to this particular implementation. In some embodiments, the cleanup platform 302 may be an element of the deduplication system 304, but this is not required. Such processes and functions may be provided as a service, for example to an enterprise that includes clients 102, 104, and 106. They may be initiated by, for example, cloud storage site 300, backup / restore server 200, and / or one or more of clients 102, 104, and 106. In some implementations, the deduplication system 304 and / or the data protection system 301 can initiate any of these processes and functions automatically, without initiation by a client or any other entity. Such processes can be scheduled or executed on demand. Furthermore, in at least some implementations, the objects and segments processed by the cleanup platform 302 may already have been deduplicated before that processing. The deduplication may have been done, for example, by cloud storage site 300, clients 102 / 104 / 106, and / or backup / restore server 200.

[0037] The example operating environment 100 of Figure 1 can be modified in various ways. For example, the deduplication system 304 can operate entirely as a cloud machine instance / VM within the cloud storage site 300. In this configuration, both the data protection system 301 and the deduplication system 304 operate within the cloud storage site 300. Alternatively, the data protection system 301 can run on-premises, that is, locally at the enterprise data center. More generally, however, no particular configuration or location is required for the data protection system or the deduplication system.

[0038] Regarding some operations of the example operating environment 100, the backup server 200 can transmit data from clients 102 / 104 / 106 to the data protection system 301, which will use a deduplication system 304 to deduplicate the data and then write the deduplicated objects to disk. Subsequently, the deduplicated objects can be moved to cloud storage for long-term retention. Furthermore, the operation of the cleanup platform 302 can be part of and triggered by the data protection system 301. In this case, the cloud storage site 300 can be solely responsible for the long-term storage of the deduplicated objects, such as long-term storage on storage device 306.

[0039] C. Further aspects of some example implementation methods

[0040] C.1. Some general aspects

[0041] As noted earlier, example embodiments of the present invention include an efficient defragmentation / cleanup mechanism for expired retention lock (RL) segments in environments such as cloud sites. This mechanism or method can selectively handle fragmented objects in the cloud, that is, any object containing a mixture of expired RL segments and active RL segments. It defragments those objects by separating their segments into distinct objects based on various retention-lock-specific criteria. Objects containing a mixture of expired RL segments and active RL segments may be referred to herein as “fragmented objects.”

[0042] More specifically, and with particular reference to RLG, one example approach can proceed as follows:
- Detect all fragmented RLG objects that contain a certain number of expired segments.
- Retrieve the active and expired segments of those objects.
- Separate the segments into new container objects based on factors such as, but not limited to, the segments' respective expiration durations and lock states.

In at least some implementations, only the new objects containing active RLG segments are ultimately locked, using the cloud provider's API. They are locked for the longest duration, that is, for the longest retention value of any segment in the new object. Finally, the retention locks on all selected fragmented objects can be reverted, and those objects can be deleted. A garbage collection (GC) process or other cleanup process can then inspect the newly created, unlocked objects that hold expired segments, and may delete them to reclaim cloud space. This deletion can occur after the GC process has performed one or more periodic liveness checks to identify non-expired objects that should be retained.

[0043] Turning now to the RLC scenario, implementations of this invention can take a different approach to compliance-locked (RLC) objects, because not even a cloud administrator can revert the RLC lock on those objects. Specifically, an example method for handling RLC objects can employ a controlled locking mechanism. Under this mechanism, an object is locked for the shortest duration among all its segments; that is, the lock time is set to the lowest lock time of any segment in the object. The compliance lock then expires after the shortest possible duration, rather than after the longest duration as with RLG objects, which provides an early opportunity for cleanup.
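The RLG and RLC policies differ only in which segment retention value sets the new object's lock. The sketch below assumes each segment's remaining retention is known in days. The `Segment` type, its field names, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

class LockMode(Enum):
    RLG = "governance"
    RLC = "compliance"

@dataclass(frozen=True)
class Segment:
    fingerprint: str
    remaining_days: int  # retention left; 0 or less means the segment has expired

def object_lock_days(segments: list[Segment], mode: LockMode) -> int:
    """Lock duration, in days, for a new object built from these segments.

    RLG takes the longest remaining retention so that no active segment is left unprotected.
    RLC takes the shortest so that the irrevocable compliance lock lapses as early as possible.
    """
    active = [s.remaining_days for s in segments if s.remaining_days > 0]
    if not active:
        return 0  # only expired segments: no lock is needed
    return max(active) if mode is LockMode.RLG else min(active)

segs = [Segment("fp1", 30), Segment("fp2", 150), Segment("fp3", 90)]
print(object_lock_days(segs, LockMode.RLG), object_lock_days(segs, LockMode.RLC))  # 150 30
```

The sketch does not model how segments whose retention outlasts a shortest-duration RLC lock remain protected once that lock lapses.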

[0044] During GC execution, a similar approach based on lock status and expiration duration can be used, as in implementations involving RLG objects. One difference between the RLC and RLG approaches is that in the RLC approach, newly created objects containing grouped active segments are locked for the shortest duration, rather than the longest duration as in the RLG approach. Regardless of whether the RLC or RLG procedure is used in a particular case or situation, after processing objects and / or segments, indexes (e.g., FP (fingerprint) indexes) and local segment and / or object metadata can be updated to point to any new container objects created by the procedure.
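After segments move into new containers, lookups must resolve to the new locations. The sketch below shows that index update. It models the fingerprint index as a plain dict from fingerprint to container ID, which is a hypothetical layout and not the product's actual index format.

```python
def repoint_index(fp_index: dict[str, str], new_containers: dict[str, list[str]]) -> None:
    """Update a fingerprint index so each moved segment resolves to its new container.

    fp_index maps segment fingerprint -> container object ID (hypothetical layout).
    new_containers maps new container ID -> fingerprints packed into it.
    """
    for container_id, fingerprints in new_containers.items():
        for fp in fingerprints:
            fp_index[fp] = container_id

index = {"fpA": "obj-1", "fpB": "obj-1", "fpC": "obj-2"}
repoint_index(index, {"obj-7": ["fpA"], "obj-8": ["fpB"]})
print(index)  # {'fpA': 'obj-7', 'fpB': 'obj-8', 'fpC': 'obj-2'}
```

Before the original container (here `obj-1`) is deleted, no index entry should still reference it.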

[0045] Turning next to Figure 2, some example fragmented cloud objects with expired segments are generally designated 400. The example fragmented cloud objects 400 are shown with relatively few segments, but a fragmented cloud object can contain, for example, 50-100 segments or more. More generally, a fragmented cloud object can contain any number of segments, and the scope of the invention is not limited to the illustrative examples disclosed in the figures. Fragmented cloud objects 400 can vary in several ways:
- Size: some fragmented cloud objects 400 have more or fewer segments than others.
- Segment state: each fragmented cloud object 400 can include a mixture of valid (that is, unexpired) segments and expired segments.
- Segment retention: individual segments in each fragmented cloud object 400 can have different retention durations, such as 6 months, 9 months, or 1.5 years.
- Lock period and type: fragmented cloud objects 400 can be locked for different time periods, such as 1 year, 6 months, or 2 years, and the lock can be, for example, RLG or RLC.

The following discussion addresses some example implementations of methods for processing fragmented cloud objects, such as the example fragmented cloud objects 400 disclosed in Figure 2.

[0046] C.2. Aspects of Some Example Methods

[0047] In general, objects can be protected using various retention modes. These include governance and compliance modes, which typically apply different levels of protection to an object. These modes, or locks, may be referred to herein as Retention Lock Governance (RLG) mode and Retention Lock Compliance (RLC) mode, respectively. Any of the methods disclosed herein can be performed on deduplicated blocks, segments, and / or other portions of data. However, no method is required to be performed on deduplicated data. In some implementations, a deduplication application / system deduplicates the data, and the deduplicated data is then processed as disclosed herein. Such deduplication and processing can be performed by the same computing entity or by different computing entities.

[0048] When an object is protected in governance mode (RLG), a user without special permissions cannot overwrite or delete the object, or change its lock settings. A user with special permissions can revert the RLG lock on an object, but governance mode still prevents most users from deleting or modifying an RLG-protected object. Once the RLG lock is reverted, the object is no longer protected and can be deleted, overwritten, or modified.

[0049] When an object is protected in compliance mode (RLC), the lock cannot be removed. No user, not even an administrator, can overwrite or delete the object. While an object is locked in compliance mode, its retention mode cannot be changed, and its retention period cannot be shortened. Protecting an object with compliance mode therefore ensures that it cannot be overwritten or deleted for the duration of the retention period.
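The protection rules in [0048] and [0049] can be condensed into a small decision function. This is an illustrative sketch only. The function name, its boolean inputs, and the `special_permissions` flag are hypothetical; the flag stands in for whatever privilege a given provider requires to bypass a governance lock.

```python
from enum import Enum

class RetentionMode(Enum):
    RLG = "governance"
    RLC = "compliance"

def may_modify(mode: RetentionMode, lock_active: bool, special_permissions: bool) -> bool:
    """Whether a user may delete or overwrite an object, or weaken its lock
    (change the mode or shorten the retention period)."""
    if not lock_active:
        # Lock expired (or, for RLG, was reverted): the object is unprotected.
        return True
    if mode is RetentionMode.RLG:
        # Governance: only users holding special permissions may act.
        return special_permissions
    # Compliance: no user, not even an administrator, may act until retention ends.
    return False

print(may_modify(RetentionMode.RLG, True, True))  # True
print(may_modify(RetentionMode.RLC, True, True))  # False
```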

[0050] Referring now to Figure 3 and Figure 3A, details are provided of an example method 500 for processing fragmented RLG-locked cloud objects. The method creates new defragmented and partitioned RLG objects, collectively designated 550. Method 500 may begin at 502, where one or more RLG-locked objects are identified as potential candidates for defragmentation. Each candidate RLG-locked object can then be examined (504) to determine whether it contains a specified number or other amount of expired segments. In some implementations, a configurable parameter defines the threshold used at 504 to decide whether an object will be defragmented. For example, a threshold of 30% means that at least 30% of the segments in an object must have expired before defragmentation of that object is considered. The threshold can be set at any suitable level. In general, it should be high enough that the benefits of defragmenting a qualifying object, such as the amount of space it may reclaim, justify the processing resources that defragmentation requires.

[0051] If it is determined at 504 that the object does not meet the expired-segment threshold, method 500 can stop at 506 or return to 502. If instead it is determined at 504 that the object meets or exceeds the expired-segment threshold, method 500 can proceed to 508. At 508, the object is processed to separate its RL-expired segments from its RL-active (valid) segments.
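Steps 502 through 508 amount to a candidate filter. The sketch below assumes each candidate object exposes a per-segment expired flag, and uses the 30% figure from the illustration above as a configurable default. The class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class CloudObject:
    object_id: str
    expired_flags: list[bool]  # one entry per segment; True means RL-expired

def expired_fraction(obj: CloudObject) -> float:
    if not obj.expired_flags:
        return 0.0
    return sum(obj.expired_flags) / len(obj.expired_flags)

def select_for_defrag(candidates: list[CloudObject], threshold: float = 0.30) -> list[CloudObject]:
    """Step 504: keep objects whose expired share meets or exceeds the threshold.

    Objects below the threshold are skipped (506); the rest go on to separation (508).
    """
    return [obj for obj in candidates if expired_fraction(obj) >= threshold]

objs = [
    CloudObject("a", [True, False, False, False]),        # 25% expired: skipped
    CloudObject("b", [True, True, False, False, False]),  # 40% expired: selected
]
print([o.object_id for o in select_for_defrag(objs)])  # ['b']
```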

[0052] After an object's segments have been separated into expired and valid segments (508), new objects can be created from them (510). For example, and as shown in Figure 3A, one or more new objects 552 can be created (510). These can be packaged container objects containing only valid segments, such as RL-active segments. One or more new objects 554 can also be created (510), containing only expired segments, such as RL-expired segments. Because object 554 contains only expired segments, it does not need to be locked. Segment separation (508) can therefore be based at least in part on which segments are locked and which will be unlocked. As explained in more detail below, new object 552 can be locked. After new objects 552 and / or 554 are created (510), the original object processed at 508 can be deleted (512).
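The separation and replacement steps (508 to 512) can be sketched against an in-memory store. The `store` dict, the generated object IDs, and the `locked` flag are hypothetical stand-ins for cloud operations. Zoning by expiration time is omitted here.

```python
import itertools
from dataclasses import dataclass
from typing import Optional

@dataclass
class Seg:
    fp: str
    expired: bool

@dataclass
class Obj:
    segments: list[Seg]
    locked: bool = False

_new_ids = itertools.count(1)

def split_object(store: dict[str, Obj], object_id: str) -> tuple[Optional[str], Optional[str]]:
    """Steps 508-512: split one fragmented object into active-only and expired-only objects.

    Returns (active_id, expired_id); either is None if that group is empty.
    """
    original = store[object_id]
    active = [s for s in original.segments if not s.expired]
    expired = [s for s in original.segments if s.expired]
    active_id = expired_id = None
    if active:
        active_id = f"new-{next(_new_ids)}"
        store[active_id] = Obj(active, locked=True)     # to be RLG-locked at 514
    if expired:
        expired_id = f"new-{next(_new_ids)}"
        store[expired_id] = Obj(expired, locked=False)  # never locked; left for GC
    # Step 512. Assumes the original's RLG lock has already been reverted.
    del store[object_id]
    return active_id, expired_id

store = {"obj-1": Obj([Seg("a", False), Seg("b", True), Seg("c", False)], locked=True)}
active_id, expired_id = split_object(store, "obj-1")
print(sorted(store))  # ['new-1', 'new-2']
```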

[0053] Turning again to separation process 508, the separation can be based not only on each segment's lock status but also on each segment's expiration time. For example, and as shown in Figure 3A, file segments expiring within a few days, a week, a month, two months, six months, or a year can each be grouped together in the same object. Accordingly, each object 552 can be associated with a zone: a time range that covers the expiration times of all segments that may be included in that object 552. For example, if an object 552 covers a zone of 0-6 months, every segment in that object 552 has been identified as expected to expire within 6 months or less, measured from a reference time point.

[0054] In some implementations, the reference time point may be the time when object 552 is created, but the reference time point may be a time after or before the time when object 552 is created. In some implementations, all new objects 552 may have the same reference time point, but this is not required, and in other implementations, one or more of objects 552 may have different corresponding reference time points.

[0055] The size or duration of the zone associated with object 552 can vary depending on various circumstances. For example, if object 552 will be stored in a private cloud, the duration of the zone can be relatively short, such as days or weeks. As another example, if object 552 will be stored in a public cloud, the duration of the zone may be relatively long, such as months or years.
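Zone assignment in [0053] to [0055] amounts to bucketing each active segment by its remaining retention. The sketch below assumes retention is measured in days from a common reference time point. It uses shorter zones for a private cloud and longer ones for a public cloud, as suggested above. The specific boundaries are hypothetical.

```python
import bisect

# Hypothetical zone upper bounds, in days from the reference time point.
ZONE_BOUNDS = {
    "private": [1, 3, 7, 14, 28],       # short zones: days to weeks
    "public": [30, 60, 182, 365, 730],  # long zones: months to years
}

def zone_index(remaining_days: int, cloud: str = "public") -> int:
    """Index of the first zone whose upper bound covers remaining_days.

    Segments retained longer than the last bound land in overflow zone len(bounds).
    """
    return bisect.bisect_left(ZONE_BOUNDS[cloud], remaining_days)

print([zone_index(d) for d in (20, 45, 150, 800)])  # [0, 1, 2, 5]
print(zone_index(5, "private"))                      # 2
```

`bisect_left` places a value equal to a boundary in the lower zone. For example, a segment with exactly 30 days left falls in the 0-30 day zone.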

[0056] With continued reference to Figure 2, Figure 3, and Figure 3A, example separation process 508 may include reading segments from fragmented cloud objects 400 and accumulating them in corresponding memory buffers / areas. Each buffer / area may be designated to hold segments according to the expiration duration of active segments. In at least some embodiments, expired segments need not be distributed across zones. Throughout separation process 508, some or all of the RL metadata for the accumulated active segments may be maintained, such as the longest expiration_date and the lock_count seen. Once a memory buffer is full, it can be written out as a new object 552, along with any associated metadata.
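The buffering step can be sketched as one buffer per zone. Each buffer is flushed as a new object when it reaches a fixed capacity; for simplicity, capacity is counted in segments. The `write_object` callback stands in for the cloud upload, and all class and callback names are hypothetical. The buffer tracks the longest expiration and a lock count, mirroring the `expiration_date` and `lock_count` metadata named above. The source does not define how `lock_count` is aggregated, so summing per-segment counts is an assumption.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Callable, Optional

# Hypothetical upload hook: (fingerprints, longest expiration, lock count) -> None.
WriteObject = Callable[[list[str], date, int], None]

@dataclass
class ZoneBuffer:
    fingerprints: list[str] = field(default_factory=list)
    longest_expiration: Optional[date] = None
    lock_count: int = 0

class SegmentAccumulator:
    """Buffers active segments per zone and writes each full buffer as a new object."""

    def __init__(self, capacity: int, write_object: WriteObject):
        self.capacity = capacity          # segments per new object (simplification)
        self.write_object = write_object
        self.buffers: dict[int, ZoneBuffer] = {}

    def add(self, zone: int, fp: str, expiration: date, lock_count: int) -> None:
        buf = self.buffers.setdefault(zone, ZoneBuffer())
        buf.fingerprints.append(fp)
        # Running RL metadata: keep the latest expiration seen in this zone.
        if buf.longest_expiration is None or expiration > buf.longest_expiration:
            buf.longest_expiration = expiration
        buf.lock_count += lock_count      # assumption: counts are summed per object
        if len(buf.fingerprints) >= self.capacity:
            self.flush(zone)

    def flush(self, zone: int) -> None:
        buf = self.buffers.pop(zone, None)
        if buf is not None and buf.fingerprints:
            self.write_object(buf.fingerprints, buf.longest_expiration, buf.lock_count)

    def flush_all(self) -> None:
        for zone in list(self.buffers):
            self.flush(zone)

written = []
acc = SegmentAccumulator(2, lambda fps, exp, cnt: written.append((fps, exp, cnt)))
acc.add(0, "fp1", date(2025, 3, 1), 1)
acc.add(0, "fp2", date(2025, 5, 1), 2)  # zone 0 is now full and is written out
acc.add(1, "fp3", date(2025, 9, 1), 1)
acc.flush_all()                          # write the partially filled zone 1 buffer
```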

[0057] A new object 552 containing active (i.e., unexpired) segments can be locked by the RLG 514, using for example the cloud API, for a duration equal to the longest RL expiration time seen among all segments within it. For illustration, the segment with the longest lock time in object 552 is segment E2, which has a lock time of 5 months. Therefore, object 552 can be locked by the RLG for 5 months. As mentioned above, new objects 554 containing only expired segments may not require any further action after being created at 510; those objects 554 can then be selected by the regular GC process for validity checks and may be deleted so that the space they occupy can be reclaimed.
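The RLG rule for a new object reduces to a maximum over its active segments. In this sketch, E2's 5-month lock is modeled as an expiry of 1 June 2024 against a 1 January 2024 reference; the other segment IDs and dates, and the function name, are hypothetical. Returning None for an object whose segments have all expired mirrors objects 554, which are left unlocked for regular GC.

```python
from datetime import date
from typing import Dict, Optional


def rlg_retain_until(segment_expiries: Dict[str, date], today: date) -> Optional[date]:
    """RLG (governance) lock target for a newly written object.

    An object of active segments is locked until the LATEST expiration among
    them, so no segment loses protection before it expires. If every segment
    has already expired (as in object 554), no lock is needed: return None.
    """
    active = [d for d in segment_expiries.values() if d > today]
    return max(active) if active else None
```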

[0058] Turning next to Figures 4A, 4B, and 4C, and with continued reference to Figure 2, details are provided concerning an example method 600 for RLC locking of cloud objects, where such objects are collectively denoted at 650 in Figure 4C. Unless otherwise stated herein, method 600 may be similar to or the same as method 500, and object 650 may be similar to or the same as object 550.

[0059] With particular reference to Figure 4A, method 600 can begin upon receipt (602) of a new compliance lock request, such as via a controlled locking mechanism for RLC objects. After the lock request is received at 602, it is determined at 604 whether the object is already locked. If the object is not locked, method 600 can proceed to 606, where the object is locked for the shortest duration among all segments of the object. The object metadata and segment metadata can then be updated at 608.

[0060] On the other hand, if it is determined at 604 that the object to which the lock request received at 602 pertains is already locked, it can be further determined, at 605, whether the new lock duration specified in the lock request is greater or less than the current lock duration of the object. If the new lock duration is less than the current lock duration, method 600 can proceed to 607, where only the lock counts of the object and its segments are updated.

[0061] If it is determined at 605 that the new lock duration is greater than the current lock duration of the object, the method can proceed to 609. At 609, the segments' metadata (such as lock count and expiration date) and the object's lock count are updated, but the object's expiration date is not.
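The branching of method 600 (604, 605, 606/608, 607, 609) can be modeled in a few lines. This is a hypothetical sketch, not the patented implementation: the class and function names are invented, lock durations are represented as retain-until dates, a lock request is assumed to name the segments of the file being locked, segment expirations are never shortened (the later of the old and new dates is kept), and a request equal to the current lock is assumed to follow the 607 path.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class SegMeta:
    expiry_date: date      # segment retention expiry
    lock_count: int = 0    # files currently locking this segment


@dataclass
class ObjMeta:
    segments: Dict[str, SegMeta]
    expiry_date: Optional[date] = None   # None means the object is not locked yet
    lock_count: int = 0


def handle_lock_request(obj: ObjMeta, seg_ids: List[str], new_expiry: date) -> str:
    """Apply one compliance lock request to an object; return the path taken."""
    if obj.expiry_date is None:
        # 604 -> 606: not locked yet. Record the request on the segments, then
        # lock the object only until the EARLIEST segment expiry.
        for sid in seg_ids:
            seg = obj.segments[sid]
            seg.expiry_date = max(seg.expiry_date, new_expiry)  # never shorten
            seg.lock_count += 1
        obj.expiry_date = min(s.expiry_date for s in obj.segments.values())
        obj.lock_count += 1  # 608: object metadata updated
        return "locked"
    if new_expiry > obj.expiry_date:
        # 605 -> 609: update segment expiry dates and lock counts plus the
        # object lock count, but leave the object's expiry date unchanged.
        for sid in seg_ids:
            seg = obj.segments[sid]
            seg.expiry_date = max(seg.expiry_date, new_expiry)
            seg.lock_count += 1
        obj.lock_count += 1
        return "segments-extended"
    # 605 -> 607 (equal durations also land here, an assumption):
    # only lock counts change; no expiry date is modified.
    for sid in seg_ids:
        obj.segments[sid].lock_count += 1
    obj.lock_count += 1
    return "counts-only"
```

Keeping the object's expiry at the earliest segment expiry is what later lets the copy-forward step replace the object instead of holding it for its longest-lived segment.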

[0062] Referring next to Figure 4B, details are provided regarding an example method 675 for handling expired RLC objects. Method 675 may begin at 677, where one or more expired objects are identified. Next, a threshold determination is performed at 679, which may be similar to or the same as determination 504 in method 500. If the threshold is not met, the method may proceed to 681. If the threshold is met at 679, the method may proceed to 683, where the segments of the expired RLC objects are separated.
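The threshold determination at 679 (and 504) compares the expired portion of an object with a threshold. The text does not say how that portion is measured; this sketch assumes it is the share of the object's bytes held by expired segments, treats a segment as expired on its expiry date, and uses a hypothetical default threshold of 0.3. Function names are illustrative.

```python
from datetime import date
from typing import List, Tuple


def expired_fraction(segments: List[Tuple[int, date]], today: date) -> float:
    """Share of an object's bytes held by expired segments.

    Each segment is (size_in_bytes, expiry_date); a segment counts as expired
    on or after its expiry date.
    """
    total = sum(size for size, _ in segments)
    expired = sum(size for size, expiry in segments if expiry <= today)
    return expired / total if total else 0.0


def meets_threshold(segments: List[Tuple[int, date]], today: date,
                    threshold: float = 0.3) -> bool:
    """Threshold determination (cf. 504 and 679): separate the object's
    segments only when the expired portion meets or exceeds the threshold."""
    return expired_fraction(segments, today) >= threshold
```

Measuring by bytes rather than by segment count reflects that reclaimable space, not the number of segments, drives storage cost.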

[0063] Separation 683 may include performing one or more garbage collection runs on any expired RLC objects. For each expired RLC object, method 675 may follow the same group-based separation mechanism as, for example, that of example method 500 for RLG objects. That is, in separation process 683, the unexpired segments of the object can be read and accumulated in different memory buffers, where each buffer is designated to hold segments for a specific expiration duration, such as 6 months, 1 year, 1.5 years, or 2 years. Any expired segments can be placed into a separate buffer.

[0064] As shown in Figure 4C, the in-memory buffers can then be written out at 685 as new container objects, an example of which is shown at 652, such that each object contains segments with associated expiration durations. For example, object 652 covers a range of 0-6 months, meaning that each segment in object 652 will expire no more than 6 months after a specific reference time. In this particular example, the segment with the longest duration, segment E2, is set to expire in 5 months. All such objects can then be RLC-locked at 687, for example using the cloud provider's API, based on the shortest expiration duration seen among all segments constituting the object. For illustration, and again referring to object 652 with a range of 0-6 months, the shortest duration of any segment in object 652 is 2 months, specified by segments E8 and E10. Therefore, the lock time for object 652 can be set to 2 months. Process 685 can also include creating a new object 654 whose segments have all expired. The new object 654 need not be locked.
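For a concrete cloud API, one possibility (an assumption; the text names no provider) is Amazon S3 Object Lock, whose `put_object_retention` request takes a `Retention` mapping with a `Mode` of `GOVERNANCE` or `COMPLIANCE` and a `RetainUntilDate`. The helper below only builds those request arguments under the RLC rule, using the earliest segment expiry. Its name, the bucket and key, and the dates for E2, E8, and E10 (5 and 2 months after a 1 January 2024 reference) are hypothetical.

```python
from datetime import date, datetime, time, timezone
from typing import Any, Dict


def rlc_retention_kwargs(bucket: str, key: str,
                         segment_expiries: Dict[str, date]) -> Dict[str, Any]:
    """Build arguments for an S3-style put_object_retention call.

    Under the RLC policy, the new object is COMPLIANCE-locked only until the
    EARLIEST expiration among its segments (the RLG policy uses the latest).
    """
    if not segment_expiries:
        raise ValueError("object has no segments")
    earliest = min(segment_expiries.values())
    retain_until = datetime.combine(earliest, time.min, tzinfo=timezone.utc)
    return {
        "Bucket": bucket,
        "Key": key,
        "Retention": {"Mode": "COMPLIANCE", "RetainUntilDate": retain_until},
    }
```

With boto3 and a bucket that has Object Lock enabled, the result could be passed as `s3.put_object_retention(**kwargs)`. S3 does not allow a compliance-mode retention period to be shortened once set, which fits the approach of locking only to the earliest segment expiry and copying longer-lived segments forward later.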

[0065] A new object 652 covering a different zone, for example 6-12 months, can likewise contain individual segments with different corresponding durations. In this particular example, segments A1 and B4 (see also Figure 2) have the shortest duration of all segments in that new object 652. Therefore, the RLC lock can be held on that object 652 for this shortest duration, which is 6 months.

[0066] Regarding segment expiration, suppose an object, such as a new object 652, contains segments with durations of 6 months, 1 year, and 2 years. The RLC lock will be held for the minimum duration specified by these segments, i.e., 6 months. At some point before this object expires, such as the day before the end of the 6-month period, the remaining segments with durations of 1 year and 2 years can be copied forward at 689 into a new object that is locked for 1 year, i.e., the minimum duration specified by the two remaining segments. This process allows the old object (i.e., the object containing segments with durations of 6 months, 1 year, and 2 years) to be deleted at 691, thereby saving space and reducing TCO. Without the copy-forward process 689, an object would remain locked for 2 years even if only one of its segments had a 2-year duration.
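The TCO argument can be checked with a little arithmetic. This sketch assumes all durations are measured in months from one common reference point (so the copied-forward object is locked until month 12, the earliest remaining expiration), counts one storage unit per segment, and ignores the brief overlap while the copy is made; the function names are illustrative. For segments expiring at 6, 12, and 24 months, the copy-forward chain holds 3 × 6 + 2 × 6 + 1 × 12 = 42 unit-months, versus 3 × 24 = 72 unit-months for a single object locked for 2 years.

```python
from typing import List, Tuple

# (segment expiries, held from month, locked until month)
Schedule = List[Tuple[List[int], int, int]]


def copy_forward_schedule(expiries: List[int]) -> Schedule:
    """Chain of objects produced by repeated copy-forward under RLC.

    Each object is locked until its earliest segment expiry; just before
    then, the longer-lived segments are copied forward into a new object
    and the old object is deleted.
    """
    schedule: Schedule = []
    remaining = sorted(expiries)
    start = 0
    while remaining:
        until = remaining[0]
        schedule.append((list(remaining), start, until))
        remaining = [e for e in remaining if e > until]
        start = until
    return schedule


def unit_months(schedule: Schedule) -> int:
    """Storage held over time: one unit per segment per month held."""
    return sum(len(segs) * (until - start) for segs, start, until in schedule)
```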

[0067] With continued reference to Figures 4A, 4B, and 4C, any new objects 654 containing expired segments may not require any further processing after their creation at 685; those objects 654 can then be selected by the regular GC process for validity checks and may be deleted so that the space they occupy can be reclaimed. Similarly, the original fragmented objects (such as one of objects 400, from which one or more objects 652 and/or 654 were created at 685) can be deleted at 691, and their space may be reclaimed immediately.

[0068] It should be noted that, while method 675 can be applied specifically to RLC objects, the same or similar methods can also be applied to RLG objects, for example by a deduplication system. In some implementations, methods 500, 600, and 675 (of which methods 500 and 675 include separation processes 508 and 683, respectively) can maintain their own state data so that these methods and processes can be paused, continued, or resumed after a sudden termination or under other conditions.
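One simple way to keep such state data is a persistent checkpoint recording which fragmented objects have already been fully separated, so that a restarted run skips them. The sketch below is an assumption, not the mechanism described in the text: the class name, the JSON file format, and the per-object granularity are hypothetical. An object interrupted before `mark_done` would simply be processed again, which assumes that separating a single object can safely be redone.

```python
import json
import os
import tempfile
from typing import Iterable, List


class SeparationCheckpoint:
    """Hypothetical progress record for a separation run (cf. 508 and 683)."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.done = set()
        if os.path.exists(path):           # resuming after a pause or crash
            with open(path) as f:
                self.done = set(json.load(f)["done"])

    def mark_done(self, object_id: str) -> None:
        """Record that one fragmented object has been fully separated."""
        self.done.add(object_id)
        # Write a temp file in the same directory, then swap it in with
        # os.replace (atomic on POSIX), so a crash mid-write cannot leave
        # a truncated checkpoint behind.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump({"done": sorted(self.done)}, f)
        os.replace(tmp_path, self.path)

    def pending(self, object_ids: Iterable[str]) -> List[str]:
        """Objects still to process, in their original order."""
        return [o for o in object_ids if o not in self.done]
```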

[0069] In some implementations, methods 500, 600, and 675, and/or any part thereof, can be performed without actually creating any new objects. That is, at least processes 510 and 685 can be omitted. Such modified methods may be referred to as running in, or defining, an analysis mode, and can be executed to prospectively estimate the potential space reclamation benefits and/or the total I/O (input/output operations) needed to create the new objects at 510/685 during an actual run of the full methods 500 and/or 600.

[0070] In analysis mode, unlike in the full methods 500, 600, and 675, there is no need to create buffers in memory, copy segments, create real objects, or perform any I/O. Instead, in one implementation, analysis mode can simply maintain an appropriate record of all phases of the method and provide a report after the method completes.
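Analysis mode can be approximated as a pass over segment metadata alone. In this hypothetical sketch, each object is a list of (size_in_bytes, expiry_date) pairs, the threshold is compared against the expired share of bytes, and the I/O estimate assumes every segment of a candidate object is read once and rewritten once into either an active or an expired-only object. None of these accounting rules come from the text; the names are invented.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Tuple


@dataclass
class AnalysisReport:
    candidates: int = 0
    bytes_read: int = 0
    bytes_written: int = 0
    reclaimable_bytes: int = 0


def analyze(objects: Dict[str, List[Tuple[int, date]]], today: date,
            threshold: float) -> AnalysisReport:
    """Dry run over segment metadata only: no buffers, no segment copies,
    no new objects, and no data reads or writes."""
    report = AnalysisReport()
    for segments in objects.values():
        total = sum(size for size, _ in segments)
        expired = sum(size for size, expiry in segments if expiry <= today)
        if total == 0 or expired / total < threshold:
            continue                         # not a defragmentation candidate
        report.candidates += 1
        report.bytes_read += total           # each segment would be read once
        report.bytes_written += total        # and written to an active or expired-only object
        report.reclaimable_bytes += expired  # freed once expired-only objects are GC'd
    return report
```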

[0071] Regarding the example methods disclosed herein (including methods 500, 600, and 675), it should be noted that any of the disclosed processes, operations, methods, and/or any part thereof may be executed in response to, as a result of, and/or based on the execution of any preceding one or more processes, methods, and/or operations. Accordingly, the execution of one or more processes may, for example, be a prerequisite or trigger for the subsequent execution of one or more other processes, operations, and/or methods. Thus, for example, the various processes that may constitute a method may be linked together or otherwise associated with each other through relationships such as those just mentioned in the examples.

[0072] Referring next to Figure 5, as noted earlier, object metadata and/or segment metadata, such as object metadata 700, can be updated in connection with the execution of the various methods and processes disclosed herein. As illustrated in the example of Figure 5, object metadata 700 may include metadata applicable to the object as a whole, such as, but not limited to, lock_mode, expiry_date, and lock_count. Similarly, individual segments of the object may have associated segment-specific metadata. For example, for each of one or more segments, segment metadata 702 may include the segment expiry_date and the segment lock_count. Any segment metadata and object metadata can be updated in connection with the execution of the methods and processes disclosed herein. For example, the expiry date can be updated, as noted herein. As another example, when a file is deleted or added, the lock count of the object and/or segment can be decremented or incremented, respectively, to reflect that the object or segment is no longer shared by that file or is now shared by a new file.
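The metadata fields named above (lock_mode, expiry_date, and lock_count for the object; expiry_date and lock_count per segment) map naturally onto two small record types. In this illustrative sketch, lock counts act as reference counts of the files sharing the object or segment; the class and method names, and the choice to count each file once per object, are assumptions.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Iterable, Optional


@dataclass
class SegmentMetadata:               # cf. segment metadata 702
    expiry_date: Optional[date] = None
    lock_count: int = 0


@dataclass
class ObjectMetadata:                # cf. object metadata 700
    lock_mode: Optional[str] = None  # e.g. "governance" or "compliance"
    expiry_date: Optional[date] = None
    lock_count: int = 0
    segments: Dict[str, SegmentMetadata] = field(default_factory=dict)

    def file_added(self, seg_ids: Iterable[str]) -> None:
        """A new file now shares this object and the given segments."""
        self.lock_count += 1
        for sid in seg_ids:
            self.segments.setdefault(sid, SegmentMetadata()).lock_count += 1

    def file_deleted(self, seg_ids: Iterable[str]) -> None:
        """A file no longer shares this object; counts never go below zero."""
        self.lock_count = max(0, self.lock_count - 1)
        for sid in seg_ids:
            seg = self.segments[sid]
            seg.lock_count = max(0, seg.lock_count - 1)
```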

[0073] D. Further aspects of some example implementation methods

[0074] As disclosed herein, embodiments of the invention include, among other things, defragmentation processes that can be performed on objects and their constituent segments. Some such processes, and/or other processes disclosed herein, can have various characteristics. For example, exemplary embodiments can operate on objects of variable size, such as variable-size blocks. As another example, some embodiments can reduce cloud storage costs by moving segments with valid retention periods and segments with expired retention periods into different corresponding objects, so that an entirely expired object can expire immediately and the space it occupies can be reclaimed. This aspect can be particularly useful when applied to existing files for which users, such as application administrators, can arbitrarily change the retention periods of objects and/or segments. Similarly, in some embodiments, files or other objects can be locked at any time after their creation, not necessarily at creation time. Therefore, objects and/or segments may not be grouped according to the retention duration values present at creation. Furthermore, exemplary embodiments can perform analysis and processing, such as retention or deletion, on a segment-by-segment basis. In contrast, conventional systems and methods cannot delete only selected portions of an object, such as its expired segments. Also, in operating environments such as cloud environments, embodiments can operate with a combination of two locking modes, namely governance mode and compliance mode. Furthermore, implementations can achieve significant TCO reductions by locking RLC objects only for the minimum retention time among their segments, then defragmenting those objects before they expire and reapplying retention to the newly generated objects based on the next minimum retention time of their segments.
Moreover, while conventional methods and deduplication application providers may offer some limited object retention capabilities on-premises, such methods and applications cannot provide retention capabilities at the cloud provider, let alone the retention capabilities disclosed herein. As a result, with these conventional methods, data can easily be deleted at the cloud provider.

[0075] Finally, the following are examples of various specific implementations. In one such example, a method can detect fragmented cloud objects and separate their segments into individual defragmented objects based on factors such as segment lock status, expiration duration, and lock mode, ultimately helping the GC reclaim cloud space used by expired segments. In another example, a method can implement controlled RLC (compliance) locking on cloud objects so that compliance locks expire earlier, thereby allowing defragmentation methods to process and separate objects containing expired segments. In yet another example, a method that can be configured to focus on, for example, TCO reduction can separate RL-active segments from RL-expired segments while keeping data integrity intact at the segment level. In a final example, a method can analyze the level of fragmentation attributable to expired segments in an environment, such as a cloud environment, and provide a report estimating the potential benefits and likely I/O costs of the method in actual operation.

[0076] E. Other exemplary embodiments

[0077] The following are some other exemplary embodiments of the present invention. These are presented by way of example only and are not intended to limit the scope of the invention in any way.

[0078] Implementation 1. A method comprising: identifying a cloud object as a potential candidate for defragmentation; evaluating the cloud object to determine which portion of the cloud object's segments has expired; separating the expired and unexpired segments of the cloud object when the expired segments of that portion meet or exceed a threshold; creating a first new cloud object comprising only the unexpired segments; creating a second new cloud object comprising only the expired segments; and deleting the cloud object from a storage device.

[0079] Implementation 2. The method as described in Implementation 1, wherein the cloud object is a deduplicated object.

[0080] Implementation 3. The method of any one of Implementations 1 to 2, wherein the first new cloud object is locked and the second new cloud object is not locked.

[0081] Implementation 4. The method of any one of Implementations 1 to 3, wherein the first new cloud object is partitioned such that all segments of the first new cloud object have corresponding expiration times falling within a specified time range.

[0082] Implementation 5. The method of any one of Implementations 1 to 4, wherein the first new cloud object comprises a plurality of segments, each of the plurality of segments having a corresponding expiration duration, and the first new cloud object is RLG locked to a time period corresponding to the longest expiration duration among the plurality of segments.

[0083] Implementation 6. The method of any one of Implementations 1 to 5, wherein the first new cloud object comprises a plurality of segments, each of the plurality of segments having a corresponding expiration duration, and the first new cloud object is RLC locked to a time period corresponding to the shortest expiration duration among the plurality of segments.

[0084] Implementation 7. The method as described in Implementation 6, wherein the first new cloud object expires at the end of a time period corresponding to the shortest expiration duration, and before the first new cloud object expires, the method further includes: copying and forwarding all segments whose corresponding expiration durations are longer than the shortest expiration duration; using the copied and forwarded segments to create a third new cloud object; and deleting the first new cloud object from the storage device.

[0085] Implementation 8. The method of any one of Implementations 1 to 7, further comprising performing a validity check and deleting the second new cloud object based on the result of the validity check.

[0086] Implementation 9. The method of any one of Implementations 1 to 8, wherein separating expired segments and non-expired segments includes copying the non-expired segments to a buffer in a first memory and copying the expired segments to a buffer in a second memory.

[0087] Implementation 10. The method as described in any one of Implementations 1 to 9, further comprising: receiving a lock request for a cloud object; determining whether the cloud object is locked or unlocked; when it is determined that the cloud object is not locked, locking the cloud object based on the duration of the segment of the cloud object having the shortest duration of all segments of the cloud object, and updating the metadata of the cloud object and the metadata of the segments; and when it is determined that the cloud object is locked, determining whether the new lock duration specified in the lock request is greater than or less than the current lock duration of the cloud object, and when the new lock duration is greater than the current lock duration, updating the metadata of the segments (lock count and expiration date) and updating the lock count of the cloud object, but not updating the expiration date of the cloud object; or when the new lock duration is less than the current lock duration, only updating the lock count of the cloud object and the lock count of the segments.

[0088] Implementation 11. A method comprising: receiving a lock request for a cloud object; determining whether the cloud object is locked or unlocked; when it is determined that the cloud object is unlocked, locking the cloud object based on the duration of a segment of the cloud object having the shortest duration of all segments of the cloud object, and updating the metadata of the cloud object and the metadata of the segments; and when it is determined that the cloud object is locked, determining whether a new lock duration specified in the lock request is greater than or less than the current lock duration of the cloud object, and when the new lock duration is greater than the current lock duration, updating the metadata of the segments (lock count and expiration date) and updating the lock count of the cloud object, but not updating the expiration date of the cloud object; or when the new lock duration is less than the current lock duration, only updating the lock count of the cloud object and the lock count of the segments.

[0089] Implementation 12. A method for performing any of the operations, methods, or processes disclosed herein, or any part thereof.

[0090] Implementation 13. A non-transitory storage medium storing instructions that can be executed by one or more hardware processors to perform operations including one or more of the operations described in Implementations 1 to 12.

[0091] F. Example computing device and associated media

[0092] The embodiments disclosed herein may include the use of a dedicated or general-purpose computer, which includes various computer hardware or software modules, as discussed in more detail below. The computer may include a processor and a computer storage medium carrying instructions that, when executed by and/or caused to be executed by the processor, perform any one or more of the methods disclosed herein, or any one or more portions of any of the disclosed methods.

[0093] As indicated above, embodiments within the scope of this invention also include computer storage media, which are physical media for carrying or storing computer-executable instructions or data structures. Such computer storage media can be any available physical media accessible by a general-purpose or special-purpose computer.

[0094] By way of example, and not limitation, such computer storage media may include hardware storage devices such as solid-state drives (SSDs), RAM, ROM, EEPROM, CD-ROM, flash memory, phase-change memory (“PCM”), or other optical disk storage devices, magnetic disk storage devices, or other magnetic storage devices, or any other hardware storage device that can be used to store program code in the form of computer-executable instructions or data structures, accessible and executed by general-purpose or special-purpose computer systems to implement the functions disclosed in this invention. Combinations of the above should also be included within the scope of computer storage media. Such media are also examples of non-transitory storage media, and non-transitory storage media also include cloud-based storage systems and architectures, but the scope of this invention is not limited to these examples of non-transitory storage media.

[0095] Computer-executable instructions include, for example, instructions and data that, when executed, cause a general-purpose computer, a special-purpose computer, or a special-purpose processing device to perform a specific function or group of functions. Therefore, some embodiments of the present invention can be downloaded, for example, from a website, mesh topology, or other source to one or more systems or devices. Similarly, the scope of the present invention includes any hardware system or device containing an application instance that includes the disclosed executable instructions.

[0096] Although the subject matter has been described in language specific to structural features and/or methodological actions, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or actions described above. Rather, the specific features and actions disclosed herein are disclosed as exemplary forms of implementing the claims.

[0097] As used herein, the terms "module" or "component" can refer to a software object or routine that executes on a computing system. The various components, modules, engines, and services described herein can be implemented as objects or processes that execute on a computing system, for example, as individual threads. While the systems and methods described herein can be implemented in software, implementation in hardware or a combination of software and hardware is also possible and contemplated. In this disclosure, a "computing entity" can be any computing system as previously defined herein, or any module or combination of modules running on a computing system.

[0098] In at least some cases, a hardware processor is provided that is operable to execute executable instructions for performing methods or processes, such as those disclosed herein. The hardware processor may or may not include other hardware elements, such as the computing devices and systems disclosed herein.

[0099] Regarding the computing environment, embodiments of the present invention can be executed in a client-server environment, whether in a network environment or a local environment, or in any other suitable environment. Suitable operating environments for at least some embodiments of the present invention include cloud computing environments, wherein one or more of the client, server, or other machines can reside in and operate within the cloud environment.

[0100] With brief reference now to Figure 6, any one or more of the entities disclosed, or implied, by Figures 1 to 5 and/or elsewhere herein may take the form of, include, be implemented on, or be hosted by a physical computing device, one example of which is denoted at 800. Similarly, where any of the foregoing elements includes or constitutes a virtual machine (VM), that VM may constitute a virtualization of any combination of the physical components disclosed in Figure 6.

[0101] In the example of Figure 6, physical computing device 800 includes: memory 802, which may include one, some, or all of random access memory (RAM), non-volatile memory (NVM) 804 (e.g., NVRAM), read-only memory (ROM), and persistent memory; one or more hardware processors 806; a non-transitory storage medium 808; a user interface (UI) device 810; and a data storage device 812. One or more of the memory components 802 of physical computing device 800 may take the form of solid-state drive (SSD) memory. Similarly, one or more applications 814 may be provided, which include instructions executable by one or more hardware processors 806 to perform any of the operations disclosed herein, or any part thereof.

[0102] Such executable instructions may take various forms, including, for example, instructions executable to perform any method or part thereof disclosed herein, and/or instructions executable at any of a storage site (whether an enterprise on-premises site or a cloud computing site), a client, a data center, a data protection site (including cloud storage sites), or a backup server to perform any function disclosed herein. Similarly, such instructions may be executable to perform any other operations and methods disclosed herein, and any part thereof.

[0103] This invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described embodiments are to be considered illustrative rather than restrictive in all respects. Therefore, the scope of the invention is indicated by the appended claims rather than by the foregoing description. All variations within the meaning and scope of equivalents of the claims are to be covered within the scope of the claims.

Claims

1. A method comprising: identifying a cloud object as a potential candidate for defragmentation in cloud storage; evaluating the cloud object to determine which portion of the cloud object's segments has expired; and, when the expired segments of that portion meet or exceed a threshold: separating the expired and unexpired segments of the cloud object; creating in the cloud storage, based on the expiration times of the unexpired segments, a plurality of first new cloud objects that consist only of the unexpired segments; creating in the cloud storage a second new cloud object that includes only expired segments that have not yet been deleted; and deleting the cloud object from a storage device; wherein each of the plurality of first new cloud objects is locked for a corresponding retention duration, the corresponding retention duration being based on the expiration durations of the segments included in that first new cloud object.

2. The method of claim 1, wherein the cloud object is a deduplicated object.

3. The method of claim 1, wherein the plurality of first new cloud objects are locked and the second new cloud object is not locked.

4. The method of claim 1, wherein the plurality of first new cloud objects are partitioned such that all segments of the plurality of first new cloud objects have corresponding expiration times that have not yet arrived but fall within a specified time range.

5. The method of claim 1, wherein the corresponding retention duration is the longest expiration duration among a plurality of segments stored in the corresponding first new cloud object.

6. The method of claim 1, wherein the respective retention duration is the shortest expiration duration among a plurality of segments stored in the respective first new cloud object.

7. The method of claim 6, wherein the plurality of first new cloud objects expire at the end of the shortest expiration duration, which has not yet arrived, and the method further comprises, before one of the plurality of first new cloud objects expires: copying and forwarding all segments whose corresponding expiration duration is longer than the shortest expiration duration; using the copied and forwarded segments to create a third new cloud object; and deleting the plurality of first new cloud objects from the storage device.

8. The method of claim 1, further comprising performing a validity check and deleting the second new cloud object based on the result of the validity check.

9. The method of claim 1, wherein separating the expired segments and the unexpired segments comprises copying the unexpired segments to a buffer in a first memory and copying the expired segments to a buffer in a second memory.

10. The method of claim 1, further comprising: receiving a lock request for the cloud object; determining whether the cloud object is locked or unlocked; when it is determined that the cloud object is not locked, locking the cloud object for a lock duration equal to the minimum retention duration of all segments of the cloud object, each segment having a corresponding retention duration, and updating the object metadata of the cloud object and the segment metadata of each of the segments, the metadata including at least a lock count and an expiration date; and, when it is determined that the cloud object is locked, comparing the new lock duration specified in the lock request with the current lock duration of the cloud object, and: when the new lock duration is greater than the current lock duration, updating the segment metadata of the segments, including updating the segment lock count and the segment expiration date based on the new lock duration, updating the object lock count of the cloud object, and keeping the existing object expiration date of the cloud object unchanged; or, when the new lock duration is less than the current lock duration, updating only the object lock count of the cloud object and the segment lock count of the segments, without modifying any expiration date.

11. A non-transitory storage medium storing instructions that can be executed by one or more hardware processors to perform the method as described in any one of claims 1-10.

Citation Information

Patent Citations

  • Marking impacted similarity groups in garbage collection operations in deduplicated storage systems

    US20200310964A1