Method, apparatus and system for processing snapshots
By obtaining a snapshot of the directory tree and deleting metadata that does not belong to the directory subtree, the creation of directory subtree snapshots is simplified. This addresses the wasted storage space and degraded cloud service performance that existing technologies have difficulty handling.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- ALIBABA (CHINA) CO LTD
- Filing Date
- 2023-03-31
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies struggle to process snapshots of directory subtrees easily and quickly, leading to wasted storage space and decreased cloud service performance.
By receiving a snapshot creation request for a directory subtree, a snapshot of the directory tree containing the directory subtree is obtained, and it is determined whether the deleted directory and/or file belongs to the directory subtree. If it does not belong, the corresponding metadata is deleted from the snapshot, and garbage data is reclaimed.
It simplifies the creation of snapshots for directory subtrees, reduces storage space waste, and improves cloud service performance.
Figure CN116401207B_ABST
Description
Technical Field
[0001] The embodiments in this specification relate to the field of cloud computing technology, and in particular to methods for processing snapshots.
Background Technology
[0002] Cloud services are services that provide computing and storage resources via the internet in an on-demand and easily scalable manner. Examples include cloud storage and cloud photo albums. In cloud storage, files are managed based on metadata. With the development of cloud services, the scale of metadata in cloud storage is growing rapidly, sometimes reaching hundreds of billions. Currently, distributed key-value pair storage is commonly used to manage metadata, with the metadata of each cloud drive forming a directory tree. Based on the snapshot capability of key-value pair storage, snapshots of the directory tree can be easily taken, resulting in a snapshot of the directory tree.
[0003] However, in many application scenarios, snapshots of directory subtrees are required. Therefore, how to handle snapshots to obtain snapshots of directory subtrees simply and quickly, while avoiding space waste and improving cloud service performance, is an urgent problem to be solved.
Summary of the Invention
[0004] In view of this, embodiments of this specification provide a method for processing snapshots. One or more embodiments of this specification also relate to apparatus, computing devices, computer-readable storage media, and computer programs for processing snapshots, in order to address technical deficiencies in the prior art.
[0005] According to a first aspect of the embodiments of this specification, a method for processing snapshots is provided, comprising: determining a snapshot of a directory subtree, the snapshot of the directory subtree being generated based on a snapshot of the directory tree in which the directory subtree resides; determining a deleted directory and / or file, the metadata corresponding to the deleted directory and / or file belonging to the directory tree; determining whether the metadata corresponding to the deleted directory and / or file belongs to the directory subtree; and if it is determined that it does not belong, deleting the metadata corresponding to the deleted directory and / or file from the snapshot of the directory subtree.
[0006] According to a second aspect of the embodiments of this specification, a method for processing snapshots is provided, comprising: receiving a snapshot creation request for a directory subtree; obtaining a snapshot of the directory tree in which the directory subtree is located according to the snapshot creation request; and using the snapshot of the directory tree as a snapshot of the directory subtree.
[0007] According to a third aspect of the embodiments of this specification, a system for processing snapshots is provided, comprising: a garbage collection triggering server configured to distribute a plurality of garbage collection processing tasks to a plurality of garbage collection processing servers when it is determined that a plurality of deleted directories and / or files exist, the plurality of garbage collection processing tasks representing garbage collection of the plurality of deleted directories and / or files, and different garbage collection processing tasks corresponding to different directories and / or files; and a garbage collection processing server configured to, in response to receiving the garbage collection processing tasks, perform garbage collection on a snapshot of the directory subtree according to a snapshot processing method as described in any embodiment of this specification.
[0008] According to a fourth aspect of the embodiments of this specification, a computing device is provided, comprising: a memory and a processor; the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions, wherein the computer-executable instructions, when executed by the processor, implement the steps of the above-described method for processing snapshots.
[0009] According to a fifth aspect of the embodiments of this specification, a computer-readable storage medium is provided that stores computer-executable instructions, which, when executed by a processor, implement the steps of the above-described method for processing snapshots.
[0010] According to a sixth aspect of the embodiments of this specification, a computer program is provided, wherein when the computer program is executed in a computer, it causes the computer to perform the steps of the above-described method for processing snapshots.
[0011] One embodiment of this specification implements a method for processing snapshots. Since the snapshot of the directory subtree obtained by this method is generated based on the snapshot of the directory tree in which the directory subtree is located, in order to avoid the problem of garbage data in the snapshot, after determining the deleted directory and / or file, it is determined whether the metadata corresponding to the deleted directory and / or file belongs to the directory subtree. If it is determined that it does not belong, the metadata corresponding to the deleted directory and / or file is deleted from the snapshot of the directory subtree. Therefore, the metadata of the deleted directory and / or file that does not belong to the snapshot of the directory subtree can be stripped from the snapshot, garbage data is recycled, storage space waste is reduced, metadata amplification is avoided, and cloud service performance is effectively improved.
[0012] One embodiment of this specification implements another method for processing snapshots. Since this method receives a snapshot creation request for a directory subtree, it obtains a snapshot of the directory tree in which the directory subtree is located according to the snapshot creation request, and uses the snapshot of the directory tree as the snapshot of the directory subtree, thereby simplifying the creation of snapshots for the directory subtree and effectively improving cloud service performance.
Attached Figure Description
[0013] Figure 1 is a schematic diagram illustrating a snapshot processing method provided in one embodiment of this specification in a cloud storage application scenario;
[0014] Figure 2 is a flowchart illustrating a method for processing snapshots according to one embodiment of this specification;
[0015] Figure 3 is a schematic diagram of a directory tree provided in one embodiment of this specification;
[0016] Figure 4 is a schematic diagram of the structure of an apparatus for processing snapshots provided in one embodiment of this specification;
[0017] Figure 5 is a flowchart illustrating a method for processing snapshots according to another embodiment of this specification;
[0018] Figure 6 is a flowchart illustrating the process of a snapshot processing method provided in one embodiment of this specification;
[0019] Figure 7 is a schematic diagram of a snapshot processing apparatus provided in another embodiment of this specification;
[0020] Figure 8 is a schematic diagram of the structure of a snapshot processing system provided in one embodiment of this specification;
[0021] Figure 9 is a structural block diagram of a computing device provided in one embodiment of this specification.
Detailed Implementation
[0022] Many specific details are set forth in the following description to provide a full understanding of this specification. However, this specification can be implemented in many other ways than those described herein, and those skilled in the art can make similar extensions without departing from the spirit of this specification. Therefore, this specification is not limited to the specific implementations disclosed below.
[0023] The terminology used in one or more embodiments of this specification is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of this specification. The singular forms “a,” “said,” and “the” as used in one or more embodiments of this specification and the appended claims are also intended to include the plural forms unless the context clearly indicates otherwise. It should also be understood that the term “and / or” as used in one or more embodiments of this specification refers to and includes any or all possible combinations of one or more associated listed items.
[0024] It should be understood that although the terms first, second, etc., may be used to describe various information in one or more embodiments of this specification, such information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another. For example, first may also be referred to as second without departing from the scope of one or more embodiments of this specification, and similarly, second may also be referred to as first. Depending on the context, the word "if" as used herein may be interpreted as "when," "upon," or "in response to a determination."
[0025] First, the terms and concepts used in one or more embodiments of this specification will be explained.
[0026] Cloud storage provides users with information storage, retrieval, and download services via the internet, featuring massive storage capacity.
[0027] Cloud photo albums provide users with photo storage, retrieval, and download services via the internet. They also offer features such as automatic uploading, automatic synchronization, and easy sharing, and boast massive storage capacity.
[0028] A directory tree is a tree-like management structure that starts from the root directory and branches out layer by layer according to the hierarchical relationships between directories and between directories and files. Intermediate nodes in the directory tree can represent directories, and leaf nodes can represent files or empty directories. The leaf nodes representing files store the file's metadata. File metadata is stored and managed using key-value pairs. Distributed key-value pair storage management is achieved by partitioning key-value pairs (partitioning means dividing key-value pairs into several sub-ranges according to the range of key values), thus forming a directory tree based on the hierarchical relationships between directories and between directories and files on a cloud drive.
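The range partitioning mentioned above can be illustrated with a minimal sketch. It assumes string keys and a fixed list of split points; the names `SPLIT_KEYS` and `partition_for` and the sample keys are hypothetical and are not taken from this specification.

```python
from bisect import bisect_right

# Hypothetical split points: partition 0 holds keys < "g", partition 1 holds
# keys in ["g", "n"), partition 2 holds ["n", "t"), partition 3 holds the rest.
SPLIT_KEYS = ["g", "n", "t"]


def partition_for(key: str) -> int:
    """Return the index of the key-range partition that owns `key`."""
    return bisect_right(SPLIT_KEYS, key)


if __name__ == "__main__":
    for sample in ["a/photo.jpg", "g", "m/notes.txt", "z"]:
        print(sample, "->", partition_for(sample))
```

Because each partition covers a contiguous key range, a lookup only has to locate the range that contains the key before contacting the node that stores it.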
[0029] The file's metadata contains the mapping between the file's logical address in the logical space and its actual physical storage location.
[0030] A snapshot is a data backup at a specific point in time, which can be used for data recovery in case of failure, providing a guarantee of data security. Even if a user accidentally deletes a file, they can quickly restore the data by rolling back from a snapshot.
[0031] A directory subtree is a branch within a directory tree. It is represented by a tree-like management structure where any directory node in the directory tree serves as the root node, and branches expand from that node. The path of a directory subtree within the directory tree is the path of its root node within the directory tree.
[0032] A key-value pair is a basic data structure that represents a key that corresponds to a value.
[0033] Garbage collection (GC) is a mechanism used to periodically reclaim space occupied by objects that are no longer referenced during idle time, thereby freeing up space occupied by garbage data.
[0034] Copy-on-write is an optimization strategy in the field of computer programming. Its core idea is that if multiple callers request the same resource at the same time, they will all obtain the same pointer to the same resource. Only when a caller attempts to modify the content of the resource will a dedicated copy be made for that caller. The original resource seen by other callers remains unchanged. This process is transparent to other callers.
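As a minimal illustration of the copy-on-write idea, the sketch below lets two holders share one metadata dictionary until one of them writes. The class `CowView` and the sample entries are hypothetical; a real storage system would copy at a much finer granularity than a whole dictionary.

```python
class CowView:
    """Hypothetical copy-on-write view over a shared metadata dictionary."""

    def __init__(self, shared: dict):
        self._data = shared      # shared with other holders until the first write
        self._owned = False      # becomes True once a private copy exists

    def get(self, key):
        return self._data.get(key)

    def delete(self, key):
        if not self._owned:
            # First modification: make a private copy for this holder only.
            self._data = dict(self._data)
            self._owned = True
        self._data.pop(key, None)


base = {"/a/E": "meta-E", "/b/F": "meta-F"}
reader = CowView(base)
writer = CowView(base)
writer.delete("/b/F")   # copies, then deletes from the private copy
```

After the delete, `reader` still sees the entry for "/b/F" because only `writer` received a private copy.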
[0035] It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, data stored, data displayed, etc.) involved in this specification are all information and data authorized by the user or fully authorized by all parties. Furthermore, the collection, use and processing of related data must comply with the relevant laws, regulations and standards of the relevant countries and regions, and corresponding operation portals are provided for users to choose to authorize or refuse.
[0036] With the development of cloud services, the scale of metadata in cloud storage is growing rapidly, sometimes reaching hundreds of billions. Distributed key-value pair storage is typically used to manage metadata, with the metadata of files on each cloud disk forming a directory tree. While distributed key-value pair storage allows for easy snapshotting of the directory tree of a single cloud disk, it cannot quickly obtain snapshots of directory subtrees.
[0037] On the other hand, if the snapshot of the directory tree is used directly as the snapshot of the directory subtree, as in the method provided in the embodiments of this specification, the directory subtree snapshot is obtained simply and quickly, but under the current snapshot processing scheme garbage data occupies space and storage is wasted. This is because the traditional snapshot processing scheme relies on copy-on-write to keep a snapshot immutable and intact during garbage collection. That is, when a file is deleted, the scheme checks whether a snapshot of the directory subtree exists. If it does, a new copy of the directory subtree snapshot is made, the corresponding file is deleted from the new copy, and the new copy is written to the snapshot system so that the original snapshot remains intact. The copy-on-write, query, and judgment processes consume considerable extra resources and affect performance. Consequently, under the traditional scheme, when a file that does not belong to the directory subtree is deleted, the garbage data corresponding to that file still occupies storage space in the snapshot, which leads to relatively large metadata amplification, a low cache hit rate, and degraded service performance.
[0038] In view of this, one aspect of this specification provides a method for processing snapshots. This method, upon receiving a snapshot creation request for a directory subtree, obtains a snapshot of the directory tree containing the directory subtree based on the snapshot creation request, and uses this snapshot as the snapshot of the directory subtree, thereby simplifying the creation of directory subtree snapshots and effectively improving cloud service performance. Another aspect provides another method for processing snapshots. To avoid the problem of garbage data in snapshots, after identifying the deleted directories and / or files, this method determines whether the metadata corresponding to the deleted directories and / or files belongs to the directory subtree. If it is determined that the metadata does not belong, the metadata corresponding to the deleted directories and / or files is deleted from the snapshot of the directory subtree. Since only data not belonging to the directory subtree is deleted from the snapshot, the integrity of the directory subtree's content is not affected. Therefore, the metadata corresponding to deleted directories and / or files that do not belong to the directory subtree can be directly stripped from the snapshot, reclaiming garbage data, reducing storage space waste, avoiding metadata amplification, and effectively improving cloud service performance.
[0039] In addition, this specification also relates to apparatus for processing snapshots, computing devices, and computer-readable storage media, which are described in detail in the following embodiments.
[0040] See Figure 1, which is a schematic diagram illustrating a snapshot processing method according to an embodiment of this specification in a cloud storage application scenario. As shown in Figure 1, the cloud includes snapshot servers, computing clusters, and storage clusters. The computing cluster includes one or more cloud servers. The storage cluster includes one or more cloud disks. These cloud disks can be used to implement storage such as cloud photo albums. Based on the cloud storage services provided by the cloud, the user can send a request to the cloud to access data. One or more cloud servers in the cloud's computing cluster access the cloud disk to obtain the corresponding file according to the request and return the file to the user. A cloud disk is a block-level virtual storage device provided to the cloud. The underlying layer of the cloud disk can be a physical block storage device such as a hard disk. As with a hard disk, users can partition and format the cloud disk mounted on the cloud server, create file systems on it, and persistently store data on it. The snapshot server can be used to create snapshots of the directory subtree of the cloud disk and / or perform garbage collection on the snapshots of the directory subtree. The metadata on a cloud disk, managed through distributed key-value pair storage, forms an independent directory tree, and a branch in the directory tree is a directory subtree. For example, key-value pairs can be organized into a tree structure by identifying the parent directory of a file or directory in the metadata key-value pairs, forming a distributed directory tree, thus representing the flat key-value pairs in the form of a directory tree.
Specifically, on one hand, the snapshot server can receive snapshot creation requests for directory subtrees, obtain a snapshot of the directory tree containing the directory subtree according to the snapshot creation request, and use the snapshot of the directory tree as the snapshot of the directory subtree; on the other hand, the snapshot server can determine the snapshot of the directory subtree, identify the deleted directories and / or files, determine whether the deleted directories and / or files belong to the directory subtree, and if it is determined that they do not belong, delete the deleted directories and / or files from the snapshot of the directory subtree. The snapshot creation request can be issued by the user, by the relevant management server in the cloud, or by the internal program of the snapshot server as needed; this specification does not impose any restrictions on this.
[0041] In the above application scenarios, the snapshot processing method provided in the embodiments of this specification directly uses the snapshot of the directory tree as the snapshot of the directory subtree, and reclaims the garbage data of files that do not belong to the directory subtree in the snapshot, ensuring that the snapshot of the directory subtree can be easily created, while avoiding the problems of garbage data occupying storage and metadata amplification.
[0042] Furthermore, since garbage data from snapshots is reclaimed through a snapshot server in the cloud backend, the frontend providing the service does not need to be aware of this; garbage collection and frontend services are asynchronous and do not affect each other. In practical applications, the snapshot server can be implemented as one or more cloud servers. For example, to accelerate snapshot garbage collection, in one or more embodiments of this specification, the snapshot server can be implemented as a garbage collection trigger server and a garbage collection processing server. The garbage collection trigger server is configured to distribute multiple garbage collection processing tasks to multiple garbage collection processing servers when it is determined that multiple deleted directories and / or files exist. These multiple garbage collection processing tasks represent garbage collection of the multiple deleted directories and / or files, with different garbage collection processing tasks corresponding to different directories and / or files. The garbage collection processing server is configured to, in response to receiving the garbage collection processing tasks, perform garbage collection on the snapshot of the directory subtree according to the snapshot processing method provided in the embodiments of this specification.
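The division of work between the garbage collection trigger server and the garbage collection processing servers can be sketched as follows. This is only an illustration under stated assumptions: a thread pool stands in for the pool of processing servers, and `dispatch_gc_tasks` and `process_one` are hypothetical names, not interfaces defined by this specification.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def dispatch_gc_tasks(
    deleted_paths: List[str],
    process_one: Callable[[str], bool],
    workers: int = 4,
) -> Dict[str, bool]:
    """Trigger side: create one task per deleted directory/file and fan the
    tasks out to a pool of workers (standing in for processing servers).

    `process_one` is the processing-side handler; it returns True when it
    reclaimed the entry from the snapshot.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        outcomes = list(pool.map(process_one, deleted_paths))
    return dict(zip(deleted_paths, outcomes))


if __name__ == "__main__":
    # Hypothetical handler: pretend everything under "/b/" is reclaimable.
    print(dispatch_gc_tasks(["/b/F", "/a/E"], lambda p: p.startswith("/b/")))
```

Giving each task a different directory or file means that the workers do not contend for the same entry.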
[0043] It should be noted that the application scenario illustrated in Figure 1 is for illustrative purposes only and does not constitute a limitation on the methods provided in the embodiments of this specification. For example, according to the methods provided in the embodiments of this specification, the cloud disk can be a cloud disk for a cloud server that provides arbitrary cloud computing capabilities. The cloud server can be a distributed server cluster including multiple servers, or it can be a single server. The services provided by the cloud server can include: cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDNs), and basic cloud computing services such as big data and artificial intelligence platforms.
[0044] See Figure 2, which shows a flowchart of a method for processing a snapshot according to an embodiment of this specification, specifically including the following steps.
[0045] Step 202: Receive snapshot creation requests for directory subtrees.
[0046] The snapshot creation request is a request to instruct the creation of a snapshot of the metadata of a specified directory subtree within a specified directory tree. For example, the snapshot creation request may carry path information of the directory subtree, which is the path information of the root node of the directory subtree within the directory tree.
[0047] For example, Figure 3 illustrates a directory tree in which circles represent directories, squares represent files, and the topmost "/" represents the root node, or root directory. Assume the snapshot creation request specifies "/a" as the root node of the subtree. Then the directory subtree corresponding to the snapshot to be created is the directory subtree 302 shown in Figure 3.
[0048] The sender of the snapshot creation request is not limited. For example, the cloud allocates a certain amount of cloud disk space to a user as the user's cloud photo album space, and the directory subtree 302 shown in Figure 3 is used to manage the metadata corresponding to the files in that space. To guard against data loss caused by accidental file deletion by the user or by system failure, the cloud can configure the snapshot server to periodically initiate snapshot creation requests for the directory subtree 302 corresponding to the user's cloud photo album space.
[0049] Step 204: Obtain a snapshot of the directory tree containing the directory subtree according to the snapshot creation request.
[0050] There are no restrictions on how the snapshot of the directory tree is obtained. For example, in some application scenarios, a cloud disk corresponds to an independent directory tree. In this case, any method of creating a snapshot, such as referencing or copying, can be used to easily create a snapshot of an independent directory tree.
[0051] Step 206: Use the snapshot of the directory tree as the snapshot of the directory subtree.
[0052] For example, a snapshot of the directory tree 304 shown in Figure 3 can be used directly as the snapshot of the directory subtree 302. Therefore, the snapshot of the directory subtree 302 can contain files and / or directories that do not belong to the directory subtree 302.
[0053] Since this method receives a snapshot creation request for a directory subtree, obtains a snapshot of the directory tree in which the directory subtree is located based on the snapshot creation request, and directly uses the snapshot of the directory tree as the snapshot of the directory subtree, the creation of snapshots for the directory subtree is simplified, effectively improving the performance of cloud services.
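Steps 202 to 206 can be sketched as below. The sketch assumes that metadata is a flat mapping from path to metadata and that copying the mapping stands in for the snapshot capability of the key-value store; `SubtreeSnapshot`, `take_tree_snapshot`, and `create_subtree_snapshot` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Dict, Mapping


@dataclass(frozen=True)
class SubtreeSnapshot:
    subtree_root: str                  # path carried by the creation request
    tree_snapshot: Mapping[str, str]   # snapshot of the whole directory tree


def take_tree_snapshot(live_metadata: Dict[str, str]) -> Mapping[str, str]:
    # Stand-in for the key-value store's snapshot capability (assumption):
    # a read-only view over a copy of the current metadata.
    return MappingProxyType(dict(live_metadata))


def create_subtree_snapshot(live_metadata: Dict[str, str], subtree_root: str) -> SubtreeSnapshot:
    # Step 202: the request identifies the subtree by the path of its root node.
    # Step 204: obtain a snapshot of the directory tree containing the subtree.
    tree_snapshot = take_tree_snapshot(live_metadata)
    # Step 206: use the tree snapshot, unchanged, as the subtree snapshot.
    return SubtreeSnapshot(subtree_root, tree_snapshot)


live = {"/a/E": "meta-E", "/b/F": "meta-F"}
snap = create_subtree_snapshot(live, "/a")
```

Note that `snap.tree_snapshot` still contains "/b/F", an entry outside the subtree rooted at "/a".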
[0054] Corresponding to the above method embodiments, this specification also provides embodiments of an apparatus for processing snapshots. Figure 4 shows a schematic diagram of a snapshot processing apparatus according to one embodiment of this specification. As shown in Figure 4, the apparatus includes:
[0055] Request receiving module 402 is configured to receive snapshot creation requests for directory subtrees.
[0056] The snapshot acquisition module 404 is configured to acquire a snapshot of the directory tree in which the directory subtree is located based on the snapshot creation request, and use the snapshot of the directory tree as the snapshot of the directory subtree.
[0057] The above is an illustrative scheme of a snapshot processing apparatus according to this embodiment. It should be noted that the technical solution of this snapshot processing apparatus and the technical solution of the snapshot processing method described above belong to the same concept. For details not described in detail in the technical solution of the snapshot processing apparatus, please refer to the description of the technical solution of the snapshot processing method described above.
[0058] See Figure 5, which shows a flowchart of a method for processing a snapshot according to another embodiment of this specification, specifically including the following steps.
[0059] Step 502: Determine a snapshot of the directory subtree, wherein the snapshot of the directory subtree is generated based on the snapshot of the directory tree in which the directory subtree resides.
[0060] Determining the snapshot of the directory subtree can include: obtaining a snapshot of the directory tree as a snapshot of the directory subtree, or checking to determine if a snapshot of the directory subtree already exists, or any other determination method. By determining the snapshot of the directory subtree, the snapshot can be used as the processing object in subsequent processing steps.
[0061] In practical applications, the specific method for generating snapshots of directory subtrees is not limited. In one embodiment, a snapshot of the directory tree can be directly used as a snapshot of the directory subtree. In another embodiment, depending on the actual application scenario, a portion of the snapshot of the directory tree can be obtained as needed. This portion can include the directory subtree and other parts besides the directory subtree. Since the obtained snapshot of the directory subtree includes not only the directory subtree but also directories and / or files outside the directory subtree, garbage collection of the snapshot of the directory subtree can be performed according to the method provided in the embodiments of this specification.
[0062] For example, for the directory tree shown in Figure 3, when a snapshot of the directory subtree 302 is taken, what is actually snapshotted is the entire directory tree 304. Assuming that a snapshot S is obtained, the snapshot S of the directory subtree 302 is also the snapshot S of the directory tree 304.
[0063] Step 504: Determine the deleted directories and / or files, wherein the metadata corresponding to the deleted directories and / or files belongs to the directory tree.
[0064] The deleted directories and / or files may be directories and / or files that have already been deleted.
[0065] For example, in the directory tree shown in Figure 3, suppose the user deletes the file "/b/F" on the cloud drive. The metadata of "/b/F" is contained in the existing snapshot S of the directory tree 304. Therefore, the metadata and data space occupied by "/b/F" need to be reclaimed to avoid wasting space.
[0066] Step 506: Determine whether the metadata corresponding to the deleted directory and / or file belongs to the directory subtree.
[0067] The specific implementation of the judgment is not limited. For example, it can be determined by comparing the path of the metadata corresponding to the directory and / or file to be deleted with the path of the directory subtree. As another example, it can be determined by traversing the nodes on the path of the directory subtree and comparing the information of the files and / or directories corresponding to the traversed nodes with the information of the directory and / or file to be deleted.
[0068] Step 508: If it is determined that the directory and / or file does not belong, delete the metadata corresponding to the deleted directory and / or file from the snapshot of the directory subtree.
[0069] For example, if a snapshot of the directory subtree has been obtained, the metadata corresponding to the deleted directory and / or file can be directly deleted from the snapshot of the directory subtree. As another example, if it is determined that a snapshot of the directory subtree already exists, deletion can be performed by sending a deletion command to the storage module that stores the snapshot of the directory subtree to delete the corresponding metadata.
[0070] According to the above embodiments, directories and / or files that are not part of the directory subtree and have been deleted can be stripped from the snapshot, garbage data is reclaimed, storage space waste is reduced, and cloud service performance is effectively improved.
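Steps 502 to 508 can be sketched as a single loop. The sketch assumes that the snapshot is a mutable mapping from path to metadata and that the membership test is supplied by the caller, since the specification leaves its implementation open; `collect_garbage` and `in_subtree` are hypothetical names.

```python
from typing import Callable, Dict, Iterable, List


def collect_garbage(
    snapshot: Dict[str, str],
    deleted: Iterable[str],
    in_subtree: Callable[[str], bool],
) -> List[str]:
    """Strip metadata of deleted entries that lie outside the subtree.

    `snapshot` is the snapshot determined in step 502. Returns the paths
    whose metadata was removed from it.
    """
    removed = []
    for path in deleted:              # step 504: each deleted directory/file
        if path not in snapshot:
            continue                  # nothing recorded for this path
        if in_subtree(path):          # step 506: membership check
            continue                  # keep it: the subtree content must stay intact
        del snapshot[path]            # step 508: delete metadata from the snapshot
        removed.append(path)
    return removed


snapshot_s = {"/a/E": "meta-E", "/b/F": "meta-F"}
removed = collect_garbage(snapshot_s, ["/b/F", "/a/E"], lambda p: p.startswith("/a/"))
```

In this run only "/b/F" is removed; the entry for "/a/E" stays because it belongs to the subtree rooted at "/a".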
[0071] In one or more embodiments described herein, in order to improve the efficiency of determining whether the metadata corresponding to a deleted directory and / or file belongs to the directory subtree, the determination is achieved by comparing the paths of the two. Specifically, determining whether the metadata corresponding to a deleted directory and / or file belongs to the directory subtree includes:
[0072] Obtain the path of the metadata corresponding to the deleted directory and / or file in the directory tree;
[0073] Obtain the path of the directory subtree within the directory tree;
[0074] Based on the path of the directory subtree in the directory tree and the path of the metadata corresponding to the deleted directory and / or file in the directory tree, determine whether the metadata corresponding to the deleted directory and / or file belongs to the directory subtree.
[0075] The implementation method for obtaining the path of the metadata corresponding to the deleted directory and / or file in the directory tree is not limited. For example, a snapshot of the directory subtree can be used to traverse the nodes layer by layer from the node corresponding to the deleted directory and / or file towards the root node of the directory tree to obtain the path of the metadata corresponding to the deleted directory and / or file in the directory tree. As another example, a snapshot of the directory subtree can be used to traverse each leaf node from the root node towards that leaf node; when the node corresponding to the deleted directory and / or file is reached, the path from the root node to that node can be obtained.
[0076] Understandably, by starting directly from the node corresponding to the deleted directory and / or file and traversing the nodes layer by layer from bottom to top towards the root node of the directory tree, the path of the metadata corresponding to the deleted directory and / or file in the directory tree can be directly obtained, reducing the traversal of irrelevant nodes and resulting in higher processing efficiency.
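The bottom-up traversal described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the dictionary `snapshot_nodes`, the node identifiers, and the function `full_path` are hypothetical names, and the snapshot is assumed to expose, for each node, its name and the identifier of its parent.

```python
from typing import Dict, Optional, Tuple

# Hypothetical snapshot view: node id -> (name, parent id); the root has parent None.
Node = Tuple[str, Optional[str]]


def full_path(nodes: Dict[str, Node], node_id: str) -> str:
    """Walk from node_id up to the root and return the absolute path."""
    parts = []
    current: Optional[str] = node_id
    while current is not None:
        name, parent = nodes[current]
        if parent is not None:  # the root contributes only the leading "/"
            parts.append(name)
        current = parent
    return "/" + "/".join(reversed(parts))


# Toy tree with the two files used in the Figure 3 discussion: /a/E and /b/F.
snapshot_nodes: Dict[str, Node] = {
    "root": ("", None),
    "a": ("a", "root"),
    "b": ("b", "root"),
    "E": ("E", "a"),
    "F": ("F", "b"),
}

print(full_path(snapshot_nodes, "F"))  # /b/F
```

Only the ancestors of the deleted node are visited, which is why this direction touches fewer nodes than a top-down search.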
[0077] In determining whether the metadata corresponding to the deleted directory and / or file belongs to the directory subtree, it can be determined whether the path of the directory subtree is a prefix of the path of the metadata corresponding to the deleted directory and / or file in the directory tree. If it is a prefix, it can be determined that the metadata corresponding to the deleted directory and / or file belongs to the directory subtree; otherwise, it can be determined that the metadata corresponding to the deleted directory and / or file does not belong to the directory subtree, thus making it easier to obtain the determination result.
[0078] For example, in the directory tree shown in Figure 3, suppose the deleted file is "F". Since the path "/a" of the directory subtree is not a prefix of the path "/b/F" of the deleted file F, file F does not belong to directory subtree 302, and the metadata of file F should be deleted from snapshot S. Conversely, if the deleted file is "E", the path "/a" of the directory subtree is a prefix of the path "/a/E" of the deleted file E, so file E belongs to directory subtree 302. In this case, because a snapshot is immutable with respect to the content of its own directory subtree, the metadata of file E cannot be deleted from snapshot S.
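The prefix test can be sketched as below. The function name `belongs_to_subtree` is hypothetical. One assumption goes beyond the text: the comparison is made per path component rather than per character, so that a subtree path "/a" is not treated as a prefix of an unrelated path such as "/ab/F".

```python
from pathlib import PurePosixPath


def belongs_to_subtree(subtree_path: str, file_path: str) -> bool:
    """Return True if file_path lies at or below subtree_path."""
    subtree = PurePosixPath(subtree_path).parts
    target = PurePosixPath(file_path).parts
    # Component-wise prefix comparison (an assumption, see the lead-in).
    return target[:len(subtree)] == subtree


print(belongs_to_subtree("/a", "/a/E"))  # True: E is kept in snapshot S
print(belongs_to_subtree("/a", "/b/F"))  # False: F may be reclaimed
```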
[0079] To speed up retrieval of the path of the metadata corresponding to the deleted directory and/or file in the directory tree, the correspondence between each directory and its parent directory, obtained from the snapshot of the directory subtree, can be stored in a cache managed by a Least Recently Used (LRU) algorithm. When the path of a directory and/or file is needed, it can be assembled directly from the correspondences already in the cache. If the cached correspondences are insufficient to build the path, the missing correspondences are fetched from the snapshot, so that the path is obtained as quickly as possible. Therefore, in one or more embodiments of this specification, using the snapshot of the directory subtree to traverse nodes layer by layer from the node corresponding to the deleted directory and/or file towards the root node of the directory tree to obtain the path of the metadata corresponding to the deleted directory and/or file in the directory tree includes:
[0080] The correspondence between each directory in the directory tree and its parent directory is obtained from the cache, and the cache stores the correspondence obtained from the snapshot of the directory subtree based on the least recently used algorithm;
[0081] Based on the correspondence obtained from the cache, starting from the node corresponding to the deleted directory and / or file, the nodes are traversed layer by layer towards the root node of the directory tree to obtain the path of the metadata corresponding to the deleted directory and / or file in the directory tree.
[0082] The Least Recently Used algorithm is based on the principle that data that has not been used for a long time is unlikely to be used in the near future. When new data arrives and the cache is full, the data that has gone unused the longest is evicted from the cache first.
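A cache of child-to-parent correspondences with LRU eviction might look like the following sketch. The class `ParentCache`, its `misses` counter, the `load_from_snapshot` callback, and the toy `snapshot` dictionary are all hypothetical; the specification does not prescribe a data structure, and an `OrderedDict` is used here only because it makes the eviction order explicit.

```python
from collections import OrderedDict
from typing import Callable, Optional, Tuple

Entry = Tuple[str, Optional[str]]  # (name, parent id); the root has parent None


class ParentCache:
    """LRU cache of node -> (name, parent) correspondences read from a snapshot."""

    def __init__(self, load_from_snapshot: Callable[[str], Entry], capacity: int = 1024):
        self._load = load_from_snapshot
        self._capacity = capacity
        self._entries: "OrderedDict[str, Entry]" = OrderedDict()
        self.misses = 0  # number of lookups that had to go to the snapshot

    def get(self, node_id: str) -> Entry:
        if node_id in self._entries:
            self._entries.move_to_end(node_id)  # mark as most recently used
            return self._entries[node_id]
        self.misses += 1
        entry = self._load(node_id)  # cache miss: supplement from the snapshot
        self._entries[node_id] = entry
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return entry


def cached_full_path(cache: ParentCache, node_id: str) -> str:
    """Bottom-up traversal that resolves each parent through the cache."""
    parts = []
    current: Optional[str] = node_id
    while current is not None:
        name, parent = cache.get(current)
        if parent is not None:
            parts.append(name)
        current = parent
    return "/" + "/".join(reversed(parts))


# Toy snapshot: two files under /b share the cached ancestors "b" and "root".
snapshot = {"root": ("", None), "b": ("b", "root"), "F": ("F", "b"), "G": ("G", "b")}
```

In this toy setup, resolving F costs three snapshot reads, while resolving its sibling G afterwards costs only one, because the shared ancestors are already cached.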
[0083] In one or more embodiments of this specification, in order to obtain the path of the directory subtree as quickly as possible, obtaining the path of the directory subtree in the directory tree includes:
[0084] The path of the directory subtree in the directory tree is obtained from the snapshot metadata corresponding to the snapshot of the directory subtree. The path of the directory subtree in the directory tree is saved to the snapshot metadata when creating a snapshot of the directory subtree according to the specified path.
[0085] For example, when a snapshot creation request for a directory subtree is received, the request specifies the path to the directory subtree. When a snapshot of the directory tree is obtained as a snapshot of the directory subtree based on the snapshot creation request, the path of the directory subtree is saved in the metadata of the directory subtree snapshot. Therefore, when using the snapshot, the scope of the directory subtree can be clearly defined based on the snapshot's metadata. Thus, when the path to the directory subtree is needed, it can be obtained from the snapshot's metadata, improving the speed of path retrieval.
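As a rough sketch of this idea, the subtree path can be recorded at creation time and read back later without any traversal. The `SubtreeSnapshot` class, the `"subtree_path"` metadata key, and both helper functions are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SubtreeSnapshot:
    snapshot_id: str
    tree_snapshot_id: str  # the directory-tree snapshot reused as the subtree snapshot
    metadata: Dict[str, str] = field(default_factory=dict)


def create_subtree_snapshot(snapshot_id: str, tree_snapshot_id: str,
                            subtree_path: str) -> SubtreeSnapshot:
    """Record the requested subtree path in the snapshot metadata at creation time."""
    snap = SubtreeSnapshot(snapshot_id, tree_snapshot_id)
    snap.metadata["subtree_path"] = subtree_path  # hypothetical key name
    return snap


def subtree_path_of(snap: SubtreeSnapshot) -> str:
    """Later lookups read the path straight from the snapshot metadata."""
    return snap.metadata["subtree_path"]


s = create_subtree_snapshot("S", "tree-snap-1", "/a")
print(subtree_path_of(s))  # /a
```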
[0086] Of course, the method provided in the embodiments of this specification can also obtain the path of the directory subtree in other ways. For example, in one or more other embodiments, a subtree root node marker can be added to the metadata corresponding to the root node of the directory subtree in the snapshot of the directory subtree, thereby clarifying the coverage of the directory subtree. Accordingly, in this embodiment, the path of the directory subtree can be obtained by traversing the nodes in the snapshot of the directory subtree to find the node with the subtree root node marker.
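The marker-based alternative could be sketched as a search for the marked node. The `DirNode` class, its `is_subtree_root` field, and `find_subtree_path` are hypothetical, and the depth-first order is an arbitrary choice; the text only requires that the nodes be traversed until the marked node is found.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DirNode:
    name: str  # the root of the directory tree uses an empty name
    is_subtree_root: bool = False  # hypothetical subtree root node marker
    children: List["DirNode"] = field(default_factory=list)


def find_subtree_path(root: DirNode) -> Optional[str]:
    """Depth-first search for the node carrying the subtree root marker."""
    stack = [(root, "")]
    while stack:
        node, prefix = stack.pop()
        path = prefix + "/" + node.name if node.name else prefix
        if node.is_subtree_root:
            return path or "/"
        stack.extend((child, path) for child in node.children)
    return None  # no node carries the marker


tree = DirNode("", children=[
    DirNode("a", is_subtree_root=True, children=[DirNode("E")]),
    DirNode("b", children=[DirNode("F")]),
])
print(find_subtree_path(tree))  # /a
```

Compared with reading the path from the snapshot metadata, this variant trades a traversal at lookup time for not having to store the path separately.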
[0087] In one or more embodiments of this specification, batch deletion is also used to improve garbage collection efficiency. Specifically, deleting the metadata corresponding to the deleted directory and/or file from the snapshot of the directory subtree if it is determined not to belong to the directory subtree includes:
[0088] If it is determined that the metadata corresponding to the deleted directory and/or file does not belong to the directory subtree, the information of the deleted directory and/or file is added to the data set to be deleted.
[0089] When the amount of file data corresponding to the data set to be deleted is greater than or equal to a preset garbage collection threshold, the metadata corresponding to the information of the directories and / or files in the data set to be deleted is deleted from the snapshot of the directory subtree.
[0090] Additionally, data successfully deleted from the snapshot of the directory subtree can also be deleted from the set of data to be deleted.
[0091] The file data volume can refer to the number of files or the file size. Correspondingly, the preset garbage collection threshold can refer to a file number threshold or a file data size threshold.
[0092] In the above embodiments, presetting the garbage collection threshold bounds the maximum proportion of garbage data and keeps the amplification of metadata controllable, thereby preserving the effectiveness of caching and reducing the impact of garbage data on performance.
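The threshold-driven batch deletion can be sketched as follows. The class `PendingDeletions` and the `delete_from_snapshot` callback are hypothetical, and the threshold is interpreted here as a total file size; a file-count threshold would replace the sum with `len(self.pending)`.

```python
from typing import Callable, Dict, List


class PendingDeletions:
    """Data set to be deleted, flushed once its total size reaches the threshold."""

    def __init__(self, threshold: int, delete_from_snapshot: Callable[[str], bool]):
        self.threshold = threshold
        self._delete = delete_from_snapshot  # returns True when the deletion succeeded
        self.pending: Dict[str, int] = {}  # file id -> file size

    def add(self, file_id: str, size: int) -> List[str]:
        """Queue a file; return the ids deleted in this call (empty if below threshold)."""
        self.pending[file_id] = size
        if sum(self.pending.values()) < self.threshold:
            return []
        deleted = []
        for fid in list(self.pending):
            if self._delete(fid):
                del self.pending[fid]  # drop successfully deleted entries from the set
                deleted.append(fid)
        return deleted


log: List[str] = []
queue = PendingDeletions(threshold=100, delete_from_snapshot=lambda fid: log.append(fid) is None)
print(queue.add("F", 60))  # []
print(queue.add("G", 40))  # ['F', 'G']
```

Entries whose deletion fails stay in the set and are retried the next time the threshold is reached.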
[0093] In one or more embodiments of this specification, a flag may be added to the file's metadata to identify whether the file has been removed from the snapshot of the directory subtree, thereby avoiding repeated garbage collection of the same file and speeding up the determination of whether a deleted directory and/or file needs garbage collection. Specifically, after determining the deleted directories and/or files, the method further includes:
[0094] Determine whether the corresponding snapshot deletion flag has been set in the metadata of the deleted directory and / or file. The snapshot deletion flag is used to indicate that the metadata corresponding to the file has been deleted from the snapshot of the directory subtree.
[0095] If the snapshot deletion flag has been set, end this round of the garbage collection process for the directory subtree;
[0096] After deleting the metadata corresponding to the deleted directory and / or file from the snapshot of the directory subtree, the method further includes:
[0097] Add the snapshot deletion flag to the metadata corresponding to the deleted file.
[0098] In the above embodiments, although the substantive content of the metadata corresponding to the deleted directory and / or file (i.e., the mapping relationship between logical addresses and physical addresses) is deleted from the snapshot of the directory subtree, the metadata organization structure corresponding to the deleted directory and / or file can still exist. Therefore, a snapshot deletion flag can be added to the metadata organization structure to indicate that the substantive content of the metadata corresponding to the file has been deleted from the snapshot of the directory subtree. Accordingly, when determining whether the corresponding snapshot deletion flag has been set in the metadata corresponding to the deleted directory and / or file, the determination can also be based on the snapshot.
[0099] Additionally, this round of the garbage collection process for the directory subtree can be ended if it is determined that the metadata corresponding to the deleted directory and/or file belongs to the directory subtree.
[0100] The snapshot deletion flag can be represented by any character such as letters, numbers, or specified symbols, and this specification does not impose any restrictions on this.
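The role of the snapshot deletion flag can be sketched as below. `FileMetadata`, its boolean `snapshot_deleted` field, and `collect_once` are hypothetical; the flag is modelled as a boolean although the text allows any character. The subtree membership check is omitted so that only the flag logic is shown.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FileMetadata:
    file_id: str
    snapshot_deleted: bool = False  # hypothetical representation of the flag


def collect_once(meta: FileMetadata, delete_from_snapshot: Callable[[str], None]) -> bool:
    """Delete the file's metadata from the snapshot at most once; return True if deleted now."""
    if meta.snapshot_deleted:
        return False  # already reclaimed: end this round immediately
    delete_from_snapshot(meta.file_id)
    meta.snapshot_deleted = True  # set the flag only after the deletion
    return True


calls: List[str] = []
f = FileMetadata("F")
print(collect_once(f, calls.append))  # True
print(collect_once(f, calls.append))  # False
```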
[0101] The method for processing snapshots provided in this specification is further explained below with reference to Figure 6, taking its application to the snapshot of the directory subtree shown in Figure 3 as an example. For the directory tree shown in Figure 3, assume the following definitions: S is the snapshot of the directory subtree whose path is P1 (P1 is "/a"); M is the set of files that do not belong to S and were deleted after S was created; and F is a single deleted file. Figure 6 shows a flowchart of a snapshot processing method provided in an embodiment of this specification, which specifically includes the following steps.
[0102] Step 602: Check whether file F has been marked with the corresponding snapshot deletion flag. If it has, jump to step 612; otherwise, proceed to step 604.
[0103] Step 604: Based on snapshot S, traverse the directory tree from bottom to top to obtain the full path Pf of F.
[0104] Step 606: Check whether P1 is a prefix of Pf. If it is, F belongs to S and its metadata must be kept, so jump to step 612. Otherwise, proceed to step 608.
[0105] Step 608: Add the unique file identifier of F to set M, and calculate whether the total size of all files in M is greater than or equal to the preset garbage collection threshold T. If it is, proceed to step 610; otherwise, jump to step 612.
[0106] Step 610: Traverse all files in M, delete the files from snapshot S in batches, and remove them from M after successful deletion.
[0107] Step 612: End this round of the garbage collection process.
[0108] The garbage collection process implemented through the above processing steps can reclaim garbage data in snapshots of directory subtrees in batches, reducing storage space waste and effectively improving cloud service performance.
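One round of this flow can be sketched end to end as follows. The `Snapshot` class, its fields, and `gc_round` are hypothetical. In line with paragraphs [0077] and [0088], the sketch queues F only when P1 is not a prefix of Pf, and it triggers the batch deletion when the total size is greater than or equal to T, as in paragraph [0089].

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Dict, Optional, Set, Tuple


@dataclass
class Snapshot:
    subtree_path: str                             # P1
    nodes: Dict[str, Tuple[str, Optional[str]]]   # id -> (name, parent id)
    sizes: Dict[str, int]                         # id -> file size
    live: Set[str]                                # ids whose metadata is still in S
    flagged: Set[str] = field(default_factory=set)  # ids carrying the deletion flag


def gc_round(s: Snapshot, f: str, m: Set[str], t: int) -> str:
    """Run one garbage collection round for deleted file f; return what happened."""
    if f in s.flagged:                            # step 602
        return "already-collected"
    parts, cur = [], f                            # step 604: bottom-up traversal
    while cur is not None:
        name, parent = s.nodes[cur]
        if parent is not None:
            parts.append(name)
        cur = parent
    pf = PurePosixPath("/", *reversed(parts)).parts
    p1 = PurePosixPath(s.subtree_path).parts
    if pf[:len(p1)] == p1:                        # step 606: f belongs to S, keep it
        return "kept"
    m.add(f)                                      # step 608
    if sum(s.sizes[i] for i in m) < t:
        return "queued"
    for i in list(m):                             # step 610: batch deletion
        s.live.discard(i)
        s.flagged.add(i)
        m.discard(i)
    return "collected"                            # step 612 is the return itself
```

With P1 set to "/a", a round for "/a/E" returns "kept", while rounds for files under "/b" accumulate in M until the threshold is reached.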
[0109] Corresponding to the above method embodiments, this specification also provides an embodiment of a snapshot processing apparatus. Figure 7 shows a schematic diagram of a snapshot processing apparatus according to another embodiment of this specification. As shown in Figure 7, the apparatus includes:
[0110] The subtree determination module 702 is configured to determine a snapshot of a directory subtree, the snapshot of which is generated based on a snapshot of the directory tree in which the directory subtree resides.
[0111] The file determination module 704 is configured to determine the deleted directories and/or files, wherein the metadata corresponding to the deleted directories and/or files belongs to the directory tree.
[0112] The file determination module 706 is configured to determine whether the metadata corresponding to the deleted directory and / or file belongs to the directory subtree.
[0113] The space reclamation module 708 is configured to delete the metadata corresponding to the deleted directory and/or file from the snapshot of the directory subtree if the file determination module 706 determines that the metadata does not belong to the directory subtree.
[0114] In one or more embodiments of this specification, the file determination module 706 includes:
[0115] The file path acquisition submodule is configured to acquire the path of the metadata corresponding to the deleted directory and / or file in the directory tree;
[0116] The subtree path acquisition submodule is configured to acquire the path of the directory subtree in the directory tree;
[0117] The path determination submodule is configured to determine whether the metadata corresponding to the deleted directory and / or file belongs to the directory subtree based on the path of the directory subtree in the directory tree and the path of the metadata corresponding to the deleted directory and / or file in the directory tree.
[0118] In one or more embodiments of this specification, the file path acquisition submodule is configured to use a snapshot of the directory subtree to traverse nodes layer by layer from the node corresponding to the deleted directory and / or file toward the root node of the directory tree to obtain the path of the metadata corresponding to the deleted directory and / or file in the directory tree.
[0119] In one or more embodiments of this specification, the file path acquisition submodule is configured to obtain the correspondence between each directory and its parent directory in the directory tree from a cache. The cache stores the correspondence obtained from snapshots of the directory subtree based on a least recently used algorithm. Based on the correspondence obtained from the cache, starting from the node corresponding to the deleted directory and / or file, the node is traversed layer by layer towards the root node of the directory tree to obtain the path of the metadata corresponding to the deleted directory and / or file in the directory tree.
[0120] In one or more embodiments of this specification, the subtree path acquisition submodule is configured to obtain the path of the directory subtree in the directory tree from the snapshot metadata corresponding to the snapshot of the directory subtree, and the path of the directory subtree in the directory tree is saved to the snapshot metadata when the snapshot of the directory subtree is created according to the specified path.
[0121] In one or more embodiments of this specification, the space reclamation module is configured to add the information of the deleted directory and/or file to the data set to be deleted if it is determined that the metadata corresponding to the deleted directory and/or file does not belong to the directory subtree, and, when the amount of file data corresponding to the data set to be deleted is greater than or equal to a preset garbage collection threshold, to delete the metadata corresponding to the information of the directories and/or files in the data set to be deleted from the snapshot of the directory subtree.
[0122] In one or more embodiments of this specification, the apparatus further includes:
[0123] The tagging and judgment module is configured to determine whether the corresponding snapshot deleted tag has been set in the metadata of the deleted directory and / or file. The snapshot deleted tag is used to indicate that the metadata of the corresponding file has been deleted from the snapshot of the directory subtree. If the snapshot deleted tag has been set, the garbage collection process of the directory subtree in this round ends.
[0124] The tag-adding module is configured to add a snapshot-deleted tag to the metadata corresponding to the deleted file after the deleted directory and / or file is removed from the snapshot of the directory subtree.
[0125] The above is an illustrative scheme of a snapshot processing apparatus according to this embodiment. It should be noted that the technical solution of this snapshot processing apparatus and the technical solution of the snapshot processing method described above belong to the same concept. For details not described in detail in the technical solution of the snapshot processing apparatus, please refer to the description of the technical solution of the snapshot processing method described above.
[0126] This specification also provides an embodiment of a system for processing snapshots. Figure 8 shows a schematic structural diagram of a snapshot processing system provided in one embodiment of this specification. As shown in Figure 8, the system includes:
[0127] The garbage collection trigger server 802 is configured to distribute multiple garbage collection tasks to multiple garbage collection servers 804 when it is determined that multiple deleted directories and/or files exist. The multiple garbage collection tasks represent garbage collection of the multiple deleted directories and/or files, and different garbage collection tasks correspond to different directories and/or files.
[0128] The garbage collection server 804 is configured to perform garbage collection on the snapshot of the directory subtree in response to receiving the garbage collection task, according to the snapshot processing method as described in any embodiment of this specification.
[0129] In this system embodiment, garbage collection for multiple deleted directories and / or files can be performed concurrently on multiple servers, thus ensuring that files that do not belong to the directory subtree are quickly removed from the directory subtree snapshot after deletion.
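The fan-out from the trigger server to the garbage collection servers can be approximated in a single process, as in the sketch below. A thread pool stands in for the multiple servers, which is a simplification; the function `dispatch_gc_tasks` and the `worker` callback are hypothetical, and a real deployment would send the tasks over the network instead.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def dispatch_gc_tasks(deleted_ids: List[str], worker: Callable[[str], str],
                      num_workers: int = 4) -> Dict[str, str]:
    """Hand one garbage collection task per deleted file to a pool of workers."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Each task covers a different deleted directory or file.
        results = pool.map(worker, deleted_ids)
        return dict(zip(deleted_ids, results))


def demo_worker(file_id: str) -> str:
    # Placeholder for running the snapshot processing method on one file.
    return "done:" + file_id


print(dispatch_gc_tasks(["F", "G"], demo_worker))
```

If the workers share a data set to be deleted, access to it would need synchronization; the sketch leaves that out.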
[0130] Figure 9 shows a structural block diagram of a computing device 900 according to one embodiment of this specification. The components of the computing device 900 include, but are not limited to, a memory 910 and a processor 920. The processor 920 is connected to the memory 910 via a bus 930, and a database 950 is used to store data.
[0131] The computing device 900 also includes an access device 940, which enables the computing device 900 to communicate via one or more networks 960. Examples of these networks include Public Switched Telephone Network (PSTN), Local Area Network (LAN), Wide Area Network (WAN), Personal Area Network (PAN), or combinations of communication networks such as the Internet. The access device 940 may include one or more of any type of wired or wireless network interface (e.g., a network interface card (NIC)), such as an IEEE 802.11 Wireless Local Area Network (WLAN) wireless interface, a Wi-MAX (Worldwide Interoperability for Microwave Access) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, or a Near Field Communication (NFC) interface.
[0132] In one embodiment of this specification, the above-described components of the computing device 900, as well as other components not shown in Figure 9, can also be connected to each other, for example, via a bus. It should be understood that the block diagram of the computing device shown in Figure 9 is for illustrative purposes only and is not intended to limit the scope of this specification. Those skilled in the art can add or replace other components as needed.
[0133] The computing device 900 can be any type of stationary or mobile computing device, including mobile computers or mobile computing devices (e.g., tablet computers, personal digital assistants, laptop computers, notebook computers, netbooks, etc.), mobile phones (e.g., smartphones), wearable computing devices (e.g., smartwatches, smart glasses, etc.) or other types of mobile devices, or stationary computing devices such as desktop computers or personal computers (PCs). The computing device 900 can also be a mobile or stationary server.
[0134] The processor 920 is configured to execute computer-executable instructions which, when executed by the processor, implement the steps of the above-described method for processing snapshots.
[0135] The above is an illustrative scheme of a computing device according to this embodiment. It should be noted that the technical solution of this computing device and the technical solution of the above-described snapshot processing method belong to the same concept. For details not described in detail in the technical solution of the computing device, please refer to the description of the technical solution of the above-described snapshot processing method.
[0136] An embodiment of this specification also provides a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of the above-described snapshot processing method.
[0137] The above is an illustrative embodiment of a computer-readable storage medium. It should be noted that the technical solution of this storage medium and the technical solution of the snapshot processing method described above belong to the same concept. Details not described in detail in the technical solution of the storage medium can be found in the description of the technical solution of the snapshot processing method described above.
[0138] An embodiment of this specification also provides a computer program, wherein when the computer program is executed in a computer, it causes the computer to perform the steps of the above-described method for processing snapshots.
[0139] The above is an illustrative example of a computer program according to this embodiment. It should be noted that the technical solution of this computer program and the technical solution of the above-described snapshot processing method belong to the same concept. Details not described in detail in the computer program's technical solution can be found in the description of the above-described snapshot processing method's technical solution.
[0140] The foregoing has described specific embodiments of this specification. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in a different order than that shown in the embodiments and may still achieve the desired result. Furthermore, the processes depicted in the drawings do not necessarily require the specific or sequential order shown to achieve the desired result. In some embodiments, multitasking and parallel processing are possible or may be advantageous.
[0141] The computer instructions include computer program code, which may be in the form of source code, object code, executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording media, USB flash drive, portable hard drive, magnetic disk, optical disk, computer memory, read-only memory (ROM), random access memory (RAM), electrical carrier signals, telecommunication signals, and software distribution media, etc. It should be noted that the content included in the computer-readable medium may be appropriately added to or subtracted according to the requirements of legislation and patent practice in the jurisdiction. For example, in some jurisdictions, according to legislation and patent practice, computer-readable media may not include electrical carrier signals and telecommunication signals.
[0142] It should be noted that, for the sake of simplicity, the foregoing method embodiments are all described as a series of actions. However, those skilled in the art should understand that the embodiments in this specification are not limited to the described order of actions, because according to the embodiments in this specification, some steps can be performed in other orders or simultaneously. Furthermore, those skilled in the art should also understand that the embodiments described in this specification are all preferred embodiments, and the actions and modules involved are not necessarily essential to the embodiments in this specification.
[0143] In the above embodiments, the descriptions of each embodiment have different focuses. For parts not described in detail in a certain embodiment, please refer to the relevant descriptions of other embodiments.
[0144] The preferred embodiments disclosed above are merely illustrative of this specification. The optional embodiments do not exhaustively describe all details, nor do they limit the invention to the specific implementations described. Clearly, many modifications and variations can be made based on the embodiments described herein. These embodiments are selected and specifically described in this specification to better explain the principles and practical applications of the embodiments, thereby enabling those skilled in the art to better understand and utilize this specification. This specification is limited only by the claims and their full scope and equivalents.
Claims
1. A method for processing snapshots, comprising: determining a snapshot of a directory subtree, wherein the snapshot of the directory subtree is generated based on a snapshot of the directory tree in which the directory subtree resides; determining deleted directories and/or files, wherein the metadata corresponding to the deleted directories and/or files belongs to the directory tree; determining whether the path of the directory subtree in the directory tree is consistent with the path of the metadata corresponding to the deleted directory and/or file in the directory tree, wherein whether the paths are consistent is determined based on whether the path of the directory subtree is a prefix of the path of the metadata corresponding to the deleted directory and/or file in the directory tree; if it is determined that the deleted directory and/or file does not belong to the directory subtree, adding the information of the deleted directory and/or file to a data set to be deleted; and when the amount of file data corresponding to the data set to be deleted is greater than or equal to a preset garbage collection threshold, deleting the metadata corresponding to the information of the directories and/or files in the data set to be deleted from the snapshot of the directory subtree.
2. The method according to claim 1, wherein determining whether the metadata corresponding to the deleted directory and / or file belongs to the directory subtree includes: Obtain the path of the metadata corresponding to the deleted directory and / or file in the directory tree; Obtain the path of the directory subtree within the directory tree; Based on the path of the directory subtree in the directory tree and the path of the metadata corresponding to the deleted directory and / or file in the directory tree, determine whether the metadata corresponding to the deleted directory and / or file belongs to the directory subtree.
3. The method according to claim 2, wherein obtaining the path of the metadata corresponding to the deleted directory and / or file in the directory tree includes: Using the snapshot of the directory subtree, starting from the node corresponding to the deleted directory and / or file, traverse the nodes layer by layer towards the root node of the directory tree to obtain the path of the metadata corresponding to the deleted directory and / or file in the directory tree.
4. The method according to claim 3, wherein the step of using a snapshot of the directory subtree to traverse nodes layer by layer from the node corresponding to the deleted directory and / or file towards the root node of the directory tree to obtain the path of the metadata corresponding to the deleted directory and / or file in the directory tree includes: The correspondence between each directory in the directory tree and its parent directory is obtained from the cache, and the cache stores the correspondence obtained from the snapshot of the directory subtree based on the least recently used algorithm; Based on the correspondence obtained from the cache, starting from the node corresponding to the deleted directory and / or file, the nodes are traversed layer by layer towards the root node of the directory tree to obtain the path of the metadata corresponding to the deleted directory and / or file in the directory tree.
5. The method according to claim 2, wherein obtaining the path of the directory subtree in the directory tree comprises: The path of the directory subtree in the directory tree is obtained from the snapshot metadata corresponding to the snapshot of the directory subtree. The path of the directory subtree in the directory tree is saved to the snapshot metadata when creating a snapshot of the directory subtree according to the specified path.
6. The method of claim 1, further comprising, after determining the deleted directory and/or file: determining whether the corresponding snapshot deletion flag has been set in the metadata of the deleted directory and/or file, wherein the snapshot deletion flag indicates that the metadata of the corresponding file has been deleted from the snapshot of the directory subtree; and if the snapshot deletion flag has been set, ending this round of the garbage collection process for the directory subtree; wherein, after deleting the metadata corresponding to the deleted directory and/or file from the snapshot of the directory subtree, the method further comprises: adding the snapshot deletion flag to the metadata corresponding to the deleted file.
7. A method for processing snapshots, comprising: Receive a snapshot creation request for a directory subtree, wherein the snapshot creation request includes the path information of the root node of the directory subtree in the directory tree; Based on the snapshot creation request, obtain a snapshot of the directory tree in which the directory subtree is located; A snapshot of the directory tree is used as a snapshot of the directory subtree.
8. A system for processing snapshots, comprising: A garbage collection trigger server is configured to distribute multiple garbage collection tasks to multiple garbage collection servers when it is determined that multiple deleted directories and / or files exist. The multiple garbage collection tasks are used to represent garbage collection of the multiple deleted directories and / or files, and different garbage collection tasks correspond to different directories and / or files. A garbage collection server is configured to, in response to receiving the garbage collection task, perform garbage collection on a snapshot of the directory subtree according to the method for processing snapshots as described in any one of claims 1-6.
9. A computing device, comprising: Memory and processor; The memory is used to store computer-executable instructions, and the processor is used to execute the computer-executable instructions, which, when executed by the processor, implement the steps of the method for processing snapshots according to any one of claims 1 to 7.
10. A computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of the method for processing a snapshot as described in any one of claims 1 to 7.
Citation Information
Patent Citations
Techniques for snapshotting scalable multitier storage structures
WO2020190669A1