Data backup method, data recovery method, storage medium and program product
By designing a multi-version control scheme at the storage gateway level, the flexibility and compatibility issues of object storage products when changing backend storage layers are resolved. This enables multi-version management of data objects and index objects, ensuring crash consistency and the correct ordering of concurrent uploads.
Patent Information
- Authority / Receiving Office
- WO
- Patent Type
- Applications
- Current Assignee / Owner
- CHINA TELECOM CLOUD TECH CO LTD
- Filing Date
- 2025-12-04
- Publication Date
- 2026-06-11
AI Technical Summary
Existing object storage products lack flexibility when changing backend storage layers, especially due to inconsistent version control functionality, which makes compatibility difficult to achieve.
A multi-version control scheme is designed at the storage gateway level. By setting a unique storage name for data objects and generating index data, multi-version management of data objects and index objects is achieved, which is independent of the multi-version control function of object storage products.
It enhances the flexibility and adaptability of the storage gateway, enabling it to be compatible with various object storage products, resolving inconsistencies in multi-version control functions, and ensuring data consistency in case of crashes and the sequential nature of concurrent uploads.
Description
Data backup methods, data recovery methods, storage media and software products
[0001] Related applications
[0002] This application claims priority to the Chinese patent application filed on December 4, 2024, application number 202411769886.6, entitled "Object Storage Management Method, Apparatus, Gateway Device and Storage System", the entire contents of which are incorporated herein by reference.
Technical Field
[0003] This application relates to the field of data storage technology, and in particular to an object storage management method, apparatus, gateway device, and storage system.
Background Technology
[0004] Object storage is a storage method that uses objects as the basic storage unit, managing data through objects. Currently, various object storage products have been designed by different vendors. Most object storage products share a common feature: support for version control. This means that multiple versions of a single object can exist simultaneously, and operations such as downloading and deleting can be performed on a specific version of that object by specifying version information.
[0005] To transfer local data to object storage products, vendors design matching storage gateways that use their own object storage as the backend storage layer to manage and transfer object data. However, this approach limits flexibility, because users cannot easily switch backend storage products to suit their needs. If a storage gateway were compatible with all object storage products, users could choose the appropriate backend storage layer, which would significantly improve the flexibility and scalability of object storage. However, while some object storage products on the market provide version control, others lack this feature, making product compatibility challenging.
Summary of the Invention
[0006] According to various embodiments of this application, an object storage management method, apparatus, gateway device, and storage system are provided.
[0007] This application provides an object storage management method, which is applied to a storage gateway and includes:
[0008] Receive the current data and write it to the corresponding block device; the storage gateway includes multiple block devices, and each block device divides data objects according to a continuous and fixed-length offset range;
[0009] Set a unique storage name for the current data to form the data object to be transmitted;
[0010] When the preset upload conditions are met, upload the data objects to be transmitted in each block device to the object storage system;
[0011] After the upload is complete, index data is generated based on the block device address and unique storage name of each uploaded data object; the index data carries version information or time information; and
[0012] Update the index file based on the index data.
[0013] In one embodiment, the index file includes a location index area and a version index area. The location index area records the location information of each transmitted data object in the block device, and the version index area stores multiple index data. The method further includes:
[0014] Receive data read request;
[0015] Read data from the corresponding target block device according to the data read request;
[0016] If no data is read locally, the location index area is searched based on the location information of the target block device to determine whether the target data object corresponding to the target block device is the transmitted data object.
[0017] If the target data object is determined to be a transmitted data object, the unique storage name of the target data object is determined from the version index area;
[0018] Data is retrieved from the object storage system based on the determined unique storage name; and
[0019] The acquired data is returned to the sender of the data read request.
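The read path in [0014] to [0019] can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's implementation: the `Gateway` class, its attribute names, and the use of a plain dict as a stand-in for the object storage system are all hypothetical, and the location index area is modelled simply as a set of start offsets.

```python
from typing import Dict, Optional, Set


class Gateway:
    """Hypothetical gateway holding local data plus the two index areas."""

    def __init__(self, remote: Dict[str, bytes]) -> None:
        self.local: Dict[int, bytes] = {}        # start offset -> data held locally
        self.location_index: Set[int] = set()    # offsets of already-transmitted objects
        self.version_index: Dict[int, str] = {}  # start offset -> unique storage name
        self.remote = remote                     # stand-in for the object storage system

    def read(self, start_offset: int) -> Optional[bytes]:
        # 1. Try the local block device first.
        data = self.local.get(start_offset)
        if data is not None:
            return data
        # 2. Local miss: check whether this object was ever transmitted.
        if start_offset not in self.location_index:
            return None
        # 3. Resolve the unique storage name and fetch from the object store.
        name = self.version_index[start_offset]
        return self.remote[name]
```

The two index areas are kept separate here so that the "was it transmitted at all" check does not depend on which version is current.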
[0020] In one embodiment, after uploading the data objects to be transmitted from each device to the object storage system, the method further includes:
[0021] For each data object to be uploaded, if the current data object has been uploaded, the corresponding data object information is recorded in the progress status file.
[0022] In one embodiment, the method further includes:
[0023] If a failure occurs, the uploading of data objects to be uploaded in each device will be stopped;
[0024] Once the fault has been cleared, read the progress status file to obtain the data object information of the transmitted data objects; and
[0025] Based on the obtained data object information, continue uploading the data objects to be transmitted in each device.
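The resume behaviour in [0021] to [0025] can be sketched as follows. This is only an illustration: the JSON layout of the progress status file, the function name `upload_with_progress`, and the `put_object` callback are assumptions, since the application does not specify a file format.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List


def upload_with_progress(
    pending: Dict[str, bytes],
    put_object: Callable[[str, bytes], None],
    progress_path: Path,
) -> List[str]:
    """Upload pending objects, skipping names already recorded as transmitted."""
    done = set(json.loads(progress_path.read_text())) if progress_path.exists() else set()
    uploaded_now: List[str] = []
    for name, data in pending.items():
        if name in done:
            continue  # already transmitted before the fault
        put_object(name, data)  # may raise; progress so far is already on disk
        done.add(name)
        progress_path.write_text(json.dumps(sorted(done)))
        uploaded_now.append(name)
    return uploaded_now
```

Calling the function again after a fault has been cleared continues from the first object that is not yet recorded in the progress file.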
[0026] In one embodiment, after updating the index file based on the index data, the method further includes:
[0027] Generate an index object based on the index data; and
[0028] Upload the index object to the object storage system.
[0029] In one embodiment, the method further includes:
[0030] Based on the version information or time information carried by each index data in the index file, determine the old version index data;
[0031] Delete the data objects corresponding to the old version of the index data in the object storage system; and
[0032] Delete the old version of the index object stored in the object storage system.
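The cleanup in [0030] to [0032] can be sketched as follows. The sketch makes several assumptions that the application does not state: index versions are integers, index objects are stored under the hypothetical name `index-<version>`, and a data object named by an old index is deleted only if the latest index no longer references it (so unchanged objects survive).

```python
from typing import Dict


def collect_garbage(indexes: Dict[int, Dict[int, str]], store: Dict[str, bytes]) -> None:
    """`indexes` maps index version -> {start offset: unique storage name}."""
    latest = max(indexes)
    live = set(indexes[latest].values())  # names still referenced by the newest index
    for version in [v for v in indexes if v != latest]:
        for name in indexes[version].values():
            if name not in live:
                store.pop(name, None)        # delete the stale data object
        store.pop(f"index-{version}", None)  # delete the old index object
        del indexes[version]
```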
[0033] In one embodiment, the method further includes:
[0034] In the event of local data corruption, scan all index objects stored in the object storage system;
[0035] Based on the version or time information carried by each index object, determine at least one latest index object;
[0036] Read the unique storage name of the data object to be recovered from at least one of the latest index objects;
[0037] Based on the unique storage name read, retrieve the target data from the object storage system; and
[0038] Perform local data recovery based on the acquired target data.
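The recovery steps in [0034] to [0038] can be sketched as follows. This is a simplified illustration: it assumes index objects are named `index-<integer version>`, that each index object is a JSON map from start offset to unique storage name, and that a single latest index object is sufficient. None of these details are specified by the application.

```python
import json
from typing import Dict


def recover(store: Dict[str, bytes], index_prefix: str = "index-") -> Dict[int, bytes]:
    """Rebuild {start offset: data} from the newest index object in the store."""
    # Scan all index objects and pick the one with the highest version.
    index_names = [n for n in store if n.startswith(index_prefix)]
    latest = max(index_names, key=lambda n: int(n[len(index_prefix):]))
    # Read the unique storage names from it, then fetch each data object.
    version_index = json.loads(store[latest])
    return {int(offset): store[name] for offset, name in version_index.items()}
```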
[0039] This application also provides an object storage management device, which is applied to a storage gateway and includes:
[0040] The receiving module is used to receive the current data and write the current data to the corresponding block device; the storage gateway includes multiple block devices, and each block device divides data objects according to a continuous and fixed-length offset range;
[0041] The naming module is used to set a unique storage name for the current data, forming the data object to be transmitted.
[0042] The upload module is used to upload data objects to be uploaded from each device to the object storage system, provided that preset upload conditions are met; and
[0043] The index generation module is used to generate index data based on the block device address and unique storage name of each uploaded data object after the upload is completed. The index data carries version information or time information. The index file is updated based on the index data.
[0044] This application also provides a gateway device, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps of the method described above.
[0045] This application also provides a storage system, the storage system comprising: a storage gateway and an object storage system; wherein:
[0046] The storage gateway is used to perform the steps of the method described above.
[0047] Details of one or more embodiments of this application are set forth in the following drawings and description. Other features, objects, and advantages of this application will become apparent from the specification, drawings, and claims.
[0048] According to various embodiments of this application, a data backup method, apparatus, computer device, computer-readable storage medium, and computer program product are also provided.
[0049] This application provides a data backup method, including:
[0050] According to the preset data organization method, the data to be backed up in the block storage system is processed to obtain standard format objects; and
[0051] Back up the standard format object to a cloud object storage system;
[0052] The standard format object includes at least one of the following: a data object, an index object, and a fence object.
[0053] This application also provides a data backup device, including:
[0054] The processing module is used to process the data to be backed up in the block storage system according to a preset data organization method to obtain standard format objects; and
[0055] The backup module is used to back up the standard format objects to a cloud object storage system;
[0056] The standard format object includes at least one of the following: a data object, an index object, and a fence object.
[0057] This application also provides a computer device, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to perform the following steps:
[0058] According to the preset data organization method, the data to be backed up in the block storage system is processed to obtain standard format objects; and
[0059] Back up the standard format object to a cloud object storage system;
[0060] The standard format object includes at least one of the following: a data object, an index object, and a fence object.
[0061] This application also provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, performs the following steps:
[0062] According to the preset data organization method, the data to be backed up in the block storage system is processed to obtain standard format objects; and
[0063] Back up the standard format object to a cloud object storage system;
[0064] The standard format object includes at least one of the following: a data object, an index object, and a fence object.
[0065] This application also provides a computer program product, including a computer program that, when executed by a processor, performs the following steps:
[0066] According to the preset data organization method, the data to be backed up in the block storage system is processed to obtain standard format objects; and
[0067] Back up the standard format object to a cloud object storage system;
[0068] The standard format object includes at least one of the following: a data object, an index object, and a fence object.
[0069] Details of one or more embodiments of this application are set forth in the following drawings and description. Other features, objects, and advantages of this application will become apparent from the specification, drawings, and claims.
[0070] According to various embodiments of this application, a data recovery method, apparatus, computer device, readable storage medium, and program product are provided.
[0071] This application provides a data recovery method, the method comprising:
[0072] Download the standard format object that has been backed up to the cloud object storage system;
[0073] The standard format object is processed according to a preset data parsing method to obtain the recovered data; and
[0074] The recovered data is stored in a block storage system;
[0075] The standard format object includes at least one of the following: a data object, an index object, and a fence object.
[0076] This application also provides a data recovery apparatus, comprising:
[0077] The download module is used to download standard format objects that have been backed up to the cloud object storage system;
[0078] The recovery module is used to process the standard format object according to a preset data parsing method to obtain the recovered data; and
[0079] A storage module is used to store the recovery data in a block storage system;
[0080] The standard format object includes at least one of the following: a data object, an index object, and a fence object.
[0081] This application also provides a computer device, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to perform the following steps:
[0082] Download the standard format object that has been backed up to the cloud object storage system;
[0083] The standard format object is processed according to a preset data parsing method to obtain the recovered data; and
[0084] The recovered data is stored in a block storage system;
[0085] The standard format object includes at least one of the following: a data object, an index object, and a fence object.
[0086] This application also provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, performs the following steps:
[0087] Download the standard format object that has been backed up to the cloud object storage system;
[0088] The standard format object is processed according to a preset data parsing method to obtain the recovered data; and
[0089] The recovered data is stored in a block storage system;
[0090] The standard format object includes at least one of the following: a data object, an index object, and a fence object.
[0091] This application also provides a computer program product, including a computer program that, when executed by a processor, performs the following steps:
[0092] Download the standard format object that has been backed up to the cloud object storage system;
[0093] The standard format object is processed according to a preset data parsing method to obtain the recovered data; and
[0094] The recovered data is stored in a block storage system;
[0095] The standard format object includes at least one of the following: a data object, an index object, and a fence object.
[0096] Details of one or more embodiments of this application are set forth in the following drawings and description. Other features, objects, and advantages of this application will become apparent from the specification, drawings, and claims.
Attached Figure Description
[0097] To more clearly illustrate the technical solutions in the embodiments of this application or the conventional technology, the drawings used in the description of the embodiments or the conventional technology will be briefly introduced below. Obviously, the drawings described below are only embodiments of this application. For those skilled in the art, other drawings can be obtained based on the disclosed drawings without creative effort.
[0098] Figure 1 is a diagram illustrating the application environment of object storage management methods in some embodiments;
[0099] Figure 2 is a flowchart illustrating the object storage management method in some embodiments;
[0100] Figure 3 is a flowchart illustrating the object storage management method in some other embodiments;
[0101] Figure 4 is a structural block diagram of an object storage management device in some embodiments;
[0102] Figure 5 is an internal structure diagram of the gateway device in some embodiments;
[0103] Figure 6 is a schematic diagram of the architecture for data backup and recovery in some related technologies;
[0104] Figure 7 is a schematic diagram of the architecture for implementing data backup and recovery in some embodiments;
[0105] Figure 8 is a schematic diagram of the technical architecture of the data backup / recovery system in some embodiments;
[0106] Figure 9 is a flowchart illustrating the data backup method in some embodiments;
[0107] Figure 10 is a schematic diagram of data object encapsulation in some embodiments;
[0108] Figure 11 is a schematic diagram of an index area;
[0109] Figure 12 is a flowchart illustrating the data recovery method in some embodiments;
[0110] Figure 13 is a structural block diagram of the data backup device in some embodiments;
[0111] Figure 14 is a structural block diagram of the data recovery device in some embodiments;
[0112] Figure 15 is an internal structure diagram of a computer device in some embodiments.
Detailed Implementation
[0113] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0114] It is understood that the storage gateway involved in this embodiment, namely the cloud storage gateway, serves as the connection between local applications and remote cloud storage, transferring local data to the cloud. When object storage is used as a cloud service, the objects transferred to the cloud need to exist in multiple versions for the following two reasons:
[0115] 1. Ensure crash consistency. If objects do not exist in multiple versions, uploading a new version of an object may overwrite the old version, which can violate crash consistency. When a batch of data is written, that batch is uploaded as a unit, and the upload is considered complete only when all of the data has been uploaded successfully. If only part of the batch is uploaded, the upload is unsuccessful, and the previously uploaded batch remains the latest valid data in the cloud. If a new object directly overwrote the old object, crash consistency would be broken whenever the upload is incomplete. Therefore, until a new batch of data has been fully uploaded, both the old and new versions of each data object must be retained. Only after the new batch has been fully uploaded can the old version objects be deleted.
[0116] 2. Resolve the out-of-order problem of concurrent uploads. Suppose an upload of an object is blocked at some step of network transmission and, after the timeout period, a failure is returned to the upper layer. A newer version of the object is then uploaded, and some time later the previously blocked request resumes transmission. If the object does not exist in multiple versions, the old version overwrites the new version, causing an out-of-order problem.
[0117] Therefore, when using object storage as a cloud service, objects transmitted to the cloud need to exist in multiple versions to ensure crash consistency and solve the out-of-order problem of concurrent uploads.
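The out-of-order problem can be reproduced with a toy example. The sketch below uses a plain dict as a stand-in for an object store without versioning; the object names and the `put` helper are hypothetical and serve only to contrast a single shared name with one name per version.

```python
from typing import Dict


def put(store: Dict[str, bytes], name: str, data: bytes) -> None:
    """Stand-in for an unversioned PutObject: last writer wins."""
    store[name] = data


# Single shared name: the stalled older request lands last and wins.
shared: Dict[str, bytes] = {}
put(shared, "obj-0", b"v2")  # newer upload completes first
put(shared, "obj-0", b"v1")  # blocked older upload resumes later
overwritten = shared["obj-0"]

# One unique name per version: both copies coexist, and the index decides.
versioned: Dict[str, bytes] = {}
put(versioned, "0-version2", b"v2")
put(versioned, "0-version1", b"v1")  # late arrival cannot clobber version2
index = {0: "0-version2"}            # index still points at the newer name
current = versioned[index[0]]
```

With a shared name the store ends up holding the stale data, whereas with unique names the late request only creates an extra object that can be garbage-collected later.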
[0118] Currently, some object storage products provide multi-version control, mainly by using randomly generated, globally unique strings as object version IDs. However, such version IDs cannot be compared directly to determine which version of an object is newer. Furthermore, these strings are quite long and consume significant space whether they are held in memory or stored persistently. In addition, some object storage products lack multi-version control altogether, making product compatibility challenging.
[0119] To address the aforementioned issues, this application proposes an object storage management method, apparatus, gateway device, and storage system. Instead of utilizing the multi-version control functionality provided by the object storage product itself, it designs a scheme on the storage gateway to manage multiple versions of objects on the object storage system, enabling functions such as data management, uploading to the cloud, downloading from the cloud, garbage collection, and resumable transfer. Using this method, the storage gateway can flexibly adapt to all object storage products and is no longer limited by whether a given object storage product supports multi-version control, significantly improving compatibility with various object storage products.
[0120] It should be noted that this application embodiment mainly involves two types of storage objects: data objects and index objects. Data objects refer to the objects corresponding to data written into the storage gateway by the application layer and uploaded to the object storage system. Index objects refer to the objects corresponding to records of data object-related information uploaded to the object storage system. Both types of objects need to be stored in the object storage system using a multi-versioning approach. This application embodiment does not use the multi-versioning control function provided by the object storage itself, but instead implements a multi-versioning control scheme on the storage gateway based on the object storage management method provided in this application.
[0121] The object storage management method provided in this application embodiment can be applied to the application environment shown in Figure 1. The storage system includes a storage gateway 10 and an object storage system 20. Users submit data read / write requests through a client. The storage gateway 10 receives the data read / write requests, executes the object storage management method provided in this application, and uploads the data provided by the client to the object storage system 20 or reads data from the object storage system 20.
[0122] In an exemplary embodiment, referring to Figure 1 and Figure 2, an object storage management method is provided. Taking the application of this method to the storage gateway 10 in Figure 1 as an example, the method includes:
[0123] Step 202: Receive the current data and write the current data to the corresponding block device; the storage gateway includes multiple block devices, and each block device divides the data objects according to a continuous and fixed-length offset range.
[0124] The current data refers to the data provided by the client that needs to be uploaded to the object storage system. The storage gateway in this embodiment is implemented based on block devices, and the currently written data is split according to continuous and fixed-length offsets within the block devices. For example, if the written data will be split into continuous 1M lengths within the block devices, the splitting results will be: [0, 1M), [1M, 2M), [2M, 3M), ... That is, data within each 1M continuous offset range will form a data object to be uploaded.
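The fixed-length splitting described above can be sketched as follows. The function name `split_write` is hypothetical, and 1M is assumed to mean 1,048,576 bytes, which matches the offsets used in the naming example at [0128].

```python
from typing import Dict

OBJECT_SIZE = 1 << 20  # 1,048,576 bytes per data object (assumed meaning of "1M")


def split_write(offset: int, data: bytes, object_size: int = OBJECT_SIZE) -> Dict[int, bytes]:
    """Map a write at `offset` onto {object start offset: bytes that fall in that object}."""
    pieces: Dict[int, bytes] = {}
    pos = 0
    while pos < len(data):
        absolute = offset + pos
        start = (absolute // object_size) * object_size  # aligned start of the range
        take = min(start + object_size - absolute, len(data) - pos)
        pieces[start] = data[pos:pos + take]
        pos += take
    return pieces
```

A write that straddles a boundary, such as 4 bytes starting 2 bytes before the 1M mark, touches the two ranges [0, 1M) and [1M, 2M), so two data objects become pending for upload.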
[0125] Optionally, the storage gateway receives the current data from the client, parses the current data, determines the address information to be written, and writes the current data into the offset range corresponding to the block device according to the address information.
[0126] Step 204: Set a unique storage name for the current data to form the data object to be transmitted.
[0127] In this embodiment, different names, i.e., unique storage names, are used for different data objects and different versions of the same data object. Traditional methods generally require the version number to be reflected in the object name. In contrast, this embodiment does not limit the specific naming method of data objects, as long as different data objects and different versions of the same data object are named with different names. Users can customize the naming rules according to their own needs.
[0128] The naming process of this embodiment is explained below with an example in which names are built from two pieces of information, the starting offset within the block device and a version number: startOffset-version. If the written data is split into 1MB ranges of the block device, the resulting data objects are named 0-version1, 1048576-version2, 2097152-version3, and so on. When new data is written within a certain offset range, it is again split into 1MB ranges and transmitted to the cloud again. If new data is written within the ranges [0, 1MB) and [2MB, 3MB), the data objects transmitted to the cloud are named 0-version4 and 2097152-version5, respectively. 0-version1 and 0-version4 are different versions of the same data object, as are 2097152-version3 and 2097152-version5. 0-version4 is a newer version than 0-version1, and 2097152-version5 is a newer version than 2097152-version3. It should be noted that this embodiment places no requirement on the ordering of the numbers in version4 and version1, or in version5 and version3. If the version number is represented by a long integer, the number corresponding to version4 need not be greater than the number corresponding to version1; it is only necessary that version4 and version1 use different numbers. This is because the version number serves only to give different versions of the same data object different object names. In practice, the version number can be incremented for business convenience, but this is not mandatory.
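The startOffset-version example can be reproduced with a small helper. The `Namer` class is hypothetical, and the incrementing counter is just one convenient choice; as the paragraph notes, any scheme that yields distinct names would do.

```python
import itertools


class Namer:
    """Produces `startOffset-versionN` names with a gateway-wide counter."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)

    def name_for(self, start_offset: int) -> str:
        # Every call yields a fresh number, so no two names ever collide.
        return f"{start_offset}-version{next(self._counter)}"
```

Naming the three initial objects and then the rewritten ranges [0, 1MB) and [2MB, 3MB) in that order yields the same five names as in the example above.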
[0129] It is understandable that this embodiment does not utilize the multi-version control functionality of the object storage itself. Instead, it sets data object names on the storage gateway, so that different data objects and different versions of the same data object have different names. This has the following advantages:
1. The form of the version number can be controlled according to specific needs, for example, using long integers or strings with certain patterns, which facilitates multi-version control based on business scenarios or specific requirements.
2. If the multi-version control functionality of current object storage products is used, the long version strings waste storage space; with custom naming rules, the length of the unique storage name or of the version information it carries can be controlled, thus saving storage space.
3. Since the multi-version control functionality provided by the object storage product itself is no longer used, the scheme is compatible with all object storage products currently on the market, even those that do not support multi-version control. This allows the storage gateway to flexibly adapt to any object storage product as a backend storage layer.
4. This embodiment has no requirements on the format of the data object version information; it can be numeric, a string, or any other format. Furthermore, there are no requirements on the order of the version numbers; they do not need to be ascending or descending. It is sufficient that different data objects, and different versions of the same data object, have different names.
5. This embodiment does not require the unique storage name of a data object to carry version information, as long as different versions of the same data object have different object names. For example, a data object may have two versions whose object names in the cloud are "abc" and "123". These two names are completely unrelated, but they still distinguish different versions of the same data object.
[0130] Step 206: If the preset upload conditions are met, upload the data objects to be uploaded from each device to the object storage system.
[0131] The storage gateway uploads data to the object storage system in batches. It's understood that the preset upload conditions are pre-defined rules. Optionally, the preset upload conditions correspond to a preset batch size; if the amount of data to be uploaded in each device of the storage gateway reaches the preset batch size, then the preset upload conditions are satisfied. Optionally, the preset upload conditions correspond to a preset upload time for each batch; if the cumulative upload time of the data to be uploaded reaches the preset batch upload time, then the preset upload conditions are satisfied.
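The two optional upload conditions can be sketched as a small policy object. The class and field names are hypothetical. The sketch also makes two assumptions the text leaves open: the time condition is interpreted as time elapsed since the batch started accumulating, and the two conditions are combined with a logical OR, although the text presents them as alternatives.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UploadPolicy:
    max_batch_bytes: int     # preset batch size
    max_wait_seconds: float  # preset time allowed per batch
    pending_bytes: int = 0
    batch_started: float = field(default_factory=time.monotonic)

    def add(self, nbytes: int) -> None:
        """Record that `nbytes` more data is waiting to be uploaded."""
        self.pending_bytes += nbytes

    def should_upload(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        if self.pending_bytes == 0:
            return False  # nothing to upload yet
        size_reached = self.pending_bytes >= self.max_batch_bytes
        time_reached = now - self.batch_started >= self.max_wait_seconds
        return size_reached or time_reached
```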
[0132] Understandably, this involves uploading the current batch of data objects to the object storage system. The object storage system can be any existing object storage product, possessing the following functionalities: 1. The ability to read objects (GetObject); 2. The ability to store objects (PutObject); 3. The ability to view a list of objects (ListObjects); 4. The ability to delete objects (DeleteObject); 5. Strong consistency, meaning that once an object is put into the object storage, that object is visible to all subsequent Get and List requests.
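The four required operations can be expressed as a minimal backend interface. The Python method names below are hypothetical counterparts of GetObject, PutObject, ListObjects, and DeleteObject, and `InMemoryStore` is only a test double: it is trivially strongly consistent because every operation acts on the same dict.

```python
from typing import Dict, List, Protocol


class ObjectStore(Protocol):
    """Smallest backend surface the gateway scheme relies on."""

    def get_object(self, name: str) -> bytes: ...
    def put_object(self, name: str, data: bytes) -> None: ...
    def list_objects(self, prefix: str = "") -> List[str]: ...
    def delete_object(self, name: str) -> None: ...


class InMemoryStore:
    """Test double that satisfies ObjectStore structurally."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def get_object(self, name: str) -> bytes:
        return self._objects[name]

    def put_object(self, name: str, data: bytes) -> None:
        self._objects[name] = data  # visible to every later get and list

    def list_objects(self, prefix: str = "") -> List[str]:
        return sorted(n for n in self._objects if n.startswith(prefix))

    def delete_object(self, name: str) -> None:
        self._objects.pop(name, None)
```

Because the interface contains no versioning call, any product offering these four operations with strong consistency could serve as the backend under this scheme.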
[0133] Step 208: After the upload is completed, generate index data based on the block device address and unique storage name of each uploaded data object; the index data carries version information or time information.
[0134] After the data objects to be transmitted are successfully uploaded, it is necessary to record the relevant information of these transmitted data objects through an index, which may include the location information and version information of the transmitted data objects. In this embodiment, index data is generated based on the block device address and unique storage name of each transmitted data object.
[0135] Understandably, different versions of index objects need to be distinguished by different names or tags. Since the naming convention for data objects is to use different names to represent different data objects or different versions of the same data object, this embodiment requires index objects to identify the version. Therefore, when generating index objects, version information or time information is carried by naming or adding attributes. For example, the version of the index object can be indicated by reflecting ordered version information in the object name; another example is that the naming method is not limited, but version information is recorded in the meta tag of the index object; yet another example is that the naming method is not limited, but a timestamp is added to the attributes of the index object.
[0136] The index object is used to manage data objects on the object storage system. It records the block device address and unique storage name of each data object in the cloud and carries version information or time information, from which the newer and older versions of the same data object in the cloud can be identified.
[0137] In an alternative implementation, index objects also need to be stored in the cloud in multiple versions. Therefore, different versions of the same index object need to have different names, and the version information should be distinguishable. This multi-version storage of index objects in the cloud is achieved by incorporating version number information into the index object's name. Furthermore, the version numbers must be ordered; that is, the version number in the name of a newer version of the index object must be greater than the version number in the name of an older version. This allows the version number information in the index object's name to identify whether the index object is new or old.
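One way to embed an ordered version number in an index object's name is sketched below. The `index-` prefix and the 20-digit zero padding are assumptions; the padding is chosen so that plain string comparison of names agrees with numeric comparison of versions.

```python
from typing import Iterable


def index_name(version: int) -> str:
    """Name an index object so that newer versions sort after older ones."""
    return f"index-{version:020d}"


def latest_index(names: Iterable[str]) -> str:
    """Pick the newest index object among listed object names."""
    return max(n for n in names if n.startswith("index-"))
```

Without the padding, a name such as `index-10` would sort before `index-2`, so the newest index could not be identified from a plain listing.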
[0138] In one optional implementation, the index records two parts: a location index area and a version index area. Referring to Table 1, which shows one type of index record content, this embodiment primarily uses the version index area to associate the unique storage name and version of a data object in the cloud. A set of `startOffset` and `name` in the version index area represents the block device address and unique storage name of a data object in the cloud. In this embodiment, the block device address uses the starting offset. As described above, this embodiment is based on a block device implementation. The written data is split into data objects according to continuous and fixed-length offsets. Therefore, using the starting offset of the data object as a marker to identify the data object allows for associating the data object with its name in the cloud.
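The mapping from a written range to fixed-length data objects can be sketched as follows. The object length, the function name, and the tuple layout are assumptions made for illustration only; the application states only that data objects are divided by continuous, fixed-length offsets and identified by their starting offset.

```python
from typing import List, Tuple

OBJECT_SIZE = 4 * 1024 * 1024  # hypothetical fixed object length in bytes


def split_write(offset: int, length: int,
                object_size: int = OBJECT_SIZE) -> List[Tuple[int, int, int]]:
    """Split a write into pieces, one per fixed-length data object.

    Each piece is (startOffset of the owning data object,
                   offset of the piece on the block device,
                   length of the piece).
    """
    pieces = []
    end = offset + length
    while offset < end:
        # Starting offset of the data object that contains this byte.
        start = (offset // object_size) * object_size
        piece_end = min(end, start + object_size)
        pieces.append((start, offset, piece_end - offset))
        offset = piece_end
    return pieces
```

With an object length of 4, a 6-byte write at offset 6 touches the data objects starting at offsets 4 and 8.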
[0139] Table 1:
[0140] The same data object may exist in the cloud in different versions, so it is also necessary to know which version each of its different names represents. Whenever a batch of new data is uploaded, the index is updated, and the version index area of the latest index records the latest valid version of each data object in the cloud.
[0141] Referring to Tables 2 and 3, Table 2 shows the contents of the version index area records of the old index; Table 3 shows the contents of the version index area records of the new index.
[0142] Table 2:
[0143] Table 3:
[0144] From the content recorded in the version index areas of the two indexes, it can be seen that the data objects named name4 and name1 are the new and old versions of the same data object, and that the data objects named name5 and name3 are likewise the new and old versions of the same data object. Because the index object stores the block device address and unique storage name of each transmitted data object and also carries version or time information, the unique storage name and version of each data object in the cloud can be associated through the index object, thereby realizing version management of data objects. In other words, the index associates a data object with its unique storage name in the cloud, and the relative age of the indexes identifies which of several differently named copies of the same data object is the newer version; in this way the data objects in the cloud are indexed and managed.
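The comparison of an old and a new version index area can be sketched as below. The names name1 to name5 follow the description above; the starting offsets and the dictionary representation are hypothetical, chosen only to make the example concrete.

```python
from typing import Dict


def superseded_names(old_index: Dict[int, str],
                     new_index: Dict[int, str]) -> Dict[str, str]:
    """Map each old storage name to the newer name recorded for the same startOffset."""
    result = {}
    for start_offset, old_name in old_index.items():
        new_name = new_index.get(start_offset)
        if new_name is not None and new_name != old_name:
            result[old_name] = new_name
    return result


# Hypothetical version index areas: startOffset -> unique storage name.
old_version_area = {0: "name1", 4194304: "name2", 8388608: "name3"}
new_version_area = {0: "name4", 4194304: "name2", 8388608: "name5"}
```

Here `superseded_names(old_version_area, new_version_area)` pairs name1 with name4 and name3 with name5, while name2 remains the valid version because it is unchanged in the new index.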
[0145] Optionally, older versions of data objects recorded in the old index can be cleaned up at an appropriate time to save storage space and improve storage resource utilization.
[0146] Step 210: Update the index file based on the index data.
[0147] The index file is a local file of the storage gateway. Through this index file, the storage gateway can manage information about data objects in the object storage system.
[0148] In the object storage management method described above, the storage gateway receives current data and writes it to the corresponding block device. The storage gateway includes multiple block devices, each of which divides data objects according to a continuous and fixed-length offset range. A unique storage name is assigned to the current data, forming the data object to be transferred. Under preset upload conditions, the data objects to be transferred from each block device are uploaded to the object storage system. After the upload is complete, index data is generated based on the block device address and unique storage name of each transferred data object. The index data carries version information or time information. The index file is updated based on the index data. Through this method, the storage gateway sets a unique storage name for each version of the data object. Based on the version information or time information of the index data, it realizes the upload processing and data management of multiple versions of objects. This is no longer limited by whether different object storage products support multi-version control functions, allowing the storage gateway to flexibly adapt to all object storage products and greatly improving compatibility with various object storage products.
[0149] In an exemplary embodiment, as shown in FIG3, the index file includes a location index area and a version index area. The location index area records the location information of each transmitted data object in the block device, and the version index area stores multiple index data. The method further includes:
[0150] Step 302: Receive data read request.
[0151] Step 304: Read data from the corresponding target block device according to the data read request.
[0152] Upon receiving a data read request from the client, the system first reads the data locally. Optionally, the system analyzes the data read request to determine the address information of the requested data, and then reads the data from the offset range corresponding to the target block device based on that address information.
[0153] Step 306: If no data is read locally, search the location index area based on the location information of the target block device to determine whether the target data object corresponding to the target block device is the transmitted data object.
[0154] Referring to Table 1 above, the location index area records multiple sets of offset and length values, indicating which locations on the block device have had data uploaded to the cloud. If the data to be read does not exist locally, it needs to be retrieved from the cloud. The location index area of the local index file determines whether the location to be read has had data uploaded.
[0155] Step 308: If the target data object is determined to be a transmitted data object, determine the unique storage name of the target data object from the version index area.
[0156] Step 310: Retrieve data from the object storage system based on the determined unique storage name.
[0157] If the location to be read has already been uploaded, the version index area is queried to obtain the unique storage name of the data object in the cloud, and the data object to be read can be downloaded from the cloud based on the unique storage name.
[0158] Step 312: Return the acquired data to the sender of the data read request.
[0159] In this embodiment, the unique storage name of a data object in the cloud can be determined through the location index area and the version index area, so that the data can be retrieved from the cloud. Data is first read locally from the storage gateway. If the data is not found locally, it is determined whether the data to be read has already been uploaded; if it has been uploaded to the object storage system, the relevant data is read from the object storage system, which improves data retrieval efficiency.
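Steps 302 to 312 can be sketched as follows, using in-memory dictionaries in place of the block device, the index file, and the object storage system. The function name, the data structures, and the object length of 4 are illustrative assumptions rather than the implementation of this application.

```python
from typing import Dict, List, Optional, Tuple

OBJECT_SIZE = 4  # hypothetical fixed object length, kept small for the example


def read_data(offset: int,
              local: Dict[int, bytes],
              location_index: List[Tuple[int, int]],
              version_index: Dict[int, str],
              cloud: Dict[str, bytes]) -> Optional[bytes]:
    """Return the data object containing `offset`, or None if it was never written."""
    start = (offset // OBJECT_SIZE) * OBJECT_SIZE

    # Step 304: try the local block device first.
    data = local.get(start)
    if data is not None:
        return data

    # Step 306: check the location index area (offset, length pairs)
    # to see whether this location has been uploaded.
    uploaded = any(o <= offset < o + length for o, length in location_index)
    if not uploaded:
        return None

    # Step 308: look up the unique storage name in the version index area.
    name = version_index[start]

    # Step 310: fetch the data object from the object storage system.
    return cloud[name]
```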
[0160] In an exemplary embodiment, after step 206, the method further includes: for each data object to be uploaded, once the upload of the current data object is completed, recording the corresponding data object information in the progress status file.
[0161] The progress status file is used to record the upload progress. If a failure occurs during the upload of a batch of data, the progress status file is used to resume the upload from the recorded progress after the failure is resolved. For example, referring to Table 4, which shows a data structure written to the progress status file, the progress status file uses an append-only file structure. Each time a data object is uploaded, one data structure as shown in Table 4 is appended to the progress status file, indicating that the data object has been successfully uploaded in this batch of data. `startOffset` represents the starting offset of the data object, `name` represents the name of the object, and the multiple sets of `luOffset+length` indicate that the uploaded data object contains data within the offset ranges [luOffset, luOffset+length-1].
[0162] Table 4:
[0163] In this embodiment, a progress status file is provided to record the upload progress, which enables the function of resuming interrupted uploads.
[0164] In an exemplary embodiment, the method further includes: if a fault occurs, stopping the uploading of the data objects to be uploaded in each block device; if the fault is cleared, reading the progress status file to obtain data object information of the data objects that have been uploaded; and continuing to upload the data objects to be uploaded in each block device based on the obtained data object information.
[0165] It can be understood that this embodiment uploads data in batches, and the index object is not updated for each uploaded data object. To improve efficiency, the index object is updated only after a whole batch of data has been uploaded, where a batch refers to the data accumulated over a period of time or up to a certain amount. Consequently, if a failure occurs while a batch is only partly uploaded and the index has not yet been updated, the data objects already uploaded in the current batch are not recorded in the index object; without further measures they would be treated as invalid and would have to be re-uploaded after the service is restored. In this embodiment, the upload progress is continuously recorded in a progress status file during the upload process. After the failure is resolved, the previous upload progress can be identified from the progress status file and the upload can be resumed from that point, which avoids re-uploading data objects and thereby improves efficiency. Compared with updating the index object after each data object is uploaded, updating the progress status file after each data object is uploaded has the following advantages: the progress status file is a local file, and writing to a local file is much faster than uploading the index object to the cloud; and the progress status file has an append-only structure, whereas the index object has a relatively fixed structure and must record the complete status of all data objects, so updating it during the upload of a batch would significantly reduce efficiency.
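The append-only progress status file and the resume behaviour can be sketched as follows. The JSON-lines encoding, the key name `extents`, and the function names are assumptions for illustration; the application specifies only that each record carries `startOffset`, `name`, and multiple sets of `luOffset+length`.

```python
import json
import os
from typing import Callable, Dict, List, Tuple

# One pending item: (startOffset, name, [[luOffset, length], ...], payload)
Pending = Tuple[int, str, List[List[int]], bytes]


def record_uploaded(path: str, start_offset: int, name: str,
                    extents: List[List[int]]) -> None:
    """Append one record after a data object has been uploaded successfully."""
    record = {"startOffset": start_offset, "name": name, "extents": extents}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
        f.flush()
        os.fsync(f.fileno())  # make the record durable before continuing


def load_uploaded(path: str) -> Dict[int, str]:
    """Read the progress file and return startOffset -> name of uploaded objects."""
    done: Dict[int, str] = {}
    if not os.path.exists(path):
        return done
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                break  # a torn final record from a crash is ignored
            done[record["startOffset"]] = record["name"]
    return done


def resume_upload(path: str, pending: List[Pending],
                  put_object: Callable[[str, bytes], None]) -> List[str]:
    """Upload only the objects of the batch that are not yet in the progress file."""
    done = load_uploaded(path)
    uploaded_now = []
    for start_offset, name, extents, payload in pending:
        if done.get(start_offset) == name:
            continue  # already uploaded before the fault
        put_object(name, payload)
        record_uploaded(path, start_offset, name, extents)
        uploaded_now.append(name)
    return uploaded_now
```

In this sketch each record is flushed to disk before the next upload starts, so that after a fault the file never claims an upload that did not complete; whether such synchronous flushing is used in practice is a design choice not stated in this application.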
[0166] In an exemplary embodiment, after step 210, the method further includes: generating an index object based on the index data; and uploading the index object to an object storage system.
[0167] When updating the index, the local index file is updated first, and then constructed into an index object and uploaded to the cloud. The index object in the cloud also needs to be stored in multiple versions; the updated index is treated as the latest version and uploaded to the cloud. This means the latest index object records the latest data object, and older index objects contain older data objects. The local index file is mainly used to accelerate the response speed of read requests, because responding to a read request requires downloading the corresponding data object from the cloud based on the queried index record; reading the index file locally is faster than downloading the index object from the cloud. The index object in the cloud is mainly used for disaster recovery from the cloud when local data is corrupted.
[0168] In an exemplary embodiment, the method further includes: determining old version index data based on version information or time information carried by each index data in the index file; deleting the data object corresponding to the old version index data in the object storage system; and deleting the old version index object stored in the object storage system.
[0169] Specifically, old versions of data objects and index objects are deleted periodically or at appropriate times according to predefined cleanup rules. The old and new versions of the index data are identified based on the version information or time information carried by each item of index data in the index file. If the data object corresponding to the current index data appears as a newer version in the new version of the index data, the data object corresponding to the current index data is determined to be an old version data object. The data objects corresponding to the old version index data are deleted from the object storage system, and the old version index objects stored in the object storage system are also deleted. For example, if an old version index object records information about data objects a0, b0, and c0, and data object a1 is then uploaded, the new version index object will record information about data objects a1, b0, and c0. The storage gateway can then delete the a0 data object and the old version index object stored in the object storage system. In this way, junk data objects in the cloud are cleaned up in a timely manner.
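The a0, b0, c0 example above can be sketched as a small planning function. The representation of index objects as a dictionary keyed by version number, and the function name, are illustrative assumptions; the actual deletions would then be issued through the object storage interface.

```python
from typing import Dict, List, Tuple


def plan_cleanup(indexes: Dict[int, Dict[int, str]]) -> Tuple[List[str], List[int]]:
    """Decide what can be deleted from the cloud.

    `indexes` maps an index version number to its version index area
    (startOffset -> unique storage name). Returns the data object names and
    the index versions that are no longer referenced by the latest index.
    """
    latest = max(indexes)
    live = set(indexes[latest].values())
    stale_data = sorted({
        name
        for version, entries in indexes.items()
        if version != latest
        for name in entries.values()
        if name not in live
    })
    stale_indexes = sorted(v for v in indexes if v != latest)
    return stale_data, stale_indexes


indexes = {
    1: {0: "a0", 4: "b0", 8: "c0"},  # old version index object
    2: {0: "a1", 4: "b0", 8: "c0"},  # new version index object
}
```

For this input, only a0 and index version 1 are selected for deletion; b0 and c0 are kept because the latest index still refers to them.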
[0170] In one alternative implementation, the object storage management method includes the following steps:
[0171] 1. The client writes data to the cloud storage gateway. Once the amount of data written reaches a certain value, the upload operation is scheduled to begin.
[0172] 2. Construct the data object to be uploaded and name the data object to be uploaded;
[0173] 3. Transfer the data objects to the cloud using the PutObject method of the object storage;
[0174] 4. Record the uploaded data objects in the progress status file;
[0175] 5. Once this batch of data has been transferred to the cloud, update the local index file;
[0176] 6. Transfer the index object to the cloud using the PutObject method of the object storage;
[0177] 7. Next, the client continues to write data to the gateway. Once the amount of data written reaches a certain value, the upload operation is scheduled to begin.
[0178] 8. Construct the data objects to be uploaded and name them. If this batch of uploaded data objects contains the same data objects as those in step 2, different object names are needed to ensure that different versions of the same data object exist simultaneously in the cloud;
[0179] 9. Transfer the data objects to the cloud using the PutObject method of the object storage;
[0180] 10. Record the uploaded data objects in the progress status file;
[0181] 11. Once this batch of data has been transferred to the cloud, update the local index file;
[0182] 12. Upload the index object to the cloud using the PutObject method of the object storage. If the index object being uploaded is a new version of the index object uploaded in step 6, it must be named using an incremented version number;
[0183] 13. Based on the old index object, delete the old data object in the cloud at the appropriate time using the DeleteObject method of the object storage;
[0184] 14. Delete old index objects in the cloud.
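Steps 1 to 14 can be sketched end to end with an in-memory stand-in for the object storage. The class names, the naming formats, and the use of a random suffix to make data object names unique are all assumptions for illustration; only the PutObject and DeleteObject operations are named by this application.

```python
import json
import uuid
from typing import Dict, List, Tuple


class InMemoryObjectStore:
    """Stand-in for an object storage product offering PutObject and DeleteObject."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def put_object(self, name: str, body: bytes) -> None:
        self.objects[name] = body

    def delete_object(self, name: str) -> None:
        self.objects.pop(name, None)


class Gateway:
    def __init__(self, store: InMemoryObjectStore) -> None:
        self.store = store
        self.index: Dict[int, str] = {}            # local index: startOffset -> name
        self.index_version = 0
        self.progress: List[Tuple[int, str]] = []  # stands in for the progress status file
        self.garbage: List[str] = []               # old names awaiting cleanup

    def upload_batch(self, batch: Dict[int, bytes]) -> str:
        new_entries = {}
        for start_offset, payload in batch.items():
            # Steps 2 and 8: a fresh unique name for every version of a data object.
            name = f"data-{start_offset}-{uuid.uuid4().hex}"
            self.store.put_object(name, payload)       # steps 3 and 9
            self.progress.append((start_offset, name))  # steps 4 and 10
            new_entries[start_offset] = name
        # Names replaced by this batch become old versions.
        self.garbage += [self.index[o] for o in new_entries if o in self.index]
        self.index.update(new_entries)                 # steps 5 and 11
        # Steps 6 and 12: upload the index under an incremented version number.
        if self.index_version > 0:
            self.garbage.append(f"index-{self.index_version:012d}")
        self.index_version += 1
        index_name = f"index-{self.index_version:012d}"
        self.store.put_object(index_name, json.dumps(self.index).encode())
        self.progress.clear()
        return index_name

    def cleanup(self) -> None:
        # Steps 13 and 14: delete old data objects and old index objects.
        for name in self.garbage:
            self.store.delete_object(name)
        self.garbage.clear()
```

Between the second upload and the call to `cleanup`, both versions of the rewritten data object and both index objects exist in the store at the same time, which is the multi-version state that the steps above describe.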
[0185] In an exemplary embodiment, the method further includes: scanning each index object stored in the object storage system in the event of local data corruption; determining at least one latest index object based on the version information or time information carried by each index object; reading the unique storage name of the data object to be recovered from the at least one latest index object; obtaining the target data from the object storage system based on the read unique storage name; and performing local data recovery based on the obtained target data.
[0186] Specifically, when local data is corrupted or has issues, it can be recovered from the cloud. This involves scanning all index objects in the cloud, identifying newer versions of those objects, and retrieving and recovering the data based on the data objects recorded within them.
[0187] In one alternative implementation, when local data on the storage gateway is corrupted, recovery can be performed using data from the cloud. The specific process is as follows:
[0188] 1. Scan the index objects in the cloud using the ListObjects method of the object storage;
[0189] 2. Identify the old and new versions of the index object based on the version information in the index object name;
[0190] 3. Read the relevant information recorded in the new version index object to identify valid data objects on the cloud;
[0191] 4. Recover the data contained in the cloud data object using the GetObject method of object storage.
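The four recovery steps can be sketched as below. The callables `list_objects` and `get_object` stand in for the ListObjects and GetObject methods; the `index-` prefix, the JSON encoding of the index object, and the function name are hypothetical choices made for this example.

```python
import json
from typing import Callable, Dict, Iterable


def recover(list_objects: Callable[[], Iterable[str]],
            get_object: Callable[[str], bytes],
            index_prefix: str = "index-") -> Dict[int, bytes]:
    """Rebuild local data (startOffset -> bytes) from the newest index object."""
    # Step 1: scan the index objects in the cloud.
    index_names = [n for n in list_objects() if n.startswith(index_prefix)]
    if not index_names:
        return {}

    # Step 2: the greatest version number in the name marks the newest index.
    latest = max(index_names, key=lambda n: int(n[len(index_prefix):]))

    # Step 3: read the valid data object names recorded in the newest index.
    entries = json.loads(get_object(latest))

    # Step 4: download each valid data object.
    return {int(offset): get_object(name) for offset, name in entries.items()}
```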
[0192] It should be understood that although the steps in the flowcharts of the embodiments described above are shown sequentially according to the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some steps in the flowcharts of the embodiments described above may include multiple steps or multiple stages. These steps or stages are not necessarily completed at the same time, but can be executed at different times. The execution order of these steps or stages is not necessarily sequential, but can be performed alternately or in turn with other steps or at least some of the steps or stages of other steps.
[0193] Based on the same inventive concept, this application also provides an object storage management apparatus for implementing the object storage management method described above. The solution provided by this apparatus is similar to the implementation described in the above method; therefore, the specific limitations in one or more object storage management apparatus embodiments provided below can be found in the limitations of the object storage management method described above, and will not be repeated here.
[0194] In an exemplary embodiment, referring to FIG1 and FIG4, an object storage management device is provided. Taking the application of this device to the storage gateway 10 in FIG1 as an example, it includes:
[0195] The receiving module 402 is used to receive the current data and write the current data into the corresponding block device; the storage gateway includes multiple block devices, and each block device divides data objects according to a continuous and fixed-length offset range.
[0196] The naming module 404 is used to set a unique storage name for the current data, forming the data object to be transmitted.
[0197] The upload module 406 is used to upload the data objects to be transmitted in each block device to the object storage system when the preset upload conditions are met.
[0198] The index generation module 408 is used to generate index data based on the block device address and unique storage name of each uploaded data object after the upload is completed; the index data carries version information or time information; and the index file is updated based on the index data.
[0199] In the aforementioned object storage management device, the storage gateway sets a unique storage name for each version of data object. Based on the version information or time information of the index data, it realizes the upload processing and data management of multiple versions of objects. It is no longer limited by whether different object storage products support multi-version control functions, making the storage gateway flexibly adaptable to all object storage products and greatly improving its compatibility with various object storage products.
[0200] In an exemplary embodiment, the object storage management device further includes a reading module; the index file includes a location index area and a version index area, the location index area records the location information of each transmitted data object in the block device, and the version index area stores multiple index data; the receiving module 402 is further configured to receive a data reading request; the reading module is configured to read data from the corresponding target block device according to the data reading request; if no data is read locally, the module searches in the location index area according to the location information of the target block device to determine whether the target data object corresponding to the target block device is a transmitted data object; if the target data object is determined to be a transmitted data object, the module determines the unique storage name of the target data object from the version index area; the module retrieves data from the object storage system according to the determined unique storage name; and the module returns the retrieved data to the sender of the data reading request.
[0201] In one exemplary embodiment, the object storage management device further includes a progress management module; the progress management module is used to record, for each data object to be uploaded, the corresponding data object information in the progress status file once the upload of that data object is completed.
[0202] In an exemplary embodiment, the progress management module is further configured to: stop uploading the data objects to be uploaded in each block device if a fault occurs; if the fault is cleared, read the progress status file to obtain the data object information of the data objects that have been uploaded; and, based on the obtained data object information, continue uploading the data objects to be uploaded in each block device.
[0203] In an exemplary embodiment, the index generation module 408 is further configured to generate an index object based on the index data and upload the index object to the object storage system.
[0204] In one exemplary embodiment, the object storage management device further includes a deletion module, which is used to determine old version index data based on version information or time information carried by each index data in the index file; delete the data objects corresponding to the old version index data in the object storage system; and delete the old version index objects stored in the object storage system.
[0205] In an exemplary embodiment, the object storage management device further includes a recovery module, which is configured to, in the event of local data corruption, scan each index object stored in the object storage system; determine at least one latest index object based on version information or time information carried by each index object; read the unique storage name of the data object to be recovered from the at least one latest index object; obtain target data from the object storage system based on the read unique storage name; and perform local data recovery based on the obtained target data.
[0206] Each module in the aforementioned object storage management device can be implemented entirely or partially through software, hardware, or a combination thereof. These modules can be embedded in the processor of the gateway device in hardware form or independent of it, or stored in the memory of the gateway device in software form, so that the processor can call and execute the operations corresponding to each module.
[0207] In an exemplary embodiment, a gateway device is provided, the internal structure of which can be as shown in Figure 5. The gateway device includes a processor, memory, an input/output (I/O) interface, and a communication interface. The processor, memory, and I/O interface are connected via a system bus, and the communication interface is connected to the system bus via the I/O interface. The processor of the gateway device provides computing and control capabilities. The memory of the gateway device includes non-volatile storage media and internal memory. The non-volatile storage media stores the operating system and computer programs. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The I/O interface of the gateway device is used for exchanging information between the processor and external devices. The communication interface of the gateway device is used for communication with external terminals via a network connection. When the computer program is executed by the processor, it implements an object storage management method.
[0208] Those skilled in the art will understand that the structure shown in Figure 5 is merely a block diagram of a portion of the structure related to the present application and does not constitute a limitation on the gateway device to which the present application is applied. A specific gateway device may include more or fewer components than those shown in the figure, or combine certain components, or have different component arrangements.
[0209] In one exemplary embodiment, a gateway device is provided, including a memory and a processor. The memory stores a computer program, and the processor executes the computer program to perform the following steps: receiving current data and writing the current data into a corresponding block device; the storage gateway includes multiple block devices, each block device dividing data objects according to a continuous and fixed-length offset range; setting a unique storage name for the current data to form a data object to be transmitted; uploading the data objects to be transmitted in each block device to an object storage system when preset upload conditions are met; after the upload is completed, generating index data based on the block device address and unique storage name of each transmitted data object; the index data carries version information or time information; and updating the index file based on the index data.
[0210] In one embodiment, when the processor executes the computer program, it further performs the following steps: receiving a data read request; reading data from the corresponding target block device according to the data read request; if no data is read locally, searching in the location index area according to the location information of the target block device to determine whether the target data object corresponding to the target block device is a transmitted data object; if the target data object is determined to be a transmitted data object, determining the unique storage name of the target data object from the version index area; retrieving data from the object storage system according to the determined unique storage name; and returning the retrieved data to the sender of the data read request.
[0211] In one embodiment, when the processor executes the computer program, it also performs the following steps: for each data object to be transmitted, if the current data object to be transmitted has been uploaded, the corresponding data object information is recorded in the progress status file.
[0212] In one embodiment, when the processor executes the computer program, it further performs the following steps: if a fault occurs, stopping the uploading of the data objects to be uploaded in each block device; if the fault is cleared, reading the progress status file to obtain the data object information of the data objects that have been uploaded; and, based on the obtained data object information, continuing to upload the data objects to be uploaded in each block device.
[0213] In one embodiment, when the processor executes the computer program, it further performs the following steps: generating an index object based on the index data; and uploading the index object to the object storage system.
[0214] In one embodiment, when the processor executes the computer program, it further performs the following steps: determining the old version index data based on the version information or time information carried by each index data in the index file; deleting the data object corresponding to the old version index data in the object storage system; and deleting the old version index object stored in the object storage system.
[0215] In one embodiment, when the processor executes the computer program, it further performs the following steps: in the event of local data corruption, scanning each index object stored in the object storage system; determining at least one latest index object based on the version information or time information carried by each index object; reading the unique storage name of the data object to be recovered from the at least one latest index object; obtaining the target data from the object storage system based on the read unique storage name; and performing local data recovery based on the obtained target data.
[0216] In one embodiment, as shown in Figure 1, a storage system is provided, comprising a storage gateway 10 and an object storage system 20; wherein: the storage gateway 10 is used to perform the following steps: receiving current data and writing the current data into the corresponding block device; the storage gateway 10 includes multiple block devices, each block device dividing data objects according to a continuous and fixed-length offset range; setting a unique storage name for the current data to form a data object to be transmitted; uploading the data objects to be transmitted in each block device to the object storage system 20 when preset upload conditions are met; after the upload is completed, generating index data based on the block device address and unique storage name of each transmitted data object; the index data carries version information or time information; updating the index file based on the index data.
[0217] In one embodiment, the index file includes a location index area and a version index area. The location index area records the location information of each transmitted data object in the block device, and the version index area stores multiple index data. The storage gateway 10 is used to perform the following steps: receiving a data read request; reading data from the corresponding target block device according to the data read request; if no data is read locally, searching in the location index area according to the location information of the target block device to determine whether the target data object corresponding to the target block device is a transmitted data object; if the target data object is determined to be a transmitted data object, determining the unique storage name of the target data object from the version index area; retrieving data from the object storage system 20 according to the determined unique storage name; and returning the retrieved data to the sender of the data read request.
[0218] In one embodiment, the storage gateway 10 is used to perform the following steps: for each data object to be transmitted, if the current data object to be transmitted has been uploaded, the corresponding data object information is recorded in the progress status file.
[0219] In one embodiment, the storage gateway 10 is used to perform the following steps: if a fault occurs, stop uploading the data objects to be uploaded in each block device; if it is determined that the fault has been cleared, read the progress status file to obtain the data object information of the data objects that have been uploaded; and continue to upload the data objects to be uploaded in each block device according to the obtained data object information.
[0220] In one embodiment, the storage gateway 10 is used to perform the following steps: generating an index object based on index data; and uploading the index object to the object storage system 20.
[0221] In one embodiment, the storage gateway 10 is used to perform the following steps: determine the old version index data based on the version information or time information carried by each index data in the index file; delete the data object corresponding to the old version index data in the object storage system 20; and delete the old version index object stored in the object storage system 20.
[0222] In one embodiment, the storage gateway 10 is used to perform the following steps: in the event of local data corruption, scanning each index object stored in the object storage system 20; determining at least one latest index object based on the version information or time information carried by each index object; reading the unique storage name of the data object to be recovered from the at least one latest index object; obtaining the target data from the object storage system 20 based on the read unique storage name; and performing local data recovery based on the obtained target data.
[0223] It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, data stored, data displayed, etc.) involved in this application are all information and data authorized by the user or fully authorized by all parties, and the collection, use and processing of the relevant data must comply with relevant regulations.
[0224] This application also relates to the field of cloud data backup / recovery technology, and in particular to a data backup / recovery method, apparatus, computer equipment, computer-readable storage medium, and computer program product.
[0225] Currently, in the field of cloud data backup and recovery technology, in scenarios where backup or recovery is performed locally and in the cloud, there are often limitations in selecting or changing backup / data recovery product suppliers or cloud service providers due to differences in data organization methods and backup or recovery protocols.
[0226] There is an urgent need for a universal and portable data backup / recovery method to facilitate flexible selection of backup product vendors and cloud service providers.
[0227] It should be noted that the terms "first," "second," etc., used in this application can be used to describe different objects, but these objects are not limited by these terms. These terms are only used to distinguish two different objects; for example, "first verification field" and "second verification field" are used to distinguish two verification fields. The terms "comprising" and "having," and any variations thereof, used in this application, are intended to cover non-exclusive inclusion. The term "multiple" used in this application refers to two or more. The term "and / or" used in this application refers to one of the schemes, or any combination of multiple schemes.
[0228] The following describes the technical terms used in the embodiments of this application.
[0229] (1) Volume, also referred to as logical volume in this embodiment. A logical volume is a logical partition of physical space in a block storage system, providing raw device access for data to a virtual machine (VM) or physical machine. It can be created, deleted, and expanded.
[0230] (2) Volume Identifier (ID). The volume identifier is the unique identifier of a logical volume in the block storage system.
[0231] (3) Block, where a block represents a collection of data information stored in a computer. The data seen from the user's perspective exists in the form of blocks.
[0232] (4) Object-Based Cloud Storage, where the above-mentioned object storage refers to cloud storage that uses objects as storage units and provides object-level access interfaces.
[0233] (5) Bucket, also referred to as storage bucket in this embodiment. A bucket is a container for managing data in a cloud object storage system. One volume corresponds to one bucket, which is used to store block data within a volume.
[0234] (6) Object: The aforementioned object refers to a data unit that records user data. An object consists of an object name, an object identifier, metadata, and user data. The object identifier uniquely identifies the object. An object is the basic unit for storing data in a cloud-based object storage system; block data in a volume is stored as objects in buckets within the object storage system.
[0235] (7) Entity Tag (ETag), where the ETag is a unique identifier generated for each object. For example, it can be the hexadecimal MD5 hash of the object content, used to verify the integrity of the object content.
[0236] (8) Version, where Version is used to identify the state of data, files or systems, and each version has a unique version number.
[0237] (9) Version number (version_num), where the version number is used to uniquely identify different versions.
[0238] For example, the version number can be a strictly incremental number at the volume level. In practice, it can also take other forms, as long as they can uniquely identify different versions.
[0239] For example, the default version number size is 8 bytes.
[0240] (10) Barrier (also rendered as "fence" in this application), where a barrier indicates that the data written to a volume before the backup time has been backed up to object storage, under the selected backup type, at a given version number.
[0241] (11) Data Address Segment: Divides the volume into several consecutive and mutually exclusive logical address segments of a fixed length.
[0242] (12) Index Address Segment: A fixed number of contiguous data address segments within the volume are combined into an index address segment. The index address segment and the data address segment have a one-to-many mapping relationship. One index address segment covers a fixed number of contiguous data address segments.
[0243] (13) Full Backup: Full backup refers to the process of backing up all the data of a specified data object.
[0244] (14) Incremental Backup: Incremental backup refers to the process of backing up only the data objects that have been modified since the last backup.
[0245] (15) Differential Backup, where differential backup refers to the process of backing up data objects that have been modified since the last full backup.
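Definitions (11) and (12) above can be made concrete with a small sketch that maps a volume offset to its data address segment and index address segment. The segment length and the number of data address segments per index address segment are illustrative values chosen for this example, not values fixed by this application.

```python
DATA_SEGMENT_SIZE = 4 * 1024 * 1024      # assumed length of one data address segment
DATA_SEGMENTS_PER_INDEX = 256            # assumed data segments per index segment
INDEX_SEGMENT_SIZE = DATA_SEGMENT_SIZE * DATA_SEGMENTS_PER_INDEX


def data_segment_start(offset):
    """Start address of the data address segment containing `offset`."""
    return (offset // DATA_SEGMENT_SIZE) * DATA_SEGMENT_SIZE


def index_segment_start(offset):
    """Start address of the index address segment containing `offset`."""
    return (offset // INDEX_SEGMENT_SIZE) * INDEX_SEGMENT_SIZE


def data_segments_in_index(index_start):
    """Start addresses of the data address segments covered by one index segment."""
    return [index_start + i * DATA_SEGMENT_SIZE
            for i in range(DATA_SEGMENTS_PER_INDEX)]
```

Because both segment kinds are fixed-length and contiguous, the one-to-many mapping needs no lookup table; integer division is enough.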
[0246] Figure 6 is a schematic diagram of an architecture for data backup and recovery in a related technology. Currently, in scenarios where data is backed up / recovered locally and in the cloud, the different data organization formats and backup / recovery protocols often lead to limitations in selecting or changing backup product vendors or cloud service providers. In other words, as shown in Figure 6, the related technologies include various data backup / recovery systems with different data organization formats and backup / recovery protocols. These data backup / recovery systems differ and lack universality and portability. The architecture shown in Figure 6 uses data backup / recovery system A and data backup / recovery system B as an example. Data backup / recovery system A and data backup / recovery system B can back up locally stored content to cloud storage (i.e., cloud storage 11, cloud storage 12, and cloud storage 13 in Figure 6) using different data organization formats and different backup / recovery protocols.
[0247] In this embodiment, by specifying common technologies for data backup / recovery systems, including consistent data organization formats and standardized backup / recovery protocols, the universality and portability of data backup / recovery in scenarios ranging from local block storage to cloud object storage can be improved. This facilitates users' flexible selection and switching of backup / recovery product vendors and cloud service providers. Figure 7 shows a schematic diagram of the architecture for implementing data backup and recovery in some embodiments. As shown in Figure 7, a unified data backup / recovery system, i.e., a unified data organization format and a unified backup / recovery protocol, can be used to back up the content of local storage to cloud storage (i.e., cloud storage 11, cloud storage 12, and cloud storage 13 in Figure 7).
[0248] In this embodiment, local storage can refer to a block storage system, and cloud storage can refer to a cloud-based object storage system. Figure 8 is a schematic diagram of the technical architecture of the data backup / recovery system in some embodiments. As shown in Figure 8, in a data backup scenario, the data backup / recovery system can uniformly organize the data to be backed up in the block storage system and store it in the cloud-based object storage system through a backup protocol; in a recovery scenario, the data backup / recovery system can restore the data from the cloud-based object storage system to the block storage system through a recovery protocol.
[0249] As shown in Figure 8, the core elements of a data backup / recovery system include:
[0250] a) Data Organization: This includes the organization logic and format of cloud-based data. The organization logic of cloud-based data is based on data partitioning, index partitioning, and fence partitioning. The data organization format includes data objects, index objects, and fence objects, which organize the block data of the volume in the block storage system into different types of objects and store them in different partitions of the bucket in the cloud object storage.
[0251] b) Data backup protocol: The protocol process for organizing block data of a volume in local block storage into objects in a standard format and uploading them to cloud object storage.
[0252] c) Data Recovery Protocol: The protocol process for discovering content to be recovered from the buckets of the cloud object storage system and downloading, parsing, and restoring the data to the block storage system.
[0253] This application provides data backup and data recovery methods. These methods offer versatility and portability, facilitating flexible selection of backup / recovery product vendors and cloud service providers.
[0254] The data backup method provided in this application can be implemented by a data backup device or a computer device. The data backup device can be a functional module or entity within the computer device used to implement the data backup method. In some embodiments, the computer device can be a local device or a cloud device. For example, the computer device can be a local computer, tablet computer, etc., or a server on a cloud device.
[0255] In this embodiment of the application, the backup process can be implemented through a preset backup protocol. The backup process will be described in detail in the following embodiments under different circumstances.
[0256] In one exemplary embodiment, Figure 9 is a flowchart illustrating a data backup method in some embodiments; the method includes the following steps:
[0257] 401. Process the data to be backed up in the block storage system according to the preset data organization method to obtain standard format objects.
[0258] The standard format objects include at least one of the following: data objects, index objects, and fence objects.
[0259] In this embodiment, to back up data to be backed up from a block storage system to a cloud object storage system, the data needs to be organized according to a preset data organization method: each logical volume must be mapped to an independent bucket in the cloud object storage system, and the data organization method can correspond to the logical partitions and objects within the bucket. Within the bucket, logical partitions can be divided, including at least one of data partitions, index partitions, and fence partitions. These partitions are used to identify data objects, index objects, and fence objects, respectively. In other words, a logical partition can include one of the following: data partitions, index partitions, and fence partitions; or, a logical partition can include two or more of these partitions.
[0260] In some embodiments, the block storage system includes at least one logical volume, the logical volume includes at least one index address segment, the index address segment includes at least one data address segment, the data address segment includes at least one block data address segment, and the block data address segment is used to store block data.
[0261] In some embodiments, the cloud object storage system includes at least one bucket, each bucket corresponding to a different logical volume, and the logical partitions in the bucket include at least one of data partitions, index partitions, and fence partitions.
[0262] In some embodiments, data objects within the same bucket are identified by the data partition name of the data partition within that bucket.
[0263] In some embodiments, index objects in the same bucket are identified by the index partition name of the index partition therein.
[0264] In some embodiments, fence objects in the same bucket are identified by the fence partition name of the fence partition therein.
[0265] In some embodiments, different logical volumes may contain the same number of index address segments, or different logical volumes may contain different numbers of index address segments.
[0266] In some embodiments, different index address segments may include the same number of data address segments, or different index address segments may include different numbers of data address segments.
[0267] In some embodiments, different data address segments may contain the same number of block data address segments, or different data address segments may contain different numbers of block data address segments.
[0268] In some embodiments, different block data address segments may have the same length, or they may have different lengths.
[0269] In some embodiments, the naming format of the data partition includes: a prefix, the volume identifier corresponding to the storage bucket, and the data partition identifier.
[0270] For example, the naming format for data partitions is shown in Table 1 below:
[0271] Table 1
[0272] In some embodiments, the naming format of the index partition includes: a prefix, the volume identifier corresponding to the bucket, and the index partition identifier.
[0273] For example, the naming format for index partitions is shown in Table 2 below:
[0274] Table 2
[0275] In some embodiments, the naming format of the fence partition includes: a prefix, the volume identifier corresponding to the bucket, and the fence partition identifier.
[0276] For example, the naming format for fence partitions is shown in Table 3 below:
[0277] Table 3
[0278] In some embodiments, the data object is obtained by encapsulating data in a data address segment within a logical volume in address order, and the content format of the data object includes at least one of a complete data object, an incremental data object, and a differential data object.
[0279] For complete data objects: In some embodiments, during a full backup, all data written to a data address range within the logical volume before the backup time is encapsulated into a single object. In some alternative embodiments, if no new data has been written before the backup time, no object is generated.
[0280] For incremental data objects: In some embodiments, during incremental backup, data newly written to a data address range within the logical volume before the backup time and since the last backup is encapsulated into an independent object. In some optional embodiments, if no new data has been written before the backup time, no object is generated.
[0281] For differential data objects: In some embodiments, during differential backup, data newly written to a data address range within the logical volume before the backup time and after the most recent full backup is encapsulated into a separate object. In some optional embodiments, if no new data has been written before the backup time, no object is generated.
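The three content formats above differ only in the time window from which writes are taken. The following sketch selects the writes of one data address segment for a given backup type. The `Write` record, the logical timestamps, and the boundary handling (writes strictly before the backup time, at or after the lower bound) are assumptions of this example; overlapping writes to the same address are not merged here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    address: int      # start address inside the data address segment
    data: bytes
    written_at: int   # logical timestamp of the write (hypothetical)


def select_for_backup(writes, backup_type, backup_time,
                      last_backup=None, last_full=None):
    """Return the writes to encapsulate, in address order, or None if none qualify."""
    if backup_type == "full":
        lower = None               # everything written before the backup time
    elif backup_type == "incremental":
        lower = last_backup        # only writes since the last backup
    elif backup_type == "differential":
        lower = last_full          # only writes since the most recent full backup
    else:
        raise ValueError(f"unknown backup type: {backup_type}")
    chosen = [
        w for w in writes
        if w.written_at < backup_time and (lower is None or w.written_at >= lower)
    ]
    if not chosen:
        return None                # optional embodiments: no object is generated
    return sorted(chosen, key=lambda w: w.address)
```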
[0282] For example, Figure 10 is a schematic diagram of data object encapsulation in some embodiments. As shown in Figure 10, the data object is formed by encapsulating, in address order, the data in a data address segment that meets the backup requirements. Blocks a, b, c, i, and j in Figure 10 each represent block data in the data address segment.
[0283] In some embodiments, the naming format of a data object includes: the data partition name, the starting address of the data address segment corresponding to the data object in the logical volume, and the version number of the data object.
[0284] For example, the naming rules for data objects include:
[0285] a) Data object names are unique within the volume.
[0286] b) The data object name should include the data partition name, the volume address information corresponding to the data object, and the version number.
[0287] For example, the naming format for data objects is shown in Table 4 below:
[0288] Table 4
[0289] In some embodiments, each index object stores an index of backup data within an index address range within a logical volume, with a one-to-one mapping between index objects and index address ranges. The relationship between index objects and data objects is one-to-many, meaning that one index object stores indexes of data from multiple data objects.
[0290] In some embodiments, the naming format of an index object includes: the index partition name, the starting address of the index address segment corresponding to the index object in the logical volume, and the version number of the index object.
[0291] For example, the naming rules for index objects include:
[0292] a) The index object name is unique within the volume.
[0293] b) The index object name should include the index partition name, the volume address information corresponding to the index object, and the version number.
[0294] For example, the naming format of index objects is shown in Table 5 below:
[0295] Table 5
[0296] In some embodiments, the content format of an index object includes a header metadata area and an index area. For example, the content format of an index object is shown in Table 6 below:
[0297] Table 6
[0298] The header metadata area includes the metadata of the index object, and the index area includes the position index, within each data object, of the data in the index address segment. The position index for each data object is stored in the index area in address order.
[0299] In some embodiments, the header metadata area includes at least one of the following fields:
[0300] The file information field is used to identify at least one of the purpose and owner of the index object;
[0301] The software version field is used to identify the backup software version;
[0302] The start address field is used to identify the starting address of the index address segment corresponding to the index object in the logical volume;
[0303] The address length field is used to identify the address length of the index address segment corresponding to the index object in the logical volume;
[0304] The data volume field is used to identify the amount of valid data written into the target index address segment;
[0305] The index count field is used to identify the number of data segments indexed in the index object;
[0306] The data object name length field is used to identify the length of the data object name in the index area;
[0307] First validation field;
[0308] Reserved fields.
[0309] For example, the header metadata area occupies the first 4 kibibytes (KiB) of the index object. The format of the header metadata area can be as shown in Table 7 below:
[0310] Table 7
[0311] In some embodiments, the location index of data in the index address segment within the above-mentioned index area in each data object includes at least one of the following fields:
[0312] Data object name field;
[0313] The starting address field is used to identify the starting address of the data within the data object.
[0314] Data length field;
[0315] Fence partition identifier field;
[0316] The second verification field.
[0317] For example, the location information, within each data object, of the data written within the index address segment before the backup time is used as an index and stored in the index area of the index object in address order. Figure 11 is a schematic diagram of an index area; CRC in Figure 11 denotes the cyclic redundancy check (CRC) field.
[0318] For example, the specific form of the location index of data in each data object within the index address segment of the above index area is shown in Table 8 below:
[0319] Table 8
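To make the two-part content format concrete, the sketch below serializes an index object as a 4 KiB header metadata area followed by fixed-width index entries, each protected by a CRC. The field widths, the byte order, the use of CRC-32 for both verification fields, and the integer fence partition identifier are assumptions made for this example; the actual layouts are those given in Tables 7 and 8.

```python
import struct
import zlib

HEADER_SIZE = 4096               # header metadata area: first 4 KiB of the object
HEADER_FMT = "<32s16sQQQIH"      # assumed widths: file info, software version,
                                 # start address, address length, data amount,
                                 # index count, data object name length


def pack_header(file_info, sw_version, start, length, data_amount, count, name_len):
    body = struct.pack(HEADER_FMT, file_info, sw_version, start, length,
                       data_amount, count, name_len)
    body += struct.pack("<I", zlib.crc32(body))        # first verification field
    return body.ljust(HEADER_SIZE, b"\x00")            # reserved bytes


def pack_entry(name, start, length, fence_id, name_len):
    body = struct.pack(f"<{name_len}sQQQ", name.encode(), start, length, fence_id)
    return body + struct.pack("<I", zlib.crc32(body))  # second verification field


def build_index_object(file_info, sw_version, start, length, entries):
    """entries: (data object name, start in object, data length, fence id),
    already sorted in address order by the caller."""
    name_len = max((len(e[0].encode()) for e in entries), default=0)
    data_amount = sum(e[2] for e in entries)
    header = pack_header(file_info, sw_version, start, length,
                         data_amount, len(entries), name_len)
    return header + b"".join(pack_entry(*e, name_len) for e in entries)


def parse_index_object(blob):
    """Verify both check fields and return (header fields, entries)."""
    fixed = struct.calcsize(HEADER_FMT)
    fields = struct.unpack_from(HEADER_FMT, blob, 0)
    (crc,) = struct.unpack_from("<I", blob, fixed)
    if crc != zlib.crc32(blob[:fixed]):
        raise ValueError("header check failed")
    count, name_len = fields[5], fields[6]
    entry_fmt = f"<{name_len}sQQQ"
    body_size = struct.calcsize(entry_fmt)
    entries = []
    for i in range(count):
        off = HEADER_SIZE + i * (body_size + 4)
        name, start, length, fence_id = struct.unpack_from(entry_fmt, blob, off)
        (crc,) = struct.unpack_from("<I", blob, off + body_size)
        if crc != zlib.crc32(blob[off:off + body_size]):
            raise ValueError(f"index entry {i} check failed")
        entries.append((name.rstrip(b"\x00").decode(), start, length, fence_id))
    return fields, entries
```

Storing the data object name length in the header lets every entry have the same width, so an entry can be located by its position without scanning the preceding ones.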
[0320] In some embodiments, a fence object indicates that a backup of the volume has been completed in full and in a consistent state.
[0321] The content format of the fence object is empty;
[0322] The naming format for fence objects includes: fence partition name and version number of this backup.
[0323] For example, the naming format for fence objects is shown in Table 9 below:
[0324] Table 9
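The naming formats for partitions, data objects, index objects, and fence objects described above can be sketched as plain string builders. The separators, the example prefix, and the zero-padded hexadecimal encoding are assumptions of this sketch and do not reproduce Tables 1 to 5 or Table 9; the 16 hexadecimal digits for the version simply mirror the default 8-byte version number.

```python
def partition_name(prefix, volume_id, partition_id):
    """Partition name: prefix, volume identifier, partition identifier."""
    return f"{prefix}/{volume_id}/{partition_id}"


def data_object_name(data_partition, start_address, version):
    """Data partition name, start address of the data address segment, version."""
    return f"{data_partition}/{start_address:016x}.{version:016x}"


def index_object_name(index_partition, start_address, version):
    """Index partition name, start address of the index address segment, version."""
    return f"{index_partition}/{start_address:016x}.{version:016x}"


def fence_object_name(fence_partition, version):
    """Fence partition name and the version number of this backup."""
    return f"{fence_partition}/{version:016x}"


def parse_version(object_name):
    """Recover the version number from any of the object names above."""
    return int(object_name[-16:], 16)
```

With fixed-width hexadecimal fields, lexicographic order of names within a partition matches address order and then version order, which is convenient when a bucket listing is the only discovery mechanism.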
[0325] Case 1: Standard format objects include: data objects.
[0326] In some embodiments, the process of processing the data to be backed up in the block storage system according to a preset data organization method to obtain a standard format object may include, but is not limited to, the following steps:
[0327] 401a. Determine the data to be backed up in the target index address range of the target logical volume in the block storage system.
[0328] The target logical volume can be any logical volume in the block storage system, and the target index address range can be any index address range in the target logical volume.
[0329] For example, during the data preparation phase, the logical volumes that need to be backed up can be determined first. The data in the logical volume can be grouped according to the various data address segments divided in the logical volume. Subsequently, a data object can be generated for each data address segment, and a volume-level incremental version number can be generated for each data object. The data object name can be generated according to the naming format of the data object.
[0330] 401b. Generate data objects for each data address range in the data to be backed up, according to the content format of the data objects.
[0331] After generating the data objects, a volume-level incremental version number can be generated for each data object.
[0332] 401c. Generate a data object name for each data object according to the naming format of the data object.
[0333] In some embodiments, after generating the data object, a compression algorithm and an encryption algorithm can be selected to first compress the data object, and then encrypt the compressed data object. The compressed and encrypted data object is then used as the data object to be uploaded.
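The order of operations above (compress first, then encrypt) and its inverse on download can be sketched as follows. zlib is used as an example compression algorithm; the XOR routine is only a placeholder standing in for a real cipher so that the sketch needs no third-party library, and it provides no security.

```python
import zlib


def xor_placeholder(data, key):
    """Stand-in for a real encryption algorithm. NOT secure; illustration only."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def prepare_for_upload(raw, key):
    """Compress the object, then encrypt the compressed bytes."""
    compressed = zlib.compress(raw)
    return xor_placeholder(compressed, key)


def restore_after_download(blob, key):
    """Reverse the steps in the opposite order: decrypt, then decompress."""
    return zlib.decompress(xor_placeholder(blob, key))
```

Compressing before encrypting matters because well-encrypted data looks random and no longer compresses.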
[0334] Case 2: Standard format objects also include index objects, that is, they include both data objects and index objects.
[0335] The process of processing the target data to be backed up in the block storage system according to the preset data organization method to obtain a standard format object may also include the following steps for index objects:
[0336] 401d. If all data address segments within the target index address segment have generated data objects and have been backed up to the cloud object storage system, generate an index object for the target index address segment based on the content format of the index object.
[0337] The prerequisite for generating an index object is that all data address segments within an index address segment have generated data objects and these have been backed up to object storage; only then can the index object corresponding to that index address segment be backed up.
[0338] Specifically, based on the content format requirements of the index object mentioned above, an index object can be generated for the index address range that meets the above prerequisites, and a volume-level incrementing version number can be generated for the index object.
[0339] 401e. Generate the index object name according to the index object naming format.
[0340] After generating the index object, a compression algorithm and an encryption algorithm can be selected to first compress the index object and then encrypt the compressed index object. The compressed and encrypted index object is then used as the index object to be uploaded.
[0341] Case 3: Standard format objects also include fence objects, that is, they include data objects, index objects, and fence objects.
[0342] In some embodiments, the process of processing the data to be backed up in the block storage system according to a preset data organization method to obtain a standard format object may include, but is not limited to, the following steps:
[0343] 401f. If index objects have been generated for all index address segments in the target logical volume and backed up to the cloud object storage system, generate a fence object according to the content format requirements of the fence object.
[0344] The fence object is used to indicate that a backup of the target logical volume has been completed.
[0345] In some embodiments, when all index address ranges in a logical volume have generated index objects and have been backed up to a cloud object storage system, a fence object representing the backup status of the logical volume can be backed up.
[0346] 401g. Generate fence object names according to the naming format of fence objects.
[0347] First, a volume-level incrementing version number can be generated as a fence. This version number should be greater than the version numbers corresponding to all index objects in the volume. Second, a fence object can be generated based on the content format of the fence object.
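The ordering constraints across Cases 1 to 3 (data objects first, an index object only when its data address segments are done, a fence only when every index object is done, and a fence version greater than all index versions) can be sketched with a single volume-level counter. The class and method names are hypothetical, and a data address segment counts as "done" once it has been processed, whether or not an object was generated for it.

```python
import itertools


class VolumeBackup:
    """Tracks one backup of one logical volume (illustrative sketch only)."""

    def __init__(self, index_layout, start_version=1):
        # index_layout: {index_segment_start: [data_segment_start, ...]}
        self._versions = itertools.count(start_version)
        self.layout = index_layout
        self.data_done = set()
        self.index_versions = {}
        self.fence_version = None

    def next_version(self):
        """Volume-level, strictly increasing version number."""
        return next(self._versions)

    def mark_data_done(self, data_start):
        """Record that a data address segment has been processed and uploaded."""
        self.data_done.add(data_start)

    def try_index(self, index_start):
        """Return a version for the index object, or None if prerequisites fail."""
        if not set(self.layout[index_start]) <= self.data_done:
            return None
        version = self.next_version()
        self.index_versions[index_start] = version
        return version

    def try_fence(self):
        """Return the fence version, or None until every index object exists."""
        if set(self.index_versions) != set(self.layout):
            return None
        self.fence_version = self.next_version()
        return self.fence_version
```

Because the fence version is drawn from the same counter after all index versions, it is necessarily greater than each of them, which is the property paragraph [0347] requires.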
[0348] 402. Back up standard format objects to a cloud object storage system.
[0349] Backing up standard format objects to a cloud object storage system may include, but is not limited to, transferring standard format objects to a cloud object storage system based on a preset backup protocol.
[0350] Regarding Case 1 above:
[0351] The aforementioned process of transferring standard format objects to a cloud object storage system based on a preset backup protocol may include:
[0352] a) Construct a PUT request. The PUT request header must include the Bucket corresponding to the logical volume, the name of the data object, and, via custom metadata (meta), the compression and encryption algorithms used. PUT is a method in the Hypertext Transfer Protocol used to write or update data at a specified resource location. The sender uses this method to upload the complete data content to the target address. If the resource already exists at that address, the existing data is overwritten; otherwise, a new resource is created. It is commonly used for complete data updates or uploads.
[0353] b) Upload the data object to be uploaded to the Bucket corresponding to the logical volume in the cloud object storage system through the PUT interface of the cloud object storage system.
[0354] In some embodiments, the above-mentioned transmission of standard format objects to the cloud object storage system based on a preset backup protocol may include, but is not limited to: transmitting a first PUT request through the PUT interface of the cloud object storage system to upload the generated data object to the target storage bucket corresponding to the target logical volume.
[0355] The request body of the first PUT request includes a data object, and the request header of the first PUT request includes first indication information, which indicates at least one of the following: target storage bucket, data object name, compression algorithm, and encryption algorithm.
[0356] Regarding Case 2 above:
[0357] The aforementioned method of transferring standard format objects to a cloud object storage system based on a preset backup protocol may also include:
[0358] a) Construct a PUT request. The request header must include the Bucket corresponding to the logical volume, the index object name, and, via custom metadata (meta), the compression and encryption algorithms used.
[0359] b) Upload the index object to be uploaded to the Bucket corresponding to the logical volume in the cloud object storage system through the PUT interface of the cloud object storage system.
[0360] In some embodiments, the above-mentioned transmission of standard format objects to the cloud object storage system based on a preset backup protocol may include, but is not limited to: transmitting a second PUT request through the PUT interface of the cloud object storage system to upload at least one generated index object to the target storage bucket corresponding to the target logical volume.
[0361] The request body of the second PUT request includes the generated index object, and the request header of the second PUT request includes second indication information, which is used to indicate at least one of the following: target storage bucket, index object name, compression algorithm, and encryption algorithm.
[0362] Regarding Case 3 above:
[0363] The aforementioned method of transferring standard format objects to a cloud object storage system based on a preset backup protocol may also include:
[0364] a) Construct a PUT request. The PUT request header must include the Bucket corresponding to the logical volume and the fence object name.
[0365] b) Upload the fence object to the Bucket corresponding to the logical volume in the cloud object storage system via the PUT interface of the cloud object storage system.
[0366] In some embodiments, the above-mentioned transmission of standard format objects to the cloud object storage system based on a preset backup protocol may include, but is not limited to: transmitting a third PUT request through the PUT interface of the cloud object storage system to upload the generated fence object to the target storage bucket corresponding to the target logical volume.
[0367] The request header of the third PUT request includes third indication information, which indicates at least one of the following: target bucket, fence object name.
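The first, second, and third PUT requests share one shape, which the sketch below builds with Python's standard `urllib`. The endpoint, the path-style URL layout (bucket and object name in the path), and the `x-meta-` header prefix are assumptions of this example; a given cloud object storage system defines its own addressing and custom-metadata header names. The request is only constructed here, not sent.

```python
from urllib.parse import quote
from urllib.request import Request

META_PREFIX = "x-meta-"   # hypothetical custom-metadata header prefix


def build_put(endpoint, bucket, object_name, body=b"",
              compression=None, encryption=None):
    """Build (but do not send) a PUT request for a data, index, or fence object.

    Data and index objects carry the compression and encryption algorithms as
    custom metadata; a fence object has an empty body and no such metadata.
    """
    url = f"{endpoint}/{quote(bucket)}/{quote(object_name)}"
    headers = {}
    if compression is not None:
        headers[META_PREFIX + "compression"] = compression
    if encryption is not None:
        headers[META_PREFIX + "encryption"] = encryption
    return Request(url, data=body, headers=headers, method="PUT")
```

For example, `build_put(endpoint, bucket, fence_name)` yields the third PUT request, whose information is carried entirely by the bucket and the fence object name.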
[0368] In some embodiments, after a backup of the target logical volume has been completed, the method further includes: deleting objects to be cleaned up through the DELETE interface of the cloud object storage system, the objects to be cleaned up including at least one of data objects, index objects, and fence objects.
[0369] DELETE is a request method in the Hypertext Transfer Protocol used to delete a specified resource. After sending a deletion request to the target address using this method, the server will remove the resource with the corresponding identifier. It is commonly used for cleanup operations of resources such as data, files, or service nodes.
[0370] In some embodiments, data objects, index objects, and fence objects can be cleaned up in batches using the DELETE interface described above.
[0371] During a full backup or differential backup, eligible objects in the cloud object storage can be cleaned up.
[0372] The methods for determining the objects to be cleaned include at least one of the following:
[0373] 1) For each data address range, if the full backup completed this time newly generates and backs up data objects, then the data objects previously backed up to the cloud object storage system are data objects to be cleaned up.
[0374] 2) For each data address range, if the differential backup completed this time generates and backs up new data objects, then the data objects backed up to the cloud object storage system after the last full backup and before this differential backup are the data objects to be cleaned up.
[0375] 3) For each index address segment, if the full backup completed this time newly generates and backs up an index object, then the index object previously backed up to the cloud object storage system is the index object to be cleaned up.
[0376] 4) For each index address segment, if the differential backup completed this time newly generates and backs up an index object, then the index object previously backed up to the cloud object storage system is the index object to be cleaned up.
[0377] 5) For each logical volume, the previously backed-up fence objects are the fence objects to be cleaned up.
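Rules 1) to 5) above can be expressed as three small selection functions. The `Backed` record and the use of the volume-level version number to decide whether an object was backed up after the last full backup are assumptions of this sketch; `existing` holds objects already in the bucket before the backup just completed, and `new_objects` holds the objects that backup produced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backed:
    name: str
    segment: int   # start address of the data or index address segment
    version: int   # volume-level version number


def data_objects_to_clean(existing, new_objects, backup_type, last_full_version=None):
    """Rules 1) and 2): stale data objects in segments refreshed by this backup."""
    if backup_type == "differential" and last_full_version is None:
        raise ValueError("differential cleanup needs the last full backup version")
    refreshed = {o.segment for o in new_objects}
    stale = []
    for o in existing:
        if o.segment not in refreshed:
            continue
        if backup_type == "full":
            stale.append(o.name)          # everything older in this segment
        elif backup_type == "differential" and o.version > last_full_version:
            stale.append(o.name)          # only objects after the last full backup
    return stale


def index_objects_to_clean(existing, new_objects, backup_type):
    """Rules 3) and 4): earlier index objects in segments refreshed by this backup."""
    if backup_type not in ("full", "differential"):
        return []
    refreshed = {o.segment for o in new_objects}
    return [o.name for o in existing if o.segment in refreshed]


def fence_objects_to_clean(existing_fences, new_fence_name):
    """Rule 5): every fence object other than the one just uploaded."""
    return [n for n in existing_fences if n != new_fence_name]
```

The returned names could then be passed to the DELETE interface, individually or in batches.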
[0378] It should be noted that the various processes in the backup process can be implemented based on a preset backup protocol in this embodiment. For example, the generation, compression, encryption, and uploading processes of data objects, index objects, and fence objects mentioned above can be implemented based on a preset backup protocol.
[0379] The data backup method described above implements the data backup process based on a preset communication protocol and a preset data organization method, which unifies the communication protocol and data organization method during the data backup process. This improves the universality and portability of the data backup method, thereby facilitating the flexible selection of backup product vendors and cloud service providers.
[0380] The data recovery method provided in this application can be implemented by a data recovery device or a computer device. The data recovery device can be a functional module or entity within the computer device used to implement the data recovery method. In some embodiments, the computer device can be a local device or a cloud device. For example, the computer device can be a local computer, tablet computer, etc., or a server on a cloud device.
[0381] It should be noted that the concepts involved in the above data recovery method that are the same as those in the above data backup process (such as data objects, index objects, and fence objects) can be understood in a similar way, and will not be elaborated here.
[0382] In this embodiment of the application, the recovery process can be implemented through a preset recovery protocol. The recovery process will be described in detail in the following embodiments under different circumstances.
[0383] In one exemplary embodiment, Figure 12 is a flowchart illustrating a data recovery method in some embodiments; the method includes the following steps:
[0384] 701. Download the standard format object that has been backed up to the cloud object storage system.
[0385] The standard format objects include at least one of the following: data objects, index objects, and fence objects.
[0386] In some embodiments, standard format objects that have been backed up to a cloud object storage system can be downloaded based on a preset recovery protocol.
[0387] 702. Process the standard format object according to the preset data parsing method to obtain the recovered data.
[0388] The preset recovery protocol can support the recovery process for some or all of the data in a logical volume.
[0389] In this embodiment of the application, during the process of restoring some data of a logical volume that has been backed up to a cloud object storage system to a block storage system, the block storage system can still provide normal services.
[0390] In this embodiment, full recovery at the logical volume level (i.e., recovery of all data in the logical volume) is supported. Specifically, it supports the complete restoration of the latest version of backup data of the logical volume from a cloud object storage system to the block storage system at a certain point in time. The block storage system can be the original block storage system used for backup or another third-party local block storage system. After recovery, the block storage system can continue to provide read / write and data backup for the logical volume.
[0391] In this embodiment of the application, a cloud direct recovery mode can be supported during recovery, which allows data to be read directly from the cloud object storage system on demand without downloading the complete volume data to the local machine, so as to support rapid business startup and data access.
[0392] In some embodiments, the data to be recovered includes either a portion of the data in the target logical volume or all the data in the target logical volume.
[0393] Case a1: The data to be recovered includes a portion of the data in the target logical volume, and the standard format objects include index objects.
[0394] In some embodiments, the data index of the data to be recovered can be obtained. If the data index of the data to be recovered is not available in the block storage system, it can be obtained from the cloud object storage system.
[0395] In some embodiments, the process of downloading standard format objects that have been backed up to a cloud object storage system based on a preset recovery protocol may include, but is not limited to, the following steps:
[0396] 701a. Determine the target index address range where the data to be recovered is located based on the starting address and address length of the data to be recovered in the target logical volume.
[0397] Specifically, the index address segment where the data to be recovered is located can be calculated based on the volume start address (lv_offset) and length (length) of the data to be recovered.
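As a minimal sketch of step 701a, assuming that index address segments have a fixed size and are aligned to multiples of that size, the index address segments covering the data to be recovered could be computed as follows. The constant `INDEX_SEGMENT_SIZE`, its value, and the function name are hypothetical and are not specified in this application.

```python
# Hypothetical fixed size of one index address segment (assumption: 1 GiB).
INDEX_SEGMENT_SIZE = 1 << 30


def index_segments_for_range(lv_offset: int, length: int,
                             segment_size: int = INDEX_SEGMENT_SIZE) -> list[int]:
    """Return the start addresses of all index address segments that
    overlap the byte range [lv_offset, lv_offset + length)."""
    if lv_offset < 0 or length <= 0:
        raise ValueError("lv_offset must be >= 0 and length must be > 0")
    # Round the first and last byte address down to a segment boundary.
    first = (lv_offset // segment_size) * segment_size
    last = ((lv_offset + length - 1) // segment_size) * segment_size
    return list(range(first, last + segment_size, segment_size))
```

A range that crosses a segment boundary yields more than one start address, in which case the subsequent steps would be repeated for each target index address segment.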
[0398] 701b. Download the list of index object names corresponding to the target index address range.
[0399] In some embodiments, the process of downloading the list of index object names corresponding to the target index address range may include, but is not limited to: transmitting a first LIST request through the list (LIST) interface of the cloud object storage system to download the list of index object names corresponding to the target index address range.
[0400] LIST here refers to the object-listing operation exposed by the cloud object storage system, typically carried over the Hypertext Transfer Protocol, which queries the resources under a specified resource address. It returns a list of resource names, identifiers, etc., corresponding to the target address, and is commonly used for resource enumeration in scenarios such as cloud storage and file services.
[0401] The request header of the first LIST request includes first indication information, which indicates at least one of the following: the target bucket corresponding to the target logical volume, the index partition name as the prefix of the index object, the starting position of the LIST, and the maximum number of objects that can be returned in one LIST request.
[0402] For example, a LIST request can be constructed whose request header specifies the Bucket corresponding to the volume, the index partition name as the object prefix, the starting position marker of the LIST, and the maximum number of objects to be returned in one LIST (max-keys); the LIST interface of the object storage can then be used to download the list of all index object names corresponding to the index address range from the Bucket.
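The paged use of the prefix, starting position marker, and max-keys parameters can be sketched as follows. This is an illustration only: `fake_list` is a hypothetical in-memory stand-in for the LIST interface of a cloud object storage system, and the Bucket is assumed to be bound into the `list_page` callable.

```python
from typing import Callable, List


def fake_list(store: List[str], prefix: str, marker: str, max_keys: int) -> List[str]:
    """In-memory stand-in for a LIST call: returns up to max_keys names that
    start with prefix and sort strictly after marker."""
    names = sorted(n for n in store if n.startswith(prefix) and n > marker)
    return names[:max_keys]


def list_all(list_page: Callable[[str, str, int], List[str]],
             prefix: str, max_keys: int = 1000) -> List[str]:
    """Collect every object name under prefix by issuing LIST page by page."""
    collected: List[str] = []
    marker = ""  # empty marker: start from the beginning
    while True:
        page = list_page(prefix, marker, max_keys)
        collected.extend(page)
        if len(page) < max_keys:  # a short page means nothing is left
            return collected
        marker = page[-1]  # resume after the last name already returned
```

Because each request resumes from the last name already returned, the complete name list is obtained even when it exceeds the maximum number of objects returned by one LIST request.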
[0403] 701c. Parse the index object names in the index object name list according to the index object naming format to determine the version number in the index object name.
[0404] In this process, the version number in the index object name can be parsed out one by one according to the naming rules of the index objects mentioned above.
[0405] 701d. Download the corresponding index object based on the version number in the index object name.
[0406] The process of downloading the corresponding index object based on the version number in the index object name may include, but is not limited to: downloading the index object corresponding to the index object name from the target storage bucket through the GET interface of the cloud object storage system.
[0407] The above-mentioned processing of standard format objects according to the preset data parsing method to obtain recovered data may include, but is not limited to: parsing the downloaded index object according to the content format of the index object to obtain the data index in the index object, wherein the data index includes: a list of data object names.
[0408] For example, the index object name corresponding to the largest version number is the latest index object name. The latest index object name is specified and the corresponding index object is downloaded via the object storage's GET interface.
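Selecting the latest index object name can be sketched as follows. The name layout `<index_partition_name>/<start_address>_<version>` is an assumption made for illustration; this application only states which elements the naming format includes, not how they are delimited.

```python
def parse_index_object_name(name: str) -> tuple[str, int, int]:
    """Split a name of the assumed form '<partition>/<start>_<version>'."""
    partition, _, tail = name.rpartition("/")
    start, _, version = tail.rpartition("_")
    return partition, int(start), int(version)


def latest_index_object_name(names: list[str]) -> str:
    """Return the name carrying the largest version number."""
    return max(names, key=lambda n: parse_index_object_name(n)[2])
```

Version numbers are compared as integers rather than as strings, so that, for example, version 10 is treated as newer than version 9.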
[0409] In some embodiments, after downloading the corresponding index object via the GET interface, the corresponding GET response header can be parsed, and the encryption and compression algorithms can be obtained through custom metadata. The downloaded object is first decrypted, and then the decrypted index object is decompressed to obtain the original index object.
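The decrypt-then-decompress order can be sketched as follows. The metadata keys `x-meta-compression` and `x-meta-encryption` are hypothetical, and `xor_cipher` is a toy placeholder that stands in for a real encryption algorithm only so that the example is self-contained.

```python
import zlib

# Hypothetical metadata keys; the actual header names depend on the
# cloud object storage product and are not specified in this application.
META_COMPRESSION = "x-meta-compression"
META_ENCRYPTION = "x-meta-encryption"


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher used only as a placeholder for real encryption."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


DECRYPTORS = {"none": lambda d, k: d, "xor-demo": xor_cipher}
DECOMPRESSORS = {"none": lambda d: d, "zlib": zlib.decompress}


def restore_object(body: bytes, headers: dict[str, str], key: bytes) -> bytes:
    """Decrypt first, then decompress, as selected by the custom metadata."""
    decrypt = DECRYPTORS[headers.get(META_ENCRYPTION, "none")]
    decompress = DECOMPRESSORS[headers.get(META_COMPRESSION, "none")]
    return decompress(decrypt(body, key))
```

The order mirrors a backup side that compresses before encrypting; reversing it on recovery would attempt to decompress ciphertext.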
[0410] In some embodiments, processing a standard format object according to a preset data parsing method to obtain recovered data may include, but is not limited to: parsing the downloaded index object according to the content format of the index object to obtain the data index in the index object, wherein the data index includes: a list of data object names.
[0411] Specifically, the index object can be parsed according to its content format, and the corresponding data index can be obtained based on the starting address and length of the logical volume of the data to be recovered. For example, the data index can take the following form: <data_object_name, object_offset, lv_offset, length>.
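A lookup over data index entries of the form <data_object_name, object_offset, lv_offset, length> can be sketched as follows. The type and function names are hypothetical, and the sketch assumes that the entries have already been parsed out of the index object.

```python
from typing import NamedTuple


class DataIndexEntry(NamedTuple):
    data_object_name: str
    object_offset: int  # where the data starts inside the data object
    lv_offset: int      # where the data starts inside the logical volume
    length: int


def entries_for_range(entries: list[DataIndexEntry],
                      lv_offset: int, length: int) -> list[DataIndexEntry]:
    """Return the parts of each entry that overlap [lv_offset, lv_offset + length)."""
    end = lv_offset + length
    hits = []
    for e in entries:
        lo = max(e.lv_offset, lv_offset)
        hi = min(e.lv_offset + e.length, end)
        if lo < hi:  # non-empty overlap
            hits.append(DataIndexEntry(e.data_object_name,
                                       e.object_offset + (lo - e.lv_offset),
                                       lo, hi - lo))
    return hits
```

Each returned entry is trimmed to the requested range, so its `object_offset` points directly at the first byte to read from the corresponding data object.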
[0412] Case a2: The data to be recovered includes a portion of the data in the target logical volume; the standard format objects also include data objects, that is, they include index objects and data objects.
[0413] In some embodiments, the process of downloading a standard format object that has been backed up to a cloud object storage system based on a preset recovery protocol may include, but is not limited to: downloading the data object corresponding to the data object name from the target bucket through the GET interface of the cloud object storage system according to the data object name in the data index.
[0414] For example, the corresponding data object can be downloaded from the Bucket using the GET interface of the cloud object storage system based on the data object name (data_object_name) in the data index.
[0415] In some embodiments, after downloading a data object through the GET interface of a cloud object storage system, the corresponding GET response header can be parsed, the encryption and compression algorithms can be obtained through a custom meta tag, the downloaded data object can be decrypted first, and then the decrypted data object can be decompressed to obtain the original data object.
[0416] In some embodiments, the process of processing a standard format object according to a preset data parsing method to obtain recovered data may include, but is not limited to: parsing the downloaded data object based on the data segment index in the data index and the content format of the data object to obtain a parsing result. The parsing result may be the data written within the data address segment corresponding to the data object.
[0417] For example, the data object can be parsed using the content format described above, and the data to be recovered can be read from the data object according to the data index of the data to be recovered. The data index of the data to be recovered can be as follows: <lv_offset, length>.
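Reading the data to be recovered out of downloaded data objects can be sketched as follows. The sketch assumes that each data object has already been decrypted and decompressed, that the pieces have been trimmed to the requested range, and that addresses never written are returned zero-filled; the function name and tuple layout are hypothetical.

```python
def read_recovered_data(lv_offset: int, length: int,
                        pieces: list[tuple[bytes, int, int, int]]) -> bytes:
    """Copy each piece into a buffer covering [lv_offset, lv_offset + length).

    Each piece is (data_object_bytes, object_offset, piece_lv_offset, piece_length).
    Addresses never written stay zero-filled (an assumption of this sketch).
    """
    buf = bytearray(length)
    for obj, object_offset, piece_lv_offset, piece_length in pieces:
        start = piece_lv_offset - lv_offset
        if start < 0 or start + piece_length > length:
            raise ValueError("piece lies outside the requested range")
        chunk = obj[object_offset:object_offset + piece_length]
        if len(chunk) != piece_length:
            raise ValueError("data object is shorter than its index entry claims")
        buf[start:start + piece_length] = chunk
    return bytes(buf)
```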
[0418] Case b1: The data to be recovered includes all data in the target logical volume; standard format objects include fence objects.
[0419] In some embodiments, the process of downloading standard format objects that have been backed up to the cloud object storage system based on a preset recovery protocol may include, but is not limited to: transmitting a second LIST request through the LIST interface of the cloud object storage system to download a list of fence object names that have been backed up to the cloud object storage system.
[0420] The request header of the second LIST request includes second indication information, which indicates at least one of the following: the target bucket corresponding to the target logical volume, the fence partition name as a prefix for the fence object, the starting position of the LIST, and the maximum number of objects that can be returned in one LIST request.
[0421] For example, a LIST request can be constructed first. The request header of the LIST request can include specifying the Bucket corresponding to the volume, specifying the fence partition name as the object prefix, specifying the starting position marker of the LIST, and specifying the maximum number of objects to be returned in one LIST (max-keys). Then, the list of all fence object names can be downloaded from the Bucket through the LIST interface of the cloud object storage system.
[0422] In some embodiments, the process of processing standard format objects according to a preset data parsing method to obtain recovered data may include, but is not limited to: parsing fence object names in the fence object name list according to the naming format of the fence object to obtain the corresponding fence.
[0423] For example, according to the naming rules of fence objects (see section 6.2.3.1), the fences in the fence object names are parsed one by one, and the largest fence (with the largest version number) is the latest fence.
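Determining the latest fence can be sketched as follows, assuming for illustration that each fence object name has the form `<fence_partition_name>/<version>`; the delimiter and the function name are hypothetical.

```python
def latest_fence(fence_object_names: list[str]) -> int:
    """Return the largest version number found in the fence object names,
    assuming each name has the form '<fence_partition_name>/<version>'."""
    if not fence_object_names:
        raise ValueError("no fence object found: no completed backup to restore")
    return max(int(name.rsplit("/", 1)[1]) for name in fence_object_names)
```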
[0424] Case b2: The data to be recovered includes all data in the target logical volume; the standard format objects also include index objects, that is, they include fence objects and index objects.
[0425] In some embodiments, the process of downloading standard format objects that have been backed up to a cloud object storage system based on a preset recovery protocol may also include, but is not limited to, the following steps:
[0426] 711. Download a list of all index object names in the target logical volume.
[0427] First, a LIST request can be constructed. The request header of the LIST request can specify the Bucket corresponding to the volume, the index partition name as the object prefix, the starting position marker of the LIST, and the maximum number of objects to be returned in one LIST (max-keys). Then, the LIST interface of the cloud object storage system is used to download the list of all index object names from the Bucket.
[0428] 712. Parse the index object names in the index object name list according to the naming format of the index object to determine the version number in the index object name.
[0429] First, the list of index object names can be categorized according to index address ranges. Each index address range may correspond to multiple index object names. Then, the version number in each index object name can be parsed out one by one according to the naming format of the index object.
[0430] In some embodiments, after resolving the version number in the index object name, for each index address segment, the largest version number is selected from all version numbers less than or equal to the latest fence, and the index object name corresponding to this version number is the latest index object name for that index address segment.
[0431] In some embodiments, after resolving the version number in the index object name, for each index address range, the index object names with version numbers greater than the latest fence can also be recorded in the cleanup list.
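The per-segment selection against the latest fence can be sketched as follows. As an assumption for illustration, index object names take the form `<partition>/<start_address>_<version>`; the function name is hypothetical.

```python
def select_index_objects(names: list[str], fence: int) -> tuple[dict[int, str], list[str]]:
    """Group names of the assumed form '<partition>/<start>_<version>' by start
    address; keep the largest version <= fence per group and collect names
    with a version > fence into a cleanup list."""
    best: dict[int, tuple[int, str]] = {}
    cleanup: list[str] = []
    for name in names:
        start_s, _, version_s = name.rsplit("/", 1)[1].rpartition("_")
        start, version = int(start_s), int(version_s)
        if version > fence:
            cleanup.append(name)  # newer than the latest fence
        elif start not in best or version > best[start][0]:
            best[start] = (version, name)
    return {start: name for start, (_, name) in best.items()}, cleanup
```

The first return value maps each index address segment to its latest index object name, and the second is the cleanup list of names whose version number exceeds the latest fence.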
[0432] 713. Download the corresponding index object based on the version number in the index object name.
[0433] Specifically, the latest index object can be downloaded from the Bucket via the GET interface of the cloud object storage system.
[0434] The GET method mentioned above is a request method in the Hypertext Transfer Protocol used to retrieve data from a specified resource address. This method can be used to obtain resource content such as files and data lists corresponding to the target address, and it is a common basic method for obtaining resources in network communication.
[0435] In some embodiments, the process of processing standard format objects according to a preset data parsing method to obtain recovered data also includes, but is not limited to: parsing the downloaded index object according to the content format of the index object to obtain the data index in the index object. The data index includes a list of data object names. This list of data object names refers to the list of all data object names recorded in the index object.
[0436] It should be noted that the process of obtaining the recovered data in case b2 is similar to that in case a1 above. Please refer to the description in case a1 above, and it will not be repeated here.
[0437] Case b3: The data to be recovered includes all data in the target logical volume; the standard format objects also include data objects, that is, they include fence objects, index objects, and data objects.
[0438] In case b3, the process of obtaining the data object is similar to that in case a2 above. Please refer to the description in case a2 above, and it will not be repeated here.
[0439] In some embodiments, after downloading the index object through the GET interface of the cloud object storage system, the corresponding GET response header can be parsed, and the encryption and compression algorithms can be obtained through a custom meta tag. The downloaded object is first decrypted, and then the decrypted index object is decompressed to obtain the original index object.
[0440] 703. Store the recovered data in the block storage system.
[0441] This could involve transferring data written within the data address range corresponding to the data object to a block storage system for storage.
[0442] In some embodiments, when the recovery of all data in the logical volume is complete, the index objects to be cleaned up in the cloud object storage can be cleaned up.
[0443] The aforementioned index objects to be cleaned may include the index objects whose names are recorded in the aforementioned cleanup list.
[0444] In some embodiments, index objects to be cleaned can be deleted in batches from the cloud object storage system through the DELETE interface of the cloud object storage system.
[0445] It should be noted that, in the embodiments of this application, each process in the recovery process can be implemented based on a preset recovery protocol. For example, the downloading, parsing, decompression, decryption, transmission, and storage processes of data objects, index objects, and fence objects mentioned above can be implemented based on a preset recovery protocol.
[0446] The data recovery method described above implements the data recovery process based on a preset recovery protocol and a preset data parsing method, which unifies the communication protocol and the data parsing method during the data recovery process. This improves the universality and portability of the data recovery method, thereby facilitating the flexible selection of recovery product vendors and cloud service providers.
[0447] In this embodiment of the application, in order to meet compatibility requirements, the cloud object storage system must meet at least one of the following requirements:
[0448] It has at least one of the following interfaces at the object level: PUT, GET, DELETE, and LIST.
[0449] The PUT interface supports user-defined metadata;
[0450] The response header returned by the GET interface contains user-defined metadata;
[0451] The LIST interface supports specifying a prefix for the LIST and retrieving a list of all object names with the same prefix.
[0452] The LIST interface supports specifying the starting position of the LIST;
[0453] The LIST interface supports specifying the maximum number of objects that can be returned in a single LIST request;
[0454] After a standard format object is uploaded to the cloud object storage system via the PUT interface, it supports performing GET operations and / or LIST operations on the standard format object.
[0455] In some embodiments, to meet the integrity requirements during data backup, after a standard format object is backed up to the cloud object storage system, the entity tag of the standard format object stored in the block storage system is compared with the entity tag returned by the cloud object storage system for that object, thereby verifying integrity.
[0456] In some embodiments, to meet the integrity requirements during data recovery, the entity tag of a standard format object downloaded to the block storage system is compared with the entity tag returned by the cloud object storage system for that object, thereby verifying integrity.
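The entity tag comparison can be sketched as follows. How the entity tag is computed is not specified in this application; the sketch assumes it is the hexadecimal MD5 digest of the object body, which holds for some object storage services only under certain upload modes. The function names are hypothetical.

```python
import hashlib


def local_entity_tag(body: bytes) -> str:
    """Entity tag computed locally; assumed here to be the hex MD5 digest."""
    return hashlib.md5(body).hexdigest()


def verify_entity_tag(body: bytes, returned_etag: str) -> bool:
    """Compare the local tag with the one returned by the object store.
    Surrounding double quotes, which some services add, are stripped."""
    return local_entity_tag(body) == returned_etag.strip('"').lower()
```

The same check applies in both directions: after a PUT during backup, and after a GET during recovery.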
[0457] It should be understood that although the steps in the flowcharts of the embodiments described above are shown sequentially according to the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some steps in the flowcharts of the embodiments described above may include multiple steps or multiple stages. These steps or stages are not necessarily completed at the same time, but can be executed at different times. The execution order of these steps or stages is not necessarily sequential, but can be performed alternately or in turn with other steps or at least some of the steps or stages in other steps. It is understood that the steps in different embodiments can be freely combined as needed, and all non-contradictory solutions formed by such combinations are within the scope of protection of this application.
[0458] Based on the same inventive concept, this application also provides a data backup apparatus for implementing the data backup method described above. The solution provided by this apparatus is similar to the implementation described in the above method; therefore, the specific limitations in one or more data backup apparatus embodiments provided below can be found in the limitations of the data backup method described above, and will not be repeated here.
[0459] In an exemplary embodiment, as shown in Figure 13, a data backup device is provided, comprising:
[0460] The processing module 801 is used to process the data to be backed up in the block storage system according to a preset data organization method to obtain a standard format object;
[0461] Backup module 802 is used to back up the standard format object to a cloud object storage system;
[0462] The standard format object includes at least one of the following: a data object, an index object, and a fence object.
[0463] In some embodiments, the backup module 802 is specifically used to: transmit the standard format object to a cloud object storage system based on a preset backup protocol.
[0464] In some embodiments, the block storage system includes at least one logical volume, the logical volume includes at least one index address segment, the index address segment includes at least one data address segment, the data address segment includes at least one block data address segment, and the block data address segment is used to store the block data.
[0465] In some embodiments, the cloud object storage system includes at least one bucket, each bucket corresponding to a different logical volume, and the logical partitions in the bucket include at least one of: data partitions, index partitions, and fence partitions;
[0466] The data objects in the same storage bucket are all identified by the data partition name of the data partition in that bucket;
[0467] The index objects in the same storage bucket are all identified by the index partition name of the index partition in that bucket;
[0468] The fence objects in the same storage bucket are all identified by the fence partition name of the fence partition.
[0469] In some embodiments, the naming format of the data partition includes: a prefix, the volume identifier corresponding to the storage bucket, and the data partition identifier;
[0470] And / or,
[0471] The naming format of the index partition includes: a prefix, the volume identifier corresponding to the storage bucket, and the index partition identifier;
[0472] And / or,
[0473] The naming format of the fence partition includes: a prefix, the volume identifier corresponding to the storage bucket, and the fence partition identifier.
[0474] In some embodiments, the standard format object includes: a data object; and a processing module 801, specifically configured to: determine the data to be backed up in the target index address segment of the target logical volume in the block storage system;
[0475] According to the content format of the data objects, data objects are generated for the data corresponding to each data address segment in the data to be backed up, and a data object name is generated for each data object according to the naming format of the data objects.
[0476] In some embodiments, the data object is obtained by encapsulating data in address order within a data address segment of a logical volume, and the content format of the data object includes at least one of the following:
[0477] Full data object: During a full backup, all data written to a data address segment within the logical volume before the backup time is encapsulated into an independent object;
[0478] Incremental data object: During an incremental backup, data newly written to a data address segment in the logical volume before the backup time and after the most recent backup of any type is encapsulated into an independent object;
[0479] Differential data object: During differential backup, data newly written to a data address range within the logical volume before the backup time and after the most recent full backup is encapsulated into an independent object.
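The difference between the three content formats lies in the time window of the writes that are encapsulated, which can be sketched as follows. The `Write` record, the strict time comparisons, and the assumption that each address appears at most once in the write log are simplifications introduced for illustration.

```python
from typing import NamedTuple


class Write(NamedTuple):
    time: int      # when the block was written
    offset: int    # address inside the data address segment
    data: bytes


def select_writes(writes: list[Write], kind: str, backup_time: int,
                  last_full_time: int, last_any_time: int) -> list[Write]:
    """Pick the writes to encapsulate into one data object, in address order."""
    if kind == "full":
        since = None                 # everything before the backup time
    elif kind == "incremental":
        since = last_any_time        # after the most recent backup of any type
    elif kind == "differential":
        since = last_full_time       # after the most recent full backup
    else:
        raise ValueError(f"unknown backup kind: {kind}")
    chosen = [w for w in writes
              if w.time < backup_time and (since is None or w.time > since)]
    return sorted(chosen, key=lambda w: w.offset)
```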
[0480] In some embodiments, the naming format of the data object includes: the data partition name, the starting address of the data address segment corresponding to the data object in the logical volume, and the version number of the data object.
[0481] In some embodiments, the backup module 802 is specifically used to: transmit a first PUT request through the PUT interface of the cloud object storage system to upload the generated data object to the target storage bucket corresponding to the target logical volume;
[0482] The request body of the first PUT request includes a data object, and the request header of the first PUT request includes first indication information, which indicates at least one of the following: the target storage bucket, the data object name, the compression algorithm, and the encryption algorithm.
[0483] In some embodiments, the standard format object further includes: an index object; the processing module 801 is specifically configured to: when data objects have been generated for all data address segments within the target index address segment and backed up to the cloud object storage system, generate an index object for the target index address segment according to the content format of the index object, and generate an index object name according to the naming format of the index object.
[0484] In some embodiments, the naming format of the index object includes:
[0485] The index partition name, the starting address of the index address range corresponding to the index object in the logical volume, and the version number of the index object.
[0486] In some embodiments, the content format of the index object includes: a header metadata area and an index area;
[0487] The header metadata area includes the metadata of the index object, and the index area includes: the position index of the data in each data object in the index address segment, and the position index of each data object is stored in the index area in address order.
[0488] In some embodiments, the header metadata area includes at least one of the following fields:
[0489] A file information field, wherein the file information field is used to identify at least one of the purpose and owner of the index object;
[0490] A software version field, which is used to identify the backup software version;
[0491] A start address field, wherein the start address field is used to identify the starting address of the index address segment corresponding to the index object in the logical volume;
[0492] The address length field is used to identify the address length of the index address segment corresponding to the index object in the logical volume.
[0493] A data volume field, which is used to identify the amount of valid data written into the target index address segment;
[0494] An index number field is used to identify the number of data segment indexes included in the index object;
[0495] A data object name length field, wherein the data object name length field is used to identify the length of the data object name in the index area;
[0496] A first verification field;
[0497] Reserved fields.
[0498] In some embodiments, the location index of data in the index address segment within each data object includes at least one of the following fields:
[0499] Data object name field;
[0500] A start address field, which is used to identify the starting address of the data in the data object;
[0501] Data length field;
[0502] Fence partition identifier field;
[0503] The second verification field.
[0504] In some embodiments, the backup module 802 is specifically used to: transmit a second PUT request through the PUT interface of the cloud object storage system to upload at least one generated index object to the target storage bucket corresponding to the target logical volume;
[0505] The request body of the second PUT request includes the generated index object, and the request header of the second PUT request includes second indication information, which indicates at least one of the following: the target storage bucket, the index object name, the compression algorithm, and the encryption algorithm.
[0506] In some embodiments, the standard format object further includes: a fence object; the processing module 801 is specifically configured to: when index objects have been generated for all index address segments in the target logical volume and backed up to the cloud object storage system, generate a fence object according to the content format of the fence object, and generate a fence object name according to the naming format of the fence object;
[0507] The fence object is used to indicate that a backup of the target logical volume has been completed.
[0508] In some embodiments, the content format of the fence object is empty;
[0509] The naming format of the fence object includes: fence partition name and version number of this backup.
[0510] In some embodiments, the backup module 802 is specifically used to: transmit a third PUT request through the PUT interface of the cloud object storage system to upload the generated fence object to the target storage bucket corresponding to the target logical volume; wherein the request header of the third PUT request includes third indication information, the third indication information being used to indicate at least one of the following: the target storage bucket, the fence object name.
[0511] In some embodiments, the apparatus further includes a deletion module, configured to delete objects to be cleaned through the DELETE interface of a cloud object storage system after a backup of the target logical volume has been completed, the objects to be cleaned including at least one of data objects, index objects, and fence objects.
[0512] In some embodiments, the method for determining the object to be cleaned includes at least one of the following:
[0513] For each data address range, if the full backup completed this time generates and backs up new data objects, then the data objects previously backed up to the cloud object storage system are data objects to be cleaned up.
[0514] For each data address range, if the differential backup completed this time generates and backs up new data objects, then the data objects backed up to the cloud object storage system after the last full backup and before this differential backup are data objects to be cleaned up.
[0515] For each index data segment, if the current full backup generates and backs up a new index object, then the index object previously backed up to the cloud object storage system is an index object to be cleaned up.
[0516] For each index data segment, if the differential backup completed this time generates and backs up a new index object, then the index object previously backed up to the cloud object storage system is the index object to be cleaned up.
[0517] For each logical volume, the previously backed-up fence objects are the fence objects to be cleaned up.
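The two rules for data objects can be sketched as follows. The sketch assumes that `existing` holds only the data objects of one data address range backed up before the current backup, and that each carries the version number of the backup that produced it; the record type and function name are hypothetical.

```python
from typing import NamedTuple


class BackedUpObject(NamedTuple):
    name: str
    version: int  # version number of the backup that produced the object


def objects_to_clean(existing: list[BackedUpObject], kind: str,
                     last_full_version: int) -> list[str]:
    """Names of previously backed-up objects of one address segment that can be
    deleted once a new object for that segment has been backed up."""
    if kind == "full":
        # A new full object supersedes everything backed up before it.
        return [o.name for o in existing]
    if kind == "differential":
        # Keep the last full object; drop what was backed up after it.
        return [o.name for o in existing if o.version > last_full_version]
    return []  # other backup types are not covered by the rules above
```

The resulting names could then be passed to the DELETE interface of the cloud object storage system after the backup of the target logical volume has been completed.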
[0518] In some embodiments, the cloud object storage system meets at least one of the following requirements:
[0519] It has at least one of the following interfaces at the object level: PUT, GET, DELETE, and LIST.
[0520] The PUT interface supports user-defined metadata;
[0521] The response header returned by the GET interface contains user-defined metadata;
[0522] The LIST interface supports specifying a prefix for the LIST and retrieving a list of all object names with the same prefix.
[0523] The LIST interface supports specifying the starting position of the LIST;
[0524] The LIST interface supports specifying the maximum number of objects that can be returned in a single LIST request;
[0525] After a standard format object is uploaded to the cloud object storage system via the PUT interface, it supports performing GET operations and / or LIST operations on the standard format object.
[0526] In some embodiments, the apparatus further includes a verification module for:
[0527] After the standard format object is backed up to the cloud object storage system, comparing the entity tag of the standard format object stored in the block storage system with the entity tag of the standard format object returned by the cloud object storage system, so as to perform integrity verification.
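A minimal sketch of such an entity tag comparison follows. It assumes that the entity tag is the hexadecimal MD5 digest of the object body, which some object storage services use for simple uploads; the application does not specify how the tag is computed, and the function names are hypothetical.

```python
import hashlib


def local_entity_tag(body: bytes) -> str:
    """Compute a local entity tag (assumption: hex MD5 digest of the body)."""
    return hashlib.md5(body).hexdigest()


def verify_upload(body: bytes, returned_etag: str) -> bool:
    # Some services wrap the tag in double quotes, so strip them before comparing.
    return local_entity_tag(body) == returned_etag.strip('"').lower()
```

If the comparison fails, the caller would typically treat the upload as unverified and retry it; that policy is outside the scope of this sketch.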
[0528] Each module in the aforementioned data backup device can be implemented entirely or partially through software, hardware, or a combination thereof. These modules can be embedded in the processor of a computer device in hardware form or independent of it, or stored in the memory of the computer device in software form, so that the processor can call and execute the operations corresponding to each module.
[0529] Based on the same inventive concept, this application also provides a data recovery apparatus for implementing the data recovery method described above. The solution provided by this apparatus is similar to the implementation described in the above method; therefore, the specific limitations in one or more data recovery apparatus embodiments provided below can be found in the limitations of the data recovery method described above, and will not be repeated here.
[0530] In an exemplary embodiment, as shown in FIG. 14, a data recovery apparatus is provided, comprising:
[0531] Download module 801 downloads standard format objects that have been backed up to the cloud object storage system;
[0532] The recovery module 802 processes the standard format object according to a preset data parsing method to obtain the recovered data;
[0533] Storage module 803 stores the recovered data in the block storage system;
[0534] The standard format object includes at least one of the following: a data object, an index object, and a fence object.
[0535] In some embodiments, the download module 801 is specifically used to: download standard format objects that have been backed up to the cloud object storage system based on a preset recovery protocol.
[0536] In some embodiments, the block storage system includes at least one logical volume, the logical volume includes at least one index address segment, the index address segment includes at least one data address segment, the data address segment includes at least one block data address segment, and the block data address segment is used to store the block data.
[0537] In some embodiments, the cloud object storage system includes at least one bucket, each bucket corresponding to a different logical volume, and the logical partitions in the bucket include at least one of: data partitions, index partitions, and fence partitions;
[0538] The data objects in the same storage bucket are all identified by the data partition name of the data partition;
[0539] The index objects in the same storage bucket are all identified by the index partition name of the index partition;
[0540] The fence objects in the same storage bucket are all identified by the fence partition name of the fence partition.
[0541] In some embodiments,
[0542] The naming format of the data partition includes: a prefix, the logical volume identifier corresponding to the storage bucket, and the data partition identifier;
[0543] And / or,
[0544] The naming format of the index partition includes: a prefix, the logical volume identifier corresponding to the storage bucket, and the index partition identifier;
[0545] And / or,
[0546] The naming format of the fence partition includes: a prefix, the logical volume identifier corresponding to the storage bucket, and the fence partition identifier.
[0547] In some embodiments, the data to be recovered includes a portion of data in the target logical volume, and the standard format object includes an index object; the download module 801 is specifically configured to: determine the target index address segment where the data to be recovered is located based on the starting address and address length of the data to be recovered in the target logical volume; download the list of index object names corresponding to the target index address segment; parse the index object names in the list of index object names according to the naming format of the index objects to determine the version number in the index object name; and download the corresponding index object based on the version number in the index object name.
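The first step above, locating the index address segment from a starting address and an address length, can be sketched as follows. The sketch assumes that index address segments have a fixed size and are aligned to multiples of that size; the application does not state this, and `segment_size` and the function name are hypothetical. When the requested range crosses a segment boundary, the sketch returns every segment that the range touches.

```python
from typing import List


def target_index_segments(start: int, length: int, segment_size: int) -> List[int]:
    """Return the starting addresses of the index address segments
    that overlap the byte range [start, start + length)."""
    if start < 0 or length <= 0 or segment_size <= 0:
        raise ValueError("start must be >= 0; length and segment_size must be > 0")
    first = (start // segment_size) * segment_size
    last = ((start + length - 1) // segment_size) * segment_size
    return list(range(first, last + segment_size, segment_size))
```

Each returned starting address can then be matched against the starting address embedded in the index object names.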
[0548] In some embodiments, the download module 801 is specifically used to: transmit a first LIST request through the LIST interface of the cloud object storage system to download the list of index object names corresponding to the target index address range;
[0549] The request header of the first LIST request includes first indication information, which indicates at least one of the following: the target bucket corresponding to the target logical volume, the index partition name as the prefix of the index object, the starting position of the LIST, and the maximum number of objects that can be returned in one LIST.
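The three LIST parameters named above (prefix, starting position, maximum number of objects per request) are enough to page through an arbitrarily long name list. The following sketch assumes a callable with the signature `(prefix, start_after, max_keys)` as a stand-in for the real LIST interface; the callable, the sample object names and the `/` separator are all hypothetical.

```python
from typing import Callable, List

# (prefix, start_after, max_keys) -> object names in lexicographic order
ListPage = Callable[[str, str, int], List[str]]


def list_all(list_page: ListPage, prefix: str, max_keys: int = 2) -> List[str]:
    """Collect every object name under a prefix by repeating LIST requests."""
    if max_keys < 1:
        raise ValueError("max_keys must be at least 1")
    names: List[str] = []
    start_after = ""
    while True:
        page = list_page(prefix, start_after, max_keys)
        names.extend(page)
        if len(page) < max_keys:
            return names  # a short page means nothing is left
        start_after = page[-1]  # continue after the last name already seen


def fake_list_page(prefix: str, start_after: str, max_keys: int) -> List[str]:
    # Stand-in for the real LIST interface, backed by a fixed set of names.
    stored = ["idx-vol1/0/v1", "idx-vol1/0/v2", "idx-vol1/4096/v1", "data-vol1/0/v1"]
    matches = sorted(n for n in stored if n.startswith(prefix) and n > start_after)
    return matches[:max_keys]
```

Calling `list_all(fake_list_page, "idx-vol1/")` issues two requests and returns only the three names that carry the index partition prefix.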
[0550] In some embodiments, the download module 801 is specifically used to: download the index object corresponding to the index object name from the target storage bucket through the GET interface of the cloud object storage system.
[0551] In some embodiments, the data to be recovered includes all data in the target logical volume; the standard format object includes a fence object; the download module 801 is specifically used for:
[0552] A second LIST request is transmitted through the LIST interface of the cloud object storage system to download the list of fence object names that have been backed up to the cloud object storage system.
[0553] The request header of the second LIST request includes second indication information, which indicates at least one of the following: the target bucket corresponding to the target logical volume, the fence partition name as a prefix of the fence object, the starting position of the LIST, and the maximum number of objects that can be returned in one LIST.
[0554] In some embodiments, the recovery module 802 is specifically used for:
[0555] Based on the naming format of the fence objects, the fence object names in the fence object name list are parsed to obtain the corresponding fences.
[0556] In some embodiments, the fence object is used to indicate that a backup of the target logical volume has been completed; the content format of the fence object is empty;
[0557] The naming format of the fence object includes: fence partition name and version number of this backup.
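Because a fence object is empty and its name carries the backup version, the latest completed backup can be found from the name list alone. The sketch below assumes that the name is the fence partition name, a `/` separator, and a decimal version number; the separator and the encoding of the version number are assumptions, as the application only lists the two components.

```python
from typing import List, Optional


def latest_completed_version(fence_names: List[str],
                             fence_partition: str) -> Optional[int]:
    """Return the highest backup version that has a fence object, or None."""
    versions = []
    for name in fence_names:
        partition, sep, version = name.rpartition("/")
        # Ignore names from other partitions and names without a numeric version.
        if sep and partition == fence_partition and version.isdigit():
            versions.append(int(version))
    return max(versions) if versions else None
```

A result of `None` indicates that no backup of the logical volume is known to have completed.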
[0558] In some embodiments, the standard format object further includes an index object; the download module 801 is specifically used to: download a list of all index object names in the target logical volume;
[0559] The index object names in the index object name list are parsed according to the naming format of the index objects to determine the version number in the index object name;
[0560] Download the corresponding index object based on the version number in the index object name.
[0561] In some embodiments, the naming format of the index object includes:
[0562] The index partition name, the starting address of the index address range corresponding to the index object in the logical volume, and the version number of the index object.
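The naming format above makes it possible to choose index objects by version without downloading them. The sketch below assumes a `/`-separated name with a fixed-width hexadecimal starting address and a zero-padded decimal version number; these encodings, and the policy of choosing the newest version that does not exceed a target backup version, are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def index_object_name(partition: str, start_address: int, version: int) -> str:
    # Fixed-width fields keep lexicographic order equal to numeric order.
    return f"{partition}/{start_address:016x}/{version:08d}"


def parse_index_object_name(name: str) -> Tuple[str, int, int]:
    partition, start_hex, version = name.rsplit("/", 2)
    return partition, int(start_hex, 16), int(version)


def select_index_objects(names: List[str], target_version: int) -> Dict[int, str]:
    """For each index address segment, pick the newest index object
    whose version is not newer than target_version."""
    chosen: Dict[int, Tuple[int, str]] = {}
    for name in names:
        _, start, version = parse_index_object_name(name)
        if version <= target_version and version > chosen.get(start, (-1, ""))[0]:
            chosen[start] = (version, name)
    return {start: name for start, (_, name) in chosen.items()}
```

The resulting mapping from segment starting address to object name can be passed directly to the GET interface.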
[0563] In some embodiments, the recovery module 802 is specifically used for:
[0564] The downloaded index object is parsed according to its content format to obtain the data index in the index object, which includes a list of data object names.
[0565] In some embodiments, the content format of the index object includes: a header metadata area and an index area;
[0566] The header metadata area includes the metadata of the index object, and the index area includes: the position index of the data in each data object in the index address segment, and the position index of each data object is stored in the index area in address order.
[0567] In some embodiments, the header metadata area includes at least one of the following fields:
[0568] A file information field, wherein the file information field is used to identify at least one of the purpose and owner of the index object;
[0569] A software version field, which is used to identify the backup software version;
[0570] A start address field, wherein the start address field is used to identify the starting address of the index address segment corresponding to the index object in the logical volume;
[0571] The address length field is used to identify the address length of the index address segment corresponding to the index object in the logical volume.
[0572] A data volume field, which is used to identify the amount of valid data written into the target index address segment;
[0573] An index number field is used to identify the number of data segment indexes included in the index object;
[0574] A data object name length field, wherein the data object name length field is used to identify the length of the data object name in the index area;
[0575] First validation field;
[0576] Reserved fields.
[0577] In some embodiments, the location index of data in the index address segment within each data object includes at least one of the following fields:
[0578] Data object name field;
[0579] A start address field, which is used to identify the starting address of the data in the data object;
[0580] Data length field;
[0581] Fence partition identifier field;
[0582] The second verification field.
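One possible binary layout for the header metadata area and the index area is sketched below. The field widths, the little-endian byte order, the use of CRC32 for the first validation field (over the whole index area) and for the second verification field (over each entry), and the default name length of 48 bytes are all assumptions; the application lists the fields but does not fix their encoding.

```python
import struct
import zlib
from typing import List, Tuple

# file info, software version, start address, address length, data volume,
# index number, data object name length, first validation field, reserved bytes
HEADER = struct.Struct("<16s8sQQQIII12x")
# (data object name, start address, data length, fence partition identifier)
Entry = Tuple[str, int, int, int]


def pack_index_object(start: int, length: int, entries: List[Entry],
                      name_len: int = 48) -> bytes:
    body = struct.Struct(f"<{name_len}sQQI")
    area = b""
    for name, addr, size, fence_id in sorted(entries, key=lambda e: e[1]):
        fields = body.pack(name.encode(), addr, size, fence_id)
        area += fields + struct.pack("<I", zlib.crc32(fields))  # per-entry check
    data_volume = sum(e[2] for e in entries)
    header = HEADER.pack(b"index-object", b"1.0", start, length, data_volume,
                         len(entries), name_len, zlib.crc32(area))
    return header + area


def unpack_index_object(blob: bytes) -> List[Entry]:
    _, _, _, _, _, count, name_len, area_crc = HEADER.unpack_from(blob)
    area = blob[HEADER.size:]
    if zlib.crc32(area) != area_crc:
        raise ValueError("index area checksum mismatch")
    body = struct.Struct(f"<{name_len}sQQI")
    entries: List[Entry] = []
    for i in range(count):
        offset = i * (body.size + 4)
        fields = area[offset:offset + body.size]
        (crc,) = struct.unpack_from("<I", area, offset + body.size)
        if zlib.crc32(fields) != crc:
            raise ValueError("entry checksum mismatch")
        name, addr, size, fence_id = body.unpack(fields)
        entries.append((name.rstrip(b"\0").decode(), addr, size, fence_id))
    return entries
```

Entries are written in address order, as the content format requires, so a reader can locate an address by scanning or by binary search over fixed-size records.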
[0583] In some embodiments, the standard format object further includes: a data object, and the download module 801 is specifically used for:
[0584] Based on the data object name in the data index, the data object corresponding to the data object name is downloaded from the target storage bucket through the GET interface of the cloud object storage system.
[0585] In some embodiments, the recovery module 802 is specifically used for:
[0586] Based on the data segment index in the data index and the content format of the data object, the downloaded data object is parsed to obtain the parsing result.
[0587] In some embodiments, the data object is obtained by encapsulating data in address order within a data address segment of a logical volume;
[0588] The content format of the data object includes at least one of the following:
[0589] Full data object: during a full backup, all data written to a data address segment within the logical volume before the backup time is encapsulated into an independent object;
[0590] Incremental data object: during an incremental backup, data newly written to a data address segment within the logical volume after the most recent backup of any type and before the backup time is encapsulated into an independent object;
[0591] Differential data object: during a differential backup, data newly written to a data address segment within the logical volume after the most recent full backup and before the backup time is encapsulated into an independent object.
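The three definitions above imply which data objects must be read to reconstruct the latest state of one data address segment. The following sketch derives that set from the definitions only; it is an inference rather than a procedure stated in the application, and the (version, kind) record layout and the function name are hypothetical.

```python
from typing import List, Tuple

# (version number, kind); kind is "full", "incremental" or "differential"
Backup = Tuple[int, str]


def objects_needed_for_restore(history: List[Backup]) -> List[int]:
    """Return, in apply order, the versions needed to rebuild the latest state."""
    ordered = sorted(history)
    fulls = [v for v, k in ordered if k == "full"]
    if not fulls:
        return []  # nothing can be restored without a full data object
    base = fulls[-1]
    after_base = [(v, k) for v, k in ordered if v > base]
    diffs = [v for v, k in after_base if k == "differential"]
    chain = [base]
    pivot = base
    if diffs:
        # The latest differential object covers every change since the full one.
        pivot = diffs[-1]
        chain.append(pivot)
    # Incremental objects written after the pivot carry the remaining changes.
    chain.extend(v for v, k in after_base if k == "incremental" and v > pivot)
    return chain
```

Objects that do not appear in the returned chain are not needed for restoring the latest state.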
[0592] In some embodiments, the naming format of the data object includes:
[0593] The data partition name, the starting address of the data address range corresponding to the data object in the logical volume, and the version number of the data object.
[0594] In some embodiments, the cloud object storage system meets at least one of the following requirements:
[0595] It has at least one of the following interfaces at the object level: PUT, GET, DELETE, and LIST.
[0596] The PUT interface supports user-defined metadata;
[0597] The response header returned by the GET interface contains user-defined metadata;
[0598] The LIST interface supports specifying a prefix for the LIST and retrieving a list of all object names with the same prefix.
[0599] The LIST interface supports specifying the starting position of the LIST;
[0600] The LIST interface supports specifying the maximum number of objects that can be returned in a single LIST request;
[0601] After a standard format object is uploaded to the cloud object storage system via the PUT interface, it supports performing GET operations and / or LIST operations on the standard format object.
[0602] In some embodiments, the apparatus further includes: a verification module, configured to:
[0603] The entity tags of the standard format objects downloaded to the block storage system are compared with the entity tags of the standard format objects returned by the cloud object storage system to perform integrity verification.
[0604] Each module in the aforementioned data recovery device can be implemented entirely or partially through software, hardware, or a combination thereof. These modules can be embedded in the processor of a computer device in hardware form or independent of it, or stored in the memory of the computer device in software form, so that the processor can call and execute the operations corresponding to each module.
[0605] In an exemplary embodiment, a computer device is provided, which may be a terminal or a server, and its internal structure diagram may be as shown in Figure 15. The computer device includes a processor, memory, input / output interfaces (I / O), and a communication interface. The processor, memory, and I / O interfaces are connected via a system bus, and the communication interface is connected to the system bus via the I / O interfaces. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium. The database of the computer device is used to store data. The I / O interfaces of the computer device are used for exchanging information between the processor and external devices. The communication interface of the computer device is used for communicating with external terminals via a network connection. When the computer program is executed by the processor, it implements various processes of the data backup method in the above embodiment.
[0606] Those skilled in the art will understand that the structure shown in Figure 15 is merely a block diagram of a portion of the structure related to the present application and does not constitute a limitation on the computer device to which the present application is applied. Specific computer devices may include more or fewer components than those shown in the figure, or may combine certain components, or may have different component arrangements.
[0607] In one exemplary embodiment, a computer device is provided, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to perform the following steps:
[0608] According to the preset data organization method, the data to be backed up in the block storage system is processed to obtain standard format objects;
[0609] Back up the standard format object to a cloud object storage system;
[0610] The standard format object includes at least one of the following: a data object, an index object, and a fence object.
[0611] In some embodiments, when a processor executes a computer program, it further performs the following steps:
[0612] Based on a preset backup protocol, the standard format object is transmitted to the cloud object storage system.
[0613] In some embodiments, the block storage system includes at least one logical volume, the logical volume includes at least one index address segment, the index address segment includes at least one data address segment, the data address segment includes at least one block data address segment, and the block data address segment is used to store the block data.
[0614] In some embodiments, the cloud object storage system includes at least one bucket, each bucket corresponding to a different logical volume, and the logical partitions in the bucket include at least one of: data partitions, index partitions, and fence partitions;
[0615] The data objects in the same storage bucket are all identified by the data partition name of the data partition;
[0616] The index objects in the same storage bucket are all identified by the index partition name of the index partition;
[0617] The fence objects in the same storage bucket are all identified by the fence partition name of the fence partition.
[0618] In some embodiments, the naming format of the data partition includes: a prefix, the volume identifier corresponding to the storage bucket, and the data partition identifier;
[0619] And / or,
[0620] The naming format of the index partition includes: a prefix, the volume identifier corresponding to the storage bucket, and the index partition identifier;
[0621] And / or,
[0622] The naming format of the fence partition includes: a prefix, the volume identifier corresponding to the storage bucket, and the fence partition identifier.
[0623] In some embodiments, the standard format object includes: a data object; when the processor executes a computer program, it further implements the following steps:
[0624] Identify the data to be backed up in the target index address range of the target logical volume in the block storage system;
[0625] According to the content format of the data objects, data objects are generated for the data corresponding to each data address segment in the data to be backed up, and a data object name is generated for each data object according to the naming format of the data objects.
[0626] In some embodiments, the data object is obtained by encapsulating data in address order within a data address segment of a logical volume, and the content format of the data object includes at least one of the following:
[0627] Full data object: during a full backup, all data written to a data address segment within the logical volume before the backup time is encapsulated into an independent object;
[0628] Incremental data object: during an incremental backup, data newly written to a data address segment within the logical volume after the most recent backup of any type and before the backup time is encapsulated into an independent object;
[0629] Differential data object: during a differential backup, data newly written to a data address segment within the logical volume after the most recent full backup and before the backup time is encapsulated into an independent object.
[0630] In some embodiments, the naming format of the data object includes:
[0631] The data partition name, the starting address of the data address range corresponding to the data object in the logical volume, and the version number of the data object.
[0632] In some embodiments, when a processor executes a computer program, it further performs the following steps:
[0633] The first PUT request is transmitted through the PUT interface of the cloud object storage system to upload the generated data object to the target storage bucket corresponding to the target logical volume;
[0634] The request body of the first PUT request includes a data object, and the request header of the first PUT request includes first indication information, which indicates at least one of the following: the target storage bucket, the data object name, the compression algorithm, and the encryption algorithm.
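The first PUT request described above can be sketched as a pair of headers and body. The header names used here (`x-bucket`, `x-object-name`, `x-meta-compression`, `x-meta-encryption`) are hypothetical, zlib is chosen only as an example compression algorithm, and encryption is left out to keep the sketch short.

```python
import zlib
from typing import Dict, Tuple


def build_data_put_request(bucket: str, object_name: str,
                           payload: bytes) -> Tuple[Dict[str, str], bytes]:
    """Return (headers, body) for uploading one data object."""
    body = zlib.compress(payload)
    headers = {
        # Hypothetical header names carrying the first indication information.
        "x-bucket": bucket,
        "x-object-name": object_name,
        "x-meta-compression": "zlib",
        "x-meta-encryption": "none",  # encryption is omitted in this sketch
    }
    return headers, body
```

Recording the compression and encryption algorithms as user-defined metadata lets the recovery side read them back from the GET response header and reverse the transformations.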
[0635] In some embodiments, the standard format object further includes: an index object;
[0636] The process of processing the target data to be backed up in the block storage system according to a preset data organization method to obtain a standard format object includes:
[0637] If all data address segments within the target index address segment have generated data objects and have been backed up to the cloud object storage system, an index object is generated for the target index address segment according to the content format of the index object; and an index object name is generated according to the naming format of the index object.
[0638] In some embodiments, the naming format of the index object includes:
[0639] The index partition name, the starting address of the index address range corresponding to the index object in the logical volume, and the version number of the index object.
[0640] In some embodiments, the content format of the index object includes: a header metadata area and an index area;
[0641] The header metadata area includes the metadata of the index object, and the index area includes: the position index of the data in each data object in the index address segment, and the position index of each data object is stored in the index area in address order.
[0642] In some embodiments, the header metadata area includes at least one of the following fields:
[0643] A file information field, wherein the file information field is used to identify at least one of the purpose and owner of the index object;
[0644] A software version field, which is used to identify the backup software version;
[0645] A start address field, wherein the start address field is used to identify the starting address of the index address segment corresponding to the index object in the logical volume;
[0646] The address length field is used to identify the address length of the index address segment corresponding to the index object in the logical volume.
[0647] A data volume field, which is used to identify the amount of valid data written into the target index address segment;
[0648] An index number field is used to identify the number of data segment indexes included in the index object;
[0649] A data object name length field, wherein the data object name length field is used to identify the length of the data object name in the index area;
[0650] First validation field;
[0651] Reserved fields.
[0652] In some embodiments, the location index of data in the index address segment within each data object includes at least one of the following fields:
[0653] Data object name field;
[0654] A start address field, which is used to identify the starting address of the data in the data object;
[0655] Data length field;
[0656] Fence partition identifier field;
[0657] The second verification field.
[0658] In some embodiments, when a processor executes a computer program, it further performs the following steps:
[0659] The second PUT request is transmitted through the PUT interface of the cloud object storage system to upload at least one generated index object to the target storage bucket corresponding to the target logical volume.
[0660] The request body of the second PUT request includes the generated index object, and the request header of the second PUT request includes second indication information, which indicates at least one of the following: the target storage bucket, the index object name, the compression algorithm, and the encryption algorithm.
[0661] In some embodiments, the standard format object further includes: a fence object; the processor, when executing the computer program, also implements the following steps:
[0662] If index objects have been generated for all index address segments in the target logical volume and backed up to the cloud object storage system, generate a fence object according to the content format of the fence object, and generate a fence object name according to the naming format of the fence object.
[0663] The fence object is used to indicate that a backup of the target logical volume has been completed.
[0664] In some embodiments, the content format of the fence object is empty;
[0665] The naming format of the fence object includes: fence partition name and version number of this backup.
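The upload order implied by the embodiments above (data objects first, then the index object of each index address segment, and the empty fence object last) can be sketched as follows. A plain dictionary stands in for the target storage bucket, and the object name layouts, the newline-separated index content and the function name `backup_volume` are simplifying assumptions.

```python
from typing import Dict, List


def backup_volume(store: Dict[str, bytes], volume: str, version: int,
                  segments: Dict[int, Dict[int, bytes]]) -> List[str]:
    """Upload data objects, then index objects, then the fence object.

    segments maps an index segment start address to a mapping of
    data segment start address -> data. Returns names in upload order.
    """
    uploaded: List[str] = []
    for index_start in sorted(segments):
        data_names = []
        for data_start in sorted(segments[index_start]):
            name = f"data-{volume}/{data_start}/{version}"
            store[name] = segments[index_start][data_start]
            data_names.append(name)
            uploaded.append(name)
        # The index object is written only after every data object it lists is stored.
        index_name = f"index-{volume}/{index_start}/{version}"
        store[index_name] = "\n".join(data_names).encode()
        uploaded.append(index_name)
    # The empty fence object goes last and marks the backup as complete.
    fence_name = f"fence-{volume}/{version}"
    store[fence_name] = b""
    uploaded.append(fence_name)
    return uploaded
```

If the process stops before the last step, no fence object with this version exists, so a reader that looks only at fence objects will not treat the partial backup as complete.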
[0666] In some embodiments, when a processor executes a computer program, it further performs the following steps:
[0667] A third PUT request is transmitted through the PUT interface of the cloud object storage system to upload the generated fence object to the target storage bucket corresponding to the target logical volume.
[0668] The request header of the third PUT request includes third indication information, which is used to indicate at least one of the following: the target storage bucket and the fence object name.
[0669] In some embodiments, when a processor executes a computer program, it further performs the following steps:
[0670] After a backup of the target logical volume is completed, the objects to be cleaned are deleted through the DELETE interface of the cloud object storage system. The objects to be cleaned include at least one of data objects, index objects, and fence objects.
[0671] In some embodiments, the method for determining the object to be cleaned includes at least one of the following:
[0672] For each data address segment, if the current full backup generates and backs up a new data object, the data objects previously backed up to the cloud object storage system are data objects to be cleaned up.
[0673] For each data address segment, if the current differential backup generates and backs up a new data object, the data objects backed up to the cloud object storage system after the last full backup and before this differential backup are data objects to be cleaned up.
[0674] For each index address segment, if the current full backup generates and backs up a new index object, the index objects previously backed up to the cloud object storage system are index objects to be cleaned up.
[0675] For each index address segment, if the current differential backup generates and backs up a new index object, the index objects previously backed up to the cloud object storage system are index objects to be cleaned up.
[0676] For each logical volume, the previously backed-up fence objects are the fence objects to be cleaned up.
[0677] In some embodiments, the cloud object storage system meets at least one of the following requirements:
[0678] It has at least one of the following interfaces at the object level: PUT, GET, DELETE, and LIST.
[0679] The PUT interface supports user-defined metadata;
[0680] The response header returned by the GET interface contains user-defined metadata;
[0681] The LIST interface supports specifying a prefix for the LIST and retrieving a list of all object names with the same prefix.
[0682] The LIST interface supports specifying the starting position of the LIST;
[0683] The LIST interface supports specifying the maximum number of objects that can be returned in a single LIST request;
[0684] After a standard format object is uploaded to the cloud object storage system via the PUT interface, it supports performing GET operations and / or LIST operations on the standard format object.
[0685] In some embodiments, when a processor executes a computer program, it further performs the following steps:
[0686] After the standard format object is backed up to the cloud object storage system, comparing the entity tag of the standard format object stored in the block storage system with the entity tag of the standard format object returned by the cloud object storage system, so as to perform integrity verification.
[0687] In some embodiments, a computer-readable storage medium is provided having a computer program stored thereon, which, when executed by a processor, implements the various processes shown in the above method embodiments.
[0688] In some embodiments, a computer program product is provided, including a computer program that, when executed by a processor, implements the various processes shown in the above method embodiments.
[0689] In one exemplary embodiment, a computer device is provided, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to perform the following steps:
[0690] Download the standard format object that has been backed up to the cloud object storage system;
[0691] The standard format object is processed according to a preset data parsing method to obtain the recovered data;
[0692] The recovered data is stored in a block storage system;
[0693] The standard format object includes at least one of the following: a data object, an index object, and a fence object.
[0694] In some embodiments, when a processor executes a computer program, it further performs the following steps:
[0695] Based on the preset recovery protocol, download the standard format object that has been backed up to the cloud object storage system.
[0696] In some embodiments, the block storage system includes at least one logical volume, the logical volume includes at least one index address segment, the index address segment includes at least one data address segment, the data address segment includes at least one block data address segment, and the block data address segment is used to store the block data.
[0697] In some embodiments, the cloud object storage system includes at least one bucket, each bucket corresponding to a different logical volume, and the logical partitions in the bucket include at least one of: data partitions, index partitions, and fence partitions;
[0698] The data objects in the same storage bucket are all identified by the data partition name of the data partition;
[0699] The index objects in the same storage bucket are all identified by the index partition name of the index partition;
[0700] The fence objects in the same storage bucket are all identified by the fence partition name of the fence partition.
[0701] In some embodiments, the naming format of the data partition includes: a prefix, the logical volume identifier corresponding to the storage bucket, and the data partition identifier;
[0702] And / or,
[0703] The naming format of the index partition includes: a prefix, the logical volume identifier corresponding to the storage bucket, and the index partition identifier;
[0704] And / or,
[0705] The naming format of the fence partition includes: a prefix, the logical volume identifier corresponding to the storage bucket, and the fence partition identifier.
[0706] In some embodiments, the data to be recovered includes a portion of data in the target logical volume, and the standard format object includes an index object; the processor, when executing the computer program, also performs the following steps:
[0707] Based on the starting address and address length of the data to be recovered in the target logical volume, determine the target index address segment where the data to be recovered is located;
[0708] Download the list of index object names corresponding to the target index address range;
[0709] The index object names in the index object name list are parsed according to the naming format of the index objects to determine the version number in the index object name;
[0710] Download the corresponding index object based on the version number in the index object name.
[0711] In some embodiments, when the processor executes a computer program, it further performs the following steps: transmitting a first LIST request through the LIST interface of the cloud object storage system to download a list of index object names corresponding to the target index address range;
[0712] The request header of the first LIST request includes first indication information, which indicates at least one of the following: the target bucket corresponding to the target logical volume, the index partition name as the prefix of the index object, the starting position of the LIST, and the maximum number of objects that can be returned in one LIST.
[0713] In some embodiments, when the processor executes the computer program, it further performs the following steps: downloading the index object corresponding to the index object name from the target storage bucket through the GET interface of the cloud object storage system.
[0714] In some embodiments, the data to be recovered includes all data in the target logical volume; the standard format object includes a fence object; and the processor, when executing the computer program, further implements the following steps:
[0715] A second LIST request is transmitted through the LIST interface of the cloud object storage system to download the list of fence object names that have been backed up to the cloud object storage system.
[0716] The request header of the second LIST request includes second indication information, which indicates at least one of the following: the target bucket corresponding to the target logical volume, the fence partition name as a prefix of the fence object, the starting position of the LIST, and the maximum number of objects that can be returned in one LIST.
[0717] In some embodiments, when a processor executes a computer program, it further performs the following steps:
[0718] Based on the naming format of the fence objects, the fence object names in the fence object name list are parsed to obtain the corresponding fences.
[0719] In some embodiments, the fence object is used to indicate that a backup of the target logical volume has been completed; the content of the fence object is empty; the naming format of the fence object includes: the fence partition name and the version number of the completed backup.
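Because a fence object carries no content, everything it conveys is in its name. The sketch below parses fence object names and picks the newest completed backup. It assumes a `<fence partition name>/<version>` layout with a decimal version; the separator and the example names are assumptions, since the source specifies only which components the name contains.

```python
from typing import Iterable, Optional


def parse_fence_version(name: str, fence_partition: str) -> int:
    """Extract the backup version from a fence object name.

    Assumed layout (hypothetical): "<fence_partition>/<decimal version>".
    """
    prefix = fence_partition + "/"
    if not name.startswith(prefix):
        raise ValueError(f"{name!r} is not in fence partition {fence_partition!r}")
    return int(name[len(prefix):])


def latest_completed_backup(fence_names: Iterable[str],
                            fence_partition: str) -> Optional[int]:
    """Return the highest completed backup version, or None if no fence exists."""
    versions = [parse_fence_version(n, fence_partition) for n in fence_names]
    return max(versions, default=None)
```

A result of `None` would mean that no backup of the volume has finished, so a full-volume recovery has nothing consistent to restore from.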
[0720] In some embodiments, the standard format object further includes an index object; the processor, when executing the computer program, also performs the following steps:
[0721] Download the list of all index object names in the target logical volume;
[0722] The index object names in the index object name list are parsed according to the naming format of the index objects to determine the version number in the index object name;
[0723] Download the corresponding index object based on the version number in the index object name.
[0724] In some embodiments, the naming format of the index object includes:
[0725] The index partition name, the starting address in the logical volume of the index address segment corresponding to the index object, and the version number of the index object.
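The naming format lets a client choose index object versions without downloading any object bodies. The sketch below assumes a `<index partition name>/<start address>/<version>` layout and one possible selection policy: for each index address segment, take the newest version that does not exceed a given backup version. Both the separator and the policy are assumptions for illustration, not taken from the source.

```python
from typing import Dict, Iterable, NamedTuple


class IndexName(NamedTuple):
    partition: str
    start_address: int
    version: int


def parse_index_name(name: str) -> IndexName:
    """Split an index object name on its last two separators (assumed '/')."""
    partition, start, version = name.rsplit("/", 2)
    return IndexName(partition, int(start), int(version))


def pick_versions(names: Iterable[str], max_version: int) -> Dict[int, IndexName]:
    """Map each segment start address to its newest version <= max_version."""
    chosen: Dict[int, IndexName] = {}
    for name in names:
        parsed = parse_index_name(name)
        if parsed.version > max_version:
            continue  # written by a later backup than the one being restored
        best = chosen.get(parsed.start_address)
        if best is None or parsed.version > best.version:
            chosen[parsed.start_address] = parsed
    return chosen
```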
[0726] In some embodiments, when a processor executes a computer program, it further performs the following steps:
[0727] The downloaded index object is parsed according to its content format to obtain the data index in the index object, which includes a list of data object names.
[0728] In some embodiments, the content format of the index object includes: a header metadata area and an index area;
[0729] The header metadata area includes the metadata of the index object, and the index area includes: the position index of the data in each data object in the index address segment, and the position index of each data object is stored in the index area in address order.
[0730] In some embodiments, the header metadata area includes at least one of the following fields:
[0731] A file information field, wherein the file information field is used to identify at least one of the purpose and owner of the index object;
[0732] A software version field, which is used to identify the backup software version;
[0733] A start address field, wherein the start address field is used to identify the starting address of the index address segment corresponding to the index object in the logical volume;
[0734] The address length field is used to identify the address length of the index address segment corresponding to the index object in the logical volume.
[0735] A data volume field, which is used to identify the amount of valid data written into the target index address segment;
[0736] An index number field is used to identify the number of data segment indexes included in the index object;
[0737] A data object name length field, wherein the data object name length field is used to identify the length of the data object name in the index area;
[0738] First validation field;
[0739] Reserved fields.
[0740] In some embodiments, the location index of data in the index address segment within each data object includes at least one of the following fields:
[0741] Data object name field;
[0742] A start address field, which is used to identify the starting address of the data in the data object;
[0743] Data length field;
[0744] Fence partition identifier field;
[0745] The second verification field.
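One way to picture the header metadata area and the index area is as fixed-width binary records. The sketch below is illustrative only: the field widths, the little-endian byte order, the use of CRC32 for the two verification fields, and the rule that the data volume equals the sum of entry lengths are all assumptions, and the reserved fields are omitted.

```python
import struct
import zlib
from typing import Dict, List, NamedTuple, Tuple

# Hypothetical header layout: file info, software version, start address,
# address length, data volume, index count, data object name length.
HEADER_BODY = struct.Struct("<16sIQQQII")
CRC = struct.Struct("<I")  # stands in for the first and second verification fields


class Entry(NamedTuple):
    name: str      # data object name
    start: int     # start address of the data
    length: int    # data length
    fence_id: int  # fence partition identifier


def pack_index(file_info: bytes, sw_version: int, seg_start: int, seg_len: int,
               entries: List[Entry], name_len: int = 32) -> bytes:
    """Serialize a header followed by entries stored in address order."""
    entry_body = struct.Struct(f"<{name_len}sQQI")
    ordered = sorted(entries, key=lambda e: e.start)
    head = HEADER_BODY.pack(file_info, sw_version, seg_start, seg_len,
                            sum(e.length for e in ordered), len(ordered), name_len)
    parts = [head, CRC.pack(zlib.crc32(head))]
    for e in ordered:
        body = entry_body.pack(e.name.encode(), e.start, e.length, e.fence_id)
        parts += [body, CRC.pack(zlib.crc32(body))]
    return b"".join(parts)


def parse_index(blob: bytes) -> Tuple[Dict[str, int], List[Entry]]:
    """Verify and decode the header, then each position index entry."""
    head = blob[:HEADER_BODY.size]
    (crc,) = CRC.unpack_from(blob, HEADER_BODY.size)
    if zlib.crc32(head) != crc:
        raise ValueError("header verification failed")
    _info, sw_version, seg_start, seg_len, volume, count, name_len = HEADER_BODY.unpack(head)
    entry_body = struct.Struct(f"<{name_len}sQQI")
    offset = HEADER_BODY.size + CRC.size
    entries: List[Entry] = []
    for _ in range(count):
        body = blob[offset:offset + entry_body.size]
        (crc,) = CRC.unpack_from(blob, offset + entry_body.size)
        if zlib.crc32(body) != crc:
            raise ValueError("entry verification failed")
        raw, start, length, fence_id = entry_body.unpack(body)
        entries.append(Entry(raw.rstrip(b"\0").decode(), start, length, fence_id))
        offset += entry_body.size + CRC.size
    header = {"software_version": sw_version, "start_address": seg_start,
              "address_length": seg_len, "data_volume": volume, "index_count": count}
    return header, entries
```

Storing the name length in the header is what allows every entry to have the same width, so an entry can be located by arithmetic rather than by scanning.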
[0746] In some embodiments, the standard format object further includes a data object; the processor, when executing the computer program, further performs the following steps:
[0747] Based on the data object name in the data index, the data object corresponding to the data object name is downloaded from the target storage bucket through the GET interface of the cloud object storage system.
[0748] In some embodiments, when a processor executes a computer program, it further performs the following steps:
[0749] Based on the data segment index in the data index and the content format of the data object, the downloaded data object is parsed to obtain the parsing result.
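As a rough illustration of this parsing step, the sketch below writes the payload of a downloaded data object back to its logical addresses. It assumes the payload is the concatenation of the indexed extents in address order, which is one reading of "encapsulating data in address order"; the function name and the `(address, length)` extent shape are hypothetical.

```python
from typing import List, Tuple


def scatter_payload(volume: bytearray, payload: bytes,
                    extents: List[Tuple[int, int]]) -> None:
    """Copy each extent of the payload to its logical address in the volume.

    extents: (logical_address, length) pairs in address order. The payload is
    assumed to hold their data back to back (an assumption, not from the source).
    """
    offset = 0
    for address, length in extents:
        if address < 0 or address + length > len(volume):
            raise ValueError("extent falls outside the volume")
        chunk = payload[offset:offset + length]
        if len(chunk) != length:
            raise ValueError("payload is shorter than the index describes")
        volume[address:address + length] = chunk
        offset += length
    if offset != len(payload):
        raise ValueError("payload is longer than the index describes")
```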
[0750] In some embodiments, when a processor executes a computer program, it further performs the following steps:
[0751] The data object is obtained by encapsulating data in address order within a data address segment within a logical volume;
[0752] The content format of the data object includes at least one of the following:
[0753] Complete data object: during a full backup, all data written to a data address segment within the logical volume before the backup time is encapsulated into an independent object;
[0754] Incremental data object: during an incremental backup, data newly written to a data address segment within the logical volume after the most recent backup of any type and before the backup time is encapsulated into an independent object;
[0755] Differential data object: during a differential backup, data newly written to a data address segment within the logical volume after the most recent full backup and before the backup time is encapsulated into an independent object.
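These three definitions determine which earlier backups are needed to rebuild a given version. The sketch below derives that chain from a backup history: an incremental depends on the backup immediately before it, a differential depends only on the most recent full backup, and a full backup ends the chain. The history representation and the function name are hypothetical.

```python
from typing import List, Tuple


def restore_chain(history: List[Tuple[int, str]], target: int) -> List[int]:
    """Return the backup versions to apply, oldest first, to restore `target`.

    history: (version, kind) pairs, kind in {"full", "incremental", "differential"}.
    """
    backups = [(v, k) for v, k in sorted(history) if v <= target]
    if not backups or backups[-1][0] != target:
        raise ValueError(f"no backup with version {target}")
    chain: List[int] = []
    i = len(backups) - 1
    while i >= 0:
        version, kind = backups[i]
        chain.append(version)
        if kind == "full":
            return chain[::-1]
        if kind == "differential":
            # Covers everything since the last full backup: skip straight to it.
            i -= 1
            while i >= 0 and backups[i][1] != "full":
                i -= 1
        elif kind == "incremental":
            i -= 1  # covers changes since the previous backup of any type
        else:
            raise ValueError(f"unknown backup kind {kind!r}")
    raise ValueError("history contains no full backup to start from")
```

For a history of full (1), incremental (2), differential (3), incremental (4), restoring version 4 needs versions 1, 3, and 4; version 2 is skipped because the differential already covers its changes.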
[0756] In some embodiments, the naming format of the data object includes:
[0757] The data partition name, the starting address in the logical volume of the data address segment corresponding to the data object, and the version number of the data object.
[0758] In some embodiments, the cloud object storage system meets at least one of the following requirements:
[0759] It has at least one of the following interfaces at the object level: PUT, GET, DELETE, and LIST.
[0760] The PUT interface supports user-defined metadata;
[0761] The response header returned by the GET interface contains user-defined metadata;
[0762] The LIST interface supports specifying a prefix for the LIST and retrieving a list of all object names with the same prefix.
[0763] The LIST interface supports specifying the starting position of the LIST;
[0764] The LIST interface supports specifying the maximum number of objects that can be returned in a single LIST request;
[0765] After a standard format object is uploaded to the cloud object storage system via the PUT interface, it supports performing GET operations and / or LIST operations on the standard format object.
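The requirements above amount to a small interface that any backend must satisfy. The in-memory class below is a hypothetical stand-in that exhibits each listed property (PUT with user-defined metadata, GET returning that metadata, prefix-filtered LIST with a starting position and a maximum count, and visibility immediately after PUT); it is not the API of any particular product.

```python
from typing import Dict, List, Optional, Tuple


class InMemoryObjectStore:
    """Toy object store exposing PUT, GET, DELETE, and LIST at the object level."""

    def __init__(self) -> None:
        self._objects: Dict[Tuple[str, str], Tuple[bytes, Dict[str, str]]] = {}

    def put(self, bucket: str, name: str, body: bytes,
            metadata: Optional[Dict[str, str]] = None) -> None:
        # User-defined metadata is stored alongside the body.
        self._objects[(bucket, name)] = (body, dict(metadata or {}))

    def get(self, bucket: str, name: str) -> Tuple[bytes, Dict[str, str]]:
        # The metadata comes back with the body, as a response header would.
        body, metadata = self._objects[(bucket, name)]
        return body, dict(metadata)

    def delete(self, bucket: str, name: str) -> None:
        self._objects.pop((bucket, name), None)

    def list_names(self, bucket: str, prefix: str = "",
                   start_after: Optional[str] = None,
                   max_keys: int = 1000) -> List[str]:
        names = sorted(n for (b, n) in self._objects
                       if b == bucket and n.startswith(prefix))
        if start_after is not None:
            names = [n for n in names if n > start_after]
        return names[:max_keys]
```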
[0766] In some embodiments, when a processor executes a computer program, it further performs the following steps:
[0767] The entity tag of each standard format object downloaded to the block storage system is compared with the entity tag returned by the cloud object storage system for that object to perform integrity verification.
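A minimal sketch of such a comparison follows. It assumes the entity tag is the hexadecimal MD5 digest of the object body, possibly wrapped in double quotes. That holds for some object stores under some upload modes but is not universal, and the source does not say how the entity tag is computed.

```python
import hashlib


def etag_matches(body: bytes, returned_etag: str) -> bool:
    """Compare a locally computed digest with the entity tag from the store.

    Assumption: the entity tag is the hex MD5 of the body, optionally quoted.
    """
    local = hashlib.md5(body).hexdigest()
    return local == returned_etag.strip('"').lower()
```

Where the backend derives entity tags differently, for example for multipart uploads, the same check would need the backend's own rule in place of the MD5 digest.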
[0768] In some embodiments, a computer-readable storage medium is provided having a computer program stored thereon, which, when executed by a processor, implements the various processes shown in the above method embodiments.
[0769] In some embodiments, a computer program product is provided, including a computer program that, when executed by a processor, implements the various processes shown in the above method embodiments.
[0770] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed, it can include the processes of the embodiments of the above methods. Any references to memory, databases, or other media used in the embodiments provided in this application can include at least one of non-volatile memory and volatile memory. Non-volatile memory can include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetic random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, etc. Volatile memory can include random access memory (RAM) or external cache memory, etc. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases involved in the embodiments provided in this application may include at least one type of relational database and non-relational database. Non-relational databases may include, but are not limited to, blockchain-based distributed databases. The processors involved in the embodiments provided in this application may be general-purpose processors, central processing units, graphics processing units, digital signal processors, programmable logic devices, quantum computing-based data processing logic devices, artificial intelligence (AI) processors, etc., and are not limited to these.
[0771] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.
[0772] The embodiments described above are merely illustrative of several implementation methods of this application, and while the descriptions are relatively specific and detailed, they should not be construed as limiting the scope of the patent application. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this patent application should be determined by the appended claims.
Claims
1. A data backup method, characterized in that the method includes:
According to a preset data organization method, the data to be backed up in a block storage system is processed to obtain a standard format object; and
The standard format object is backed up to a cloud object storage system;
The standard format object includes at least one of the following: a data object, an index object, and a fence object.
2. The method according to claim 1, characterized in that
The block storage system includes at least one logical volume, the logical volume includes at least one index address segment, the index address segment includes at least one data address segment, the data address segment includes at least one block data address segment, and the block data address segment is used to store the block data.
3. The method according to any one of claims 1 to 2, characterized in that
The cloud object storage system includes at least one storage bucket, each storage bucket corresponds to a different logical volume, and the logical partitions in the storage bucket include: data partitions, index partitions, and fence partitions;
The data objects in the same storage bucket are all identified by the data partition name of the data partition;
The index objects in the same storage bucket are all identified by the index partition name of the index partition;
The fence objects in the same storage bucket are all identified by the fence partition name of the fence partition.
4. The method according to any one of claims 1 to 3, characterized in that
The naming format of the data partition includes: a prefix, the volume identifier corresponding to the storage bucket, and the data partition identifier;
And / or,
The naming format of the index partition includes: a prefix, the volume identifier corresponding to the storage bucket, and the index partition identifier;
And / or,
The naming format of the fence partition includes: a prefix, the volume identifier corresponding to the storage bucket, and the fence partition identifier.
5. The method according to any one of claims 1 to 4, characterized in that
The standard format object includes: a data object;
The processing of the data to be backed up in the block storage system according to a preset data organization method to obtain a standard format object includes:
Determine the data to be backed up in the target index address segment of the target logical volume in the block storage system; and
According to the content format of the data objects, data objects are generated for the data corresponding to each data address segment in the data to be backed up, and a data object name is generated for each data object according to the naming format of the data objects.
6. The method according to claim 5, characterized in that
The data object is obtained by encapsulating data in address order from a data address segment within a logical volume, and the content format of the data object includes at least one of the following:
Complete data object: during a full backup, all data written to a data address segment within the logical volume before the backup time is encapsulated into an independent object;
Incremental data object: during an incremental backup, data newly written to a data address segment within the logical volume after the most recent backup of any type and before the backup time is encapsulated into an independent object;
Differential data object: during a differential backup, data newly written to a data address segment within the logical volume after the most recent full backup and before the backup time is encapsulated into an independent object.
7. The method according to claim 5, characterized in that
The step of backing up the standard format object to a cloud object storage system includes:
A first PUT request is transmitted through the PUT interface of the cloud object storage system to upload the generated data object to the target storage bucket corresponding to the target logical volume;
The request body of the first PUT request includes the data object, and the request header of the first PUT request includes first indication information, which indicates at least one of the following: the target storage bucket, the data object name, the compression algorithm, and the encryption algorithm.
8. The method according to claim 5, characterized in that
The standard format object also includes: an index object;
The processing of the target data to be backed up in the block storage system according to a preset data organization method to obtain a standard format object includes:
If data objects have been generated for all data address segments within the target index address segment and have been backed up to the cloud object storage system, an index object is generated for the target index address segment according to the content format of the index object; and
An index object name is generated according to the naming format of the index object.
9. The method according to claim 8, characterized in that
The content format of the index object includes: a header metadata area and an index area;
The header metadata area includes the metadata of the index object, and the index area includes the position index of the data in each data object in the index address segment. The position index of each data object is stored in the index area in address order.
10. The method according to claim 9, characterized in that,
The standard format object also includes: a fence object;
The step of organizing the target data to be backed up in the block storage system according to a preset data organization method to obtain a standard format object includes:
If all index address segments in the target logical volume have been generated into index objects and backed up to the cloud object storage system, generate a fence object according to the content format requirements of the fence object, and generate a fence object name according to the naming format of the fence object. The fence object is used to indicate that a backup of the target logical volume has been completed.
11. A data recovery method, characterized in that,
The method includes:
Download the standard format object that has been backed up to the cloud object storage system;
The standard format object is processed according to a preset data parsing method to obtain the recovered data; and
The recovered data is stored in a block storage system;
The standard format object includes at least one of the following: a data object, an index object, and a fence object.
12. The method according to claim 11, characterized in that
The recovered data includes a portion of the data in the target logical volume, and the standard format object includes an index object;
The downloading of the standard format object that has been backed up to the cloud object storage system includes:
Based on the starting address and address length of the data to be recovered in the target logical volume, determine the target index address segment where the data to be recovered is located;
Download the list of index object names corresponding to the target index address segment;
The index object names in the index object name list are parsed according to the index object naming format to determine the version number in the index object name; and
Download the corresponding index object based on the version number in the index object name.
13. The method according to claim 12, characterized in that
The downloading of the list of index object names corresponding to the target index address segment includes:
A first LIST request is transmitted through the LIST interface of the cloud object storage system to download the list of index object names corresponding to the target index address segment;
The request header of the first LIST request includes first indication information, which indicates at least one of the following: the target bucket corresponding to the target logical volume, the index partition name as the prefix of the index object, the starting position of the LIST, and the maximum number of objects that can be returned in one LIST.
14. The method according to claim 12, characterized in that
The downloading of the corresponding index object based on the version number in the index object name includes:
The index object corresponding to the index object name is downloaded from the target storage bucket through the GET interface of the cloud object storage system.
15. The method according to claim 11, characterized in that
The recovered data includes all data in the target logical volume; the standard format object includes a fence object;
The downloading of the standard format object that has been backed up to the cloud object storage system includes:
A second LIST request is transmitted through the LIST interface of the cloud object storage system to download the list of fence object names that have been backed up to the cloud object storage system;
The request header of the second LIST request includes second indication information, which indicates at least one of the following: the target bucket corresponding to the target logical volume, the fence partition name as a prefix of the fence object, the starting position of the LIST, and the maximum number of objects that can be returned in one LIST.
16. The method according to any one of claims 12 to 15, characterized in that
The naming format of the index object includes:
The index partition name, the starting address in the logical volume of the index address segment corresponding to the index object, and the version number of the index object.
17. The method according to any one of claims 12 to 16, characterized in that
The step of processing the standard format object according to a preset data parsing method to obtain the recovered data includes:
The downloaded index object is parsed according to its content format to obtain the data index in the index object, which includes a list of data object names.
18. The method according to any one of claims 12 to 17, characterized in that
The standard format object also includes: a data object;
The downloading of the standard format object that has been backed up to the cloud object storage system includes:
Based on the data object name in the data index, the data object corresponding to the data object name is downloaded from the target storage bucket through the GET interface of the cloud object storage system.
19. A computer-readable storage medium having a computer program stored thereon, characterized in that,
When the computer program is executed by a processor, it implements the steps of the method according to any one of claims 1 to 18.
20. A computer program product, comprising a computer program, characterized in that,
When the computer program is executed by a processor, it implements the steps of the method according to any one of claims 1 to 18.