Method, device, storage medium, electronic device and program product for processing internal objects

By managing metadata in memory and updating target metadata when business objects change, the problem of low efficiency in internal object reclamation processing in distributed storage systems is solved, achieving more efficient storage space management and performance improvement.

CN120045129B · Active · Publication Date: 2026-06-12 · INSPUR SUZHOU INTELLIGENT TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: INSPUR SUZHOU INTELLIGENT TECH CO LTD
Filing Date: 2024-12-31
Publication Date: 2026-06-12

AI Technical Summary

Technical Problem

In existing technologies, distributed storage systems are inefficient in handling the recycling of internal objects, especially during reverse metadata updates, resulting in poor system performance.

Method used

Metadata is pre-stored in memory. When a business object changes, the corresponding initial metadata is located and modified to produce target metadata, and the recycling strategy for the historical internal object is determined from that target metadata. Because the metadata is updated directly in memory instead of on disk, I/O operations are reduced.

Benefits of technology

It improves the efficiency of internal object recycling, reduces hard disk access, enhances system response speed and performance, and optimizes storage space utilization.

Generated by Eureka AI based on patent content.


Abstract

Embodiments of the present application provide a processing method and device of an internal object, a storage medium, an electronic device and a program product, wherein the method comprises: in response to an initial service object changing into a target service object, determining initial metadata corresponding to the initial service object from at least one first metadata stored in a memory, wherein the first metadata is used to at least represent a storage state of data in a historical internal object associated with the initial service object, and the historical internal object is used to store service data of the initial service object; changing the initial metadata to obtain target metadata; and determining a recycling strategy of the historical internal object based on the target metadata, wherein the recycling strategy is used to represent a rule of whether to perform recycling processing on the historical internal object. Through the present application, the technical problem of low efficiency of recycling processing on the internal object is solved, and the technical effect of improving the efficiency of recycling the internal object is achieved.

Description

Technical Field

[0001] This application relates to the field of computers, and more specifically, to a method, apparatus, storage medium, electronic device, and program product for processing internal objects.

Background Technology

[0002] Currently, distributed storage systems play a core role in the fields of big data and cloud computing, and their high performance, high reliability, and easy scalability are widely favored.

[0003] In related technologies, metadata management has become the key to optimizing system performance and resource utilization, especially when dealing with small input / output (I / O) scenarios. However, for updating reverse metadata, the metadata of the internal object is usually read first and then modified. Therefore, the above method has the technical problem of low efficiency in recycling internal objects.

Summary of the Invention

[0004] This application provides a method, apparatus, storage medium, electronic device, and program product for processing internal objects, to at least solve the problem of low efficiency in recycling internal objects in related technologies.

[0005] According to one embodiment of this application, a method for processing internal objects is provided. The method may include: in response to an initial business object changing to a target business object, determining initial metadata corresponding to the initial business object from at least one first metadata stored in memory, wherein the first metadata is used to at least characterize the storage state of data in a historical internal object associated with the initial business object, and the historical internal object is used to store the business data of the initial business object; modifying the initial metadata to obtain target metadata; and determining a recycling strategy for the historical internal object based on the target metadata, wherein the recycling strategy is used to represent the rules for whether to recycle the historical internal object.

[0006] In one exemplary embodiment, modifying the initial metadata to obtain target metadata includes: determining the data volume corresponding to the business data; and modifying the effective data length information in the initial metadata according to the data volume, wherein the effective data length information is used to characterize the data storage volume in the historical internal object.

[0007] In one exemplary embodiment, determining a recycling strategy for historical internal objects based on target metadata includes: in response to the effective data length information being less than a first target value, determining the recycling strategy as recycling the historical internal objects.

[0008] In one exemplary embodiment, determining a recycling strategy for historical internal objects based on target metadata includes: in response to the effective data length information being greater than or equal to the first target value, determining the recycling strategy as not recycling the historical internal objects.
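As a minimal sketch of the threshold rule in the two embodiments above, the decision can be written as a single comparison. The names RecyclePolicy and decide_recycle are hypothetical and do not appear in this application.

```cpp
#include <cstdint>

// Hypothetical names, used only to illustrate the threshold rule.
enum class RecyclePolicy { Recycle, Keep };

// Recycle when the effective data length is below the first target value;
// otherwise keep the historical internal object.
constexpr RecyclePolicy decide_recycle(std::uint32_t effective_len,
                                       std::uint32_t first_target_value) {
    return effective_len < first_target_value ? RecyclePolicy::Recycle
                                              : RecyclePolicy::Keep;
}
```

With a first target value of 1, this reduces to recycling only internal objects whose effective data length is 0.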

[0009] In an exemplary embodiment, in response to an initial business object changing into a target business object, determining initial metadata corresponding to the initial business object from at least one first metadata stored in memory includes: in response to the initial business object changing into a target business object, determining a reverse relationship corresponding to the initial business object, wherein the reverse relationship is used to characterize the mapping relationship between historical internal objects and the initial business object; determining the historical internal object mapped by the initial business object based on the reverse relationship; determining the identity information of the historical internal object; and determining the first metadata containing the identity information from at least one first metadata as the initial metadata.
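The lookup chain in this embodiment (reverse relationship, then identity information, then first metadata) might be sketched as follows. The container layout and the names FirstMetadata and find_initial_metadata are assumptions made for illustration; the application does not specify how the relationships are stored.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical types; the storage layout is an assumption.
using BusinessObjectId = std::string;
using InternalObjectId = std::uint64_t;

struct FirstMetadata {
    InternalObjectId id;          // identity information of the internal object
    std::uint32_t effective_len;  // effective data length in bytes
};

// The reverse relationship maps each internal object to its business object.
// Collect the first metadata of every internal object mapped to initial_object.
std::vector<FirstMetadata> find_initial_metadata(
    const std::unordered_map<InternalObjectId, BusinessObjectId>& reverse_rel,
    const std::unordered_map<InternalObjectId, FirstMetadata>& first_metadata,
    const BusinessObjectId& initial_object) {
    std::vector<FirstMetadata> result;
    for (const auto& entry : reverse_rel) {
        if (entry.second != initial_object) continue;  // belongs to another object
        auto it = first_metadata.find(entry.first);    // match by identity information
        if (it != first_metadata.end()) result.push_back(it->second);
    }
    return result;
}
```

The linear scan keeps the sketch short; a real system would more likely index the relationship by business object.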

[0010] In one exemplary embodiment, determining at least one first metadata that contains identity information as initial metadata includes: searching a database according to the identity information to obtain the initial metadata, wherein the database is used to store the first metadata.

[0011] In one exemplary embodiment, the method may further include: loading initial metadata into memory in response to the need to reclaim historical internal objects or a change in the data storage status of historical internal objects when a memory restart occurs.

[0012] In an exemplary embodiment, the method may further include: in response to a data storage instruction issued by a client, acquiring multiple data to be stored corresponding to the data storage instruction; aggregating the multiple data to be stored to obtain a first internal object; constructing second metadata of the first internal object based on the data volume of the data to be stored, wherein the second metadata is used at least to characterize the data storage status of the first internal object; constructing a mapping relationship between the data to be stored and the first internal object to obtain a first forward relationship; and storing the first forward relationship and the second metadata in memory.
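The write path in this embodiment can be illustrated with a small sketch: several pending writes are appended into one internal object, the effective data length is accumulated as the second metadata, and a forward entry is recorded per write. All type and function names here are hypothetical, and the offset field is an assumption about what a forward relationship would need for later reads.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical types for illustration; not taken from the application.
struct PendingWrite {
    std::string key;            // identifies the data to be stored
    std::vector<char> payload;  // bytes written by the client
};

struct ForwardEntry {
    std::uint64_t internal_id;  // internal object holding the data
    std::uint32_t offset;       // byte offset inside the internal object
    std::uint32_t length;       // number of bytes stored
};

struct AggregateResult {
    std::vector<char> internal_object;                      // aggregated bytes
    std::uint32_t effective_len;                            // second metadata
    std::unordered_map<std::string, ForwardEntry> forward;  // forward relationship
};

// Append every pending write to one internal object and record where it landed.
AggregateResult aggregate_writes(std::uint64_t internal_id,
                                 const std::vector<PendingWrite>& writes) {
    AggregateResult out;
    out.effective_len = 0;
    for (const auto& w : writes) {
        const std::uint32_t offset =
            static_cast<std::uint32_t>(out.internal_object.size());
        const std::uint32_t length = static_cast<std::uint32_t>(w.payload.size());
        out.internal_object.insert(out.internal_object.end(),
                                   w.payload.begin(), w.payload.end());
        out.forward[w.key] = ForwardEntry{internal_id, offset, length};
        out.effective_len += length;
    }
    return out;
}
```

The sketch does not enforce a fixed internal object size; a caller would stop aggregating once the size limit of an internal object is reached.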

[0013] In an exemplary embodiment, the method may further include: obtaining a data acquisition request issued by a client, wherein the data acquisition request is used to acquire data to be acquired stored in a data pool; determining a first business object corresponding to the data acquisition request; acquiring a first forward relationship corresponding to the first business object from memory; determining the identity information of a first internal object mapped to the target business object based on the first forward relationship; and acquiring the data to be acquired from the first internal object deployed in the data pool according to the identity information.

[0014] In one exemplary embodiment, the method may further include: transmitting the data to be acquired to the client, and modifying the effective data length information of the third metadata corresponding to the first internal object in response to acquiring the data to be acquired from the first internal object, wherein the third metadata is stored in memory.

[0015] In an exemplary embodiment, the method may further include: in response to a change in business data, aggregating the changed business data to obtain a second internal object, wherein the second internal object is different from the historical internal object; constructing fourth metadata of the second internal object based on the data volume of the changed business data, wherein the fourth metadata is used at least to characterize the data storage status of the second internal object; constructing a mapping relationship between the changed business data and the second internal object to obtain a second forward relationship; and storing the second forward relationship and the fourth metadata in memory.

[0016] According to another embodiment of this application, an internal object processing apparatus is also provided. The apparatus may include: a first determining unit, configured to determine initial metadata corresponding to the initial business object from at least one first metadata stored in memory in response to an initial business object changing to a target business object, wherein the first metadata is used to at least characterize the data storage of historical internal objects associated with the initial business object, and the historical internal objects are used to store business data in the initial business object; a modifying unit, configured to modify the initial metadata to obtain target metadata; and a second determining unit, configured to determine a recycling strategy for historical internal objects based on the target metadata, wherein the recycling strategy is used to represent the rules for whether to recycle historical internal objects.

[0017] According to yet another embodiment of this application, a computer-readable storage medium is also provided, in which a computer program is stored, wherein the computer program is configured to perform the steps in any of the above method embodiments when it is run.

[0018] According to yet another embodiment of this application, an electronic device is also provided, including a memory and a processor, wherein a computer program is stored in the memory and the processor is configured to run the computer program to perform the steps in any of the above method embodiments.

[0019] According to yet another embodiment of this application, a computer program product is also provided, including a computer program that, when executed by a processor, implements the steps in any of the above method embodiments.

[0020] In this application, in response to an initial business object changing into a target business object, initial metadata corresponding to the initial business object is determined from at least one first metadata stored in memory. The first metadata characterizes the storage state of data in historical internal objects associated with the initial business object, and the historical internal objects store the business data of the initial business object. The initial metadata is then modified to obtain target metadata. Based on the target metadata, a recycling strategy for the historical internal objects is determined, where the recycling strategy represents the rules for whether to recycle the historical internal objects. In other words, in this embodiment, first metadata is pre-stored in memory, and this first metadata can be used to determine the data storage status of historical internal objects. Therefore, when the business data of the initial business object changes, the initial business object becomes the target business object. The initial metadata corresponding to the initial business object can then be determined and adjusted to obtain the target metadata. Furthermore, based on the target metadata, it can be determined whether to recycle the historical internal objects, thereby solving the technical problem of low efficiency in recycling internal objects and achieving the technical effect of improving the efficiency of recycling internal objects.

Attached Figure Description

[0021] Figure 1 is a hardware structure block diagram of a server device for a method of processing internal objects according to an embodiment of this application;

[0022] Figure 2 is a flowchart of a method for processing internal objects according to an embodiment of this application;

[0023] Figure 3 is a schematic diagram illustrating the correspondence between business objects and internal objects according to embodiments of this application;

[0024] Figure 4 is a flowchart of an internal object recycling method according to an embodiment of this application;

[0025] Figure 5 is a flowchart of the flushing method for internal objects according to an embodiment of this application;

[0026] Figure 6 is an example diagram of a write operation in a distributed storage system according to an embodiment of this application;

[0027] Figure 7 is a structural block diagram of an internal object processing apparatus according to an embodiment of this application;

[0028] Figure 8 is a computer system architecture block diagram of an electronic device according to an embodiment of this application.

Detailed Implementation

[0029] The embodiments of this application will be described in detail below with reference to the accompanying drawings and examples.

[0030] It should be noted that the terms "first," "second," etc., in the specification, claims, and drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence.

[0031] This embodiment also provides a method for processing internal objects. This system is used to implement the embodiments and preferred embodiments, and will not be repeated hereafter. As used below, the terms "module" and "unit" refer to a combination of software and/or hardware that can perform a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.

[0032] As an optional implementation, the method embodiments provided in this application can be executed on a server device or a similar computing device. Taking running on a server device as an example, Figure 1 is a hardware structure block diagram of a server device for processing an internal object according to an embodiment of this application. As shown in Figure 1, the server device may include one or more processors 102 (only one is shown in Figure 1; the processor may include, but is not limited to, a microprocessor (MCU) or a programmable logic device (FPGA)) and a memory 104 for storing data. The server device may further include a transmission device 106 for communication functions and an input/output device 108. Those skilled in the art will understand that the structure shown in Figure 1 is for illustrative purposes only and does not limit the structure of the server device described above. For example, the server device may also include more or fewer components than those shown in Figure 1, or have a configuration different from that shown in Figure 1.

[0033] The memory 104 can be used to store computer programs, such as application software programs and modules, like the computer program corresponding to the internal object processing method in this embodiment. The processor 102 executes various functional applications and data processing by running the computer program stored in the memory 104, thus implementing the above-described method. The memory 104 may include high-speed random access memory and non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some instances, the memory 104 may further include memory remotely located relative to the processor 102, and these remote memories can be connected to server devices via a network. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.

[0034] The transmission device 106 is used to receive or send data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider for the server device. In one example, the transmission device 106 includes a Network Interface Controller (NIC), which can connect to other network devices via a base station to communicate with the Internet. In another example, the transmission device 106 may be a Radio Frequency (RF) module used for wireless communication with the Internet.

[0035] This embodiment provides a method for processing internal objects. Figure 2 is a flowchart of a method for processing internal objects according to an embodiment of this application. As shown in Figure 2, the process includes the following steps:

[0036] Step S202: In response to the initial business object changing to the target business object, determine the initial metadata corresponding to the initial business object from at least one first metadata stored in memory, wherein the first metadata is used to at least characterize the storage state of data in the historical internal object associated with the initial business object, and the historical internal object is used to store the business data of the initial business object.

[0037] Step S204: Modify the initial metadata to obtain the target metadata;

[0038] Step S206: Based on the target metadata, determine the recycling strategy for historical internal objects, wherein the recycling strategy is used to represent the rules for whether to recycle historical internal objects.

[0039] In this embodiment, the initial business object can be data written by the client, that is, a data entity or data unit that a user or application creates, reads, writes, and manages in a distributed storage system. It can contain any type of business data, such as files, images, or database records. The business data can be the data of the business object, such as documents, images, or video files uploaded by the user. It should be noted that this is merely an example, and no specific limitations are placed on the type of business data described above.

[0040] Optionally, the aforementioned first metadata can be metadata newly added in this embodiment: a newly added in-memory data structure that holds the metadata information of internal objects. This first metadata can be used to characterize the state of the corresponding internal object. The aforementioned internal object can be a data unit stored in the data pool that is used to store business data; it is a data structure or storage unit introduced in a distributed storage system to optimize data storage and operation.

[0041] Optionally, when processing write operations on business objects, the system aggregates the written data, converting multiple small, random data write operations into sequential write operations on internal objects. Internal objects typically have a fixed size, such as 4 megabytes (MB), which can be used to store the aggregated data segments.

[0042] Optionally, the aforementioned internal object can be the physical storage carrier for business object data, and it has forward and reverse metadata relationships with the business object. The forward relationship points from the business object to the internal object for data retrieval; the reverse relationship points from the internal object to the business object for garbage collection and storage space management. When the business object data changes, the system creates a new internal object to store the updated data, and simultaneously updates the forward and reverse metadata to maintain data integrity and ensure efficient system operation.

[0043] Optionally, during garbage collection, the effective data length information of internal objects can be used to determine which internal objects are no longer referenced by any business objects. These internal objects are considered garbage data, and the storage space they occupy can be reclaimed for reuse.

[0044] In this embodiment, to manage internal objects, a distributed storage-based reverse metadata management method is designed based on the characteristics of distributed storage. This method avoids reading the old internal object metadata every time a business object is modified by adding a set of metadata. The aforementioned metadata can be first metadata, which can be stored in memory to manage internal objects.

[0045] Optionally, each storage unit (PG) stores its own internal object information, that is, the first metadata corresponding to the internal object. Each internal object can use a 64-bit unsigned integer and a 32-bit unsigned integer to record information, where the 64-bit unsigned integer and the 32-bit unsigned integer record the information to form the first metadata. The 64-bit unsigned integer records the internal object's identification (ID), and the 32-bit unsigned integer records the effective data length of the internal object. This information is resident in memory and persisted to disk. Each time a business object is written, this information is persisted to disk along with the business object's metadata, thus eliminating the need to read the original internal object's metadata. When an internal object ages out, this metadata information needs to be updated synchronously.

[0046] The newly added metadata (i.e., the first metadata) can be organized as follows:

[0047] bool is_loaded = false;

[0048] unordered_map<uint32_t, unordered_map<uint64_t, uint32_t>> agg_shards;

[0049] The `is_loaded` parameter indicates whether this set of metadata has been loaded into memory. The default value is `false`, which means that it has not been loaded into memory. This set of metadata can be loaded into memory during garbage collection or batch submission of metadata, and `is_loaded` will be set to `true`.

[0050] In agg_shards, the outer 32-bit key represents the sequence number of the shard, the inner 64-bit key represents the ID of the internal object, and the 32-bit value represents the valid data length of the internal object.
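The two fields above, together with the load-on-demand behavior of is_loaded, could be wrapped as in the following sketch. The wrapper PgAggMetadata, the ensure_loaded and valid_len helpers, and the loader callback (standing in for reading the persisted copy) are assumptions for illustration and are not defined in this application.

```cpp
#include <cstdint>
#include <functional>
#include <unordered_map>

// object ID -> valid data length, grouped by shard sequence number
using ShardMap = std::unordered_map<std::uint64_t, std::uint32_t>;
using AggShards = std::unordered_map<std::uint32_t, ShardMap>;

// Hypothetical wrapper around the two fields listed above.
struct PgAggMetadata {
    bool is_loaded = false;
    AggShards agg_shards;

    // Load once, on first use (e.g. garbage collection or a batch metadata
    // commit). The loader callback is an assumed stand-in for the persisted copy.
    void ensure_loaded(const std::function<AggShards()>& loader) {
        if (is_loaded) return;
        agg_shards = loader();
        is_loaded = true;
    }

    // Valid data length of one internal object, or 0 if it is not recorded.
    std::uint32_t valid_len(std::uint32_t shard, std::uint64_t object_id) const {
        auto s = agg_shards.find(shard);
        if (s == agg_shards.end()) return 0;
        auto o = s->second.find(object_id);
        return o == s->second.end() ? 0 : o->second;
    }
};
```

Repeated calls to ensure_loaded after the first are no-ops, so later updates work purely on the in-memory copy.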

[0051] Optionally, take a data pool with an effective capacity of 270.75 terabytes (TB) as an example. With each internal object being 4MB in size, there are a total of 70,975,488 internal objects in the cluster. The cache pool in the cluster has 4,096 PGs, with each PG containing 17,328 internal objects. Each shard stores the metadata information (i.e., the first metadata) of 16 internal objects, so the space occupied by a single shard is 16 × 12 = 192 bytes. There are a total of 1,084 shards in each PG. In an 8KB random write scenario, under extreme conditions, a single flush of 4MB of business object data corresponds to 512 internal objects and 512 shards. Therefore, the amount of metadata that needs to be additionally committed to the key-value pair database (kv) for a single flush is 512 × 192 bytes = 96KB.
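The figures in this example can be checked at compile time. The sketch below assumes binary units (1MB = 1024 × 1024 bytes) and uses illustrative constant names; under that assumption the object count and object size multiply out to 277,248GB, i.e. 270.75TB.

```cpp
#include <cstdint>

// Figures are taken from the example above; constant names are illustrative.
// Binary units are assumed (1MB = 1024 * 1024 bytes).
constexpr std::uint64_t kPgCount = 4096;
constexpr std::uint64_t kObjectsPerPg = 17328;
constexpr std::uint64_t kObjectBytes = 4ull * 1024 * 1024;  // 4MB internal object
constexpr std::uint64_t kEntryBytes = 8 + 4;                // 64-bit ID + 32-bit length
constexpr std::uint64_t kEntriesPerShard = 16;
constexpr std::uint64_t kWriteBytes = 8ull * 1024;          // 8KB random write

constexpr std::uint64_t kTotalObjects = kPgCount * kObjectsPerPg;
constexpr std::uint64_t kPoolGiB =
    kTotalObjects * kObjectBytes / (1024ull * 1024 * 1024);
constexpr std::uint64_t kShardBytes = kEntriesPerShard * kEntryBytes;
// Worst case: every 8KB write lands in a different internal object and shard.
constexpr std::uint64_t kShardsPerFlush = kObjectBytes / kWriteBytes;
constexpr std::uint64_t kFlushMetadataBytes = kShardsPerFlush * kShardBytes;

static_assert(kTotalObjects == 70975488, "4,096 PGs x 17,328 objects");
static_assert(kPoolGiB == 277248, "277,248GB = 270.75TB");
static_assert(kShardBytes == 192, "16 entries x 12 bytes");
static_assert(kShardsPerFlush == 512, "4MB / 8KB");
static_assert(kFlushMetadataBytes == 96 * 1024, "512 shards x 192 bytes = 96KB");
```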

[0052] Optionally, the aforementioned first metadata can be stored in a key-value pair database as key-value pairs.

[0053] Optionally, in response to a change in the initial business object, that is, when the initial business object changes to the target business object, at least one first metadata can be obtained from the storage unit in memory. From the at least one first metadata, the initial metadata corresponding to the initial business object can be determined. This initial metadata can be used to characterize the storage state of data in the historical internal object.

[0054] Optionally, in this embodiment, the business object is associated with a specific physical location (i.e., an internal object) in the storage system through the first metadata, thereby enabling efficient access and management of data.

[0055] Optionally, in response to a modification of a business object—that is, when the initial business object changes to the target business object—the system determines the valid data length of the internal object (historical internal object) corresponding to the initial business object from the first metadata stored in memory (e.g., agg_shards). This step is crucial for updating the forward and reverse metadata relationships, ensuring the correct creation of the new internal object and the updating of the valid data length of the old internal object.

[0056] For example, in response to the initial business object changing to the target business object, the internal objects corresponding to the target business object will be reconstructed, and it will be determined whether to reclaim the historical internal objects storing the initial business object. In response to the initial business object changing to the target business object, the system can first read the `agg_shards` data structure in memory (i.e., the initial metadata), which stores the effective data length information of all internal objects. Based on the forward metadata information of the initial business object, the associated historical internal objects can be located. Furthermore, in response to the change of the initial business object, the effective data length information in the initial metadata can be modified to obtain the target metadata. Based on the target metadata, the effective data length stored in the historical internal objects can be determined. If some data is overwritten by the new target business object, the effective length of the overwritten data becomes 0, indicating that this data can be garbage collected.

[0057] For another example, suppose the initial business object A contains data segments D1 and D2, stored in internal objects agg1 and agg2 respectively. Now, business object A is modified; D1 is replaced by new data D1', while D2 remains unchanged, forming the target business object A'. From the at least one first metadata, the initial metadata corresponding to the changed initial business object is determined. In agg_shards, the following records exist:

[0058] cpp

[0059] agg_shards = {

[0060] {1, { {agg1, 4096}, {agg2, 2048}}}

[0061] };

[0062] Here, agg1 and agg2 can represent the IDs of internal objects, and 4096 and 2048 can represent their respective valid data lengths.

[0063] When writing to the target business object A', the system will check the valid data length of agg1 and update it to 0 (assuming D1' completely overwrites D1). At the same time, a new internal object agg1' will be created for D1', and the information in agg_shards will be updated to obtain the target metadata.

[0064] cpp

[0065] agg_shards = {

[0066] {1, { {agg1', 4096}, {agg2, 2048}}}

[0067] };

[0068] Furthermore, by analyzing the target metadata, it can be determined whether historical internal objects need to be reclaimed. It should be noted that the numbers mentioned above are for illustrative purposes only, and the English terms used for the parameters are also for illustrative purposes; no specific limitations are imposed on the content described here.

[0069] In this embodiment, the initial metadata update process is completed entirely in memory, without the need to read the metadata of internal objects from the hard disk, thus saving input / output (I / O) operations.

[0070] For example, if the effective data length of agg1 becomes 0, it indicates that agg1 is now garbage data and can be reclaimed. A new internal object agg1' is used to store the updated data D1', and its effective data length is recorded in agg_shards.

[0071] Optionally, when the initial business object changes to become the target business object, the system can first read the internal object metadata information (initial metadata) related to the initial business object from memory. This initial metadata can be stored in agg_shards and can reflect the storage status of data in historical internal objects, i.e., the effective data length.

[0072] Optionally, the system analyzes the newly written data and updates the valid data length of the relevant internal objects in agg_shards. If the new data completely overwrites the old data, the valid data length of the old internal object will be set to 0, indicating that all data of that internal object is no longer valid, thereby yielding the target metadata.

[0073] Optionally, after obtaining the target metadata, the system can determine a garbage collection strategy based on internal objects with a valid data length of 0 in the target metadata. This strategy typically means that if the valid data length of an internal object is 0, then the internal object can be marked as reclaimable, thereby releasing the storage space it occupies. Here, the aforementioned reclamation strategy can be a garbage collection strategy, which can be used to determine whether to reclaim historical internal objects.

[0074] For example, suppose the initial business object B contains two data segments, Data1 and Data2, stored in internal objects agg1 and agg2 respectively. The effective data lengths of agg1 and agg2 in agg_shards are 4096 and 2048 bytes respectively. In response to the initial business object changing to the target business object, the effective data length information of the internal objects agg1 and agg2 associated with B can be read from agg_shards. When B changes, Data1 is completely overwritten by the new Data1', while Data2 remains unchanged. The effective data length of agg1 in agg_shards can be updated to 0, and a new internal object agg3 can be created for Data1', updating its effective data length to 4096. At this point, the target metadata includes the effective data length of agg1 (0), agg2 (2048), and the new object agg3 (4096). Checking the target metadata for agg1 reveals that its effective data length is 0, thus confirming agg1 as a recyclable internal object. The effective data length of agg2 is not 0, indicating that business objects still reference its data, therefore it will not be reclaimed. Since agg3 is newly created, the recycling policy does not apply to it.
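The agg1/agg2/agg3 example above can be traced with a short sketch. The numeric IDs, the helper names, and the choice to record agg3 in the same shard as agg1 are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_map>
#include <vector>

using AggShards =
    std::unordered_map<std::uint32_t,
                       std::unordered_map<std::uint64_t, std::uint32_t>>;

// Hypothetical numeric IDs standing in for agg1, agg2 and agg3.
constexpr std::uint64_t kAgg1 = 1, kAgg2 = 2, kAgg3 = 3;

// Data1' fully overwrites Data1: the old object drops to 0 and the new
// object is recorded with the length of the new data, all in memory.
void apply_full_overwrite(AggShards& shards, std::uint32_t shard,
                          std::uint64_t old_id, std::uint64_t new_id,
                          std::uint32_t new_len) {
    shards[shard][old_id] = 0;
    shards[shard][new_id] = new_len;
}

// Collect the IDs whose valid data length is 0, i.e. the reclaimable objects.
std::vector<std::uint64_t> reclaimable(const AggShards& shards) {
    std::vector<std::uint64_t> ids;
    for (const auto& shard : shards)
        for (const auto& entry : shard.second)
            if (entry.second == 0) ids.push_back(entry.first);
    std::sort(ids.begin(), ids.end());  // map iteration order is unspecified
    return ids;
}
```

Starting from agg1 = 4096 and agg2 = 2048, applying the overwrite and then scanning yields agg1 as the only reclaimable object, matching the example.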

[0075] This invention is a method for reverse metadata management in distributed storage. The method shortens the I/O path for metadata updates during flushing and garbage collection, reduces access to hard disk data, and reduces traversal of target data. The target data is accessed directly in memory, updated there, and only then written to disk. This can save CPU resources and hard disk I/O resources, thereby improving I/O performance.

[0076] Optionally, through the above steps, the system not only efficiently handles updates to business objects, but also determines a garbage collection strategy based on the updated metadata (target metadata), thereby quickly identifying and reclaiming internal objects that are no longer referenced, optimizing storage space usage. This method significantly reduces disk read / write operations, improving system response speed and performance.

[0077] Using the above method, when the business data of the initial business object changes, the initial business object changes into the target business object. The initial metadata corresponding to the initial business object can then be determined and adjusted to obtain the target metadata. Furthermore, based on the target metadata, it can be determined whether to recycle the historical internal objects, thereby solving the technical problem of low efficiency in recycling internal objects and achieving the technical effect of improving the efficiency of recycling internal objects.

[0078] In one exemplary embodiment, modifying the initial metadata to obtain target metadata includes: determining the data volume corresponding to the business data; and modifying the effective data length information in the initial metadata according to the data volume, wherein the effective data length information is used to characterize the data storage volume in the historical internal object.

[0079] In this embodiment, the amount of business data to be changed can be determined. Based on this amount of data, the effective length information in the initial metadata can be changed to obtain the target metadata. At this time, the modified target metadata can be used to characterize the amount of data stored in the historical internal objects. Based on the amount of data, it can be determined whether the historical internal objects should be recycled.

[0080] Optionally, when the initial business object changes to the target business object, the system needs to modify the metadata stored in memory (initial metadata) to reflect the updated business data. The effective data length information in the initial metadata is crucial, indicating the actual space occupied by the data within the internal object. The process of modifying the metadata involves determining the amount of new data and adjusting the effective data length information of the internal object accordingly. This process ensures that the system's garbage collection mechanism can accurately identify which data can be reclaimed and optimizes the management of storage resources.

[0081] Optionally, the actual size of the newly written business data, Data_new, is first calculated. This step ensures that metadata updates accurately reflect changes in data volume. Based on this data volume, the internal object associated with Data_new can be located, and then the internal object and its valid data length information in agg_shards (i.e., initial metadata) are updated. If Data_new completely overwrites the old data, the valid data length of the old internal object will be set to 0; if Data_new is an incremental update (i.e., only a portion of the old data is replaced), the valid data length of the corresponding internal object will be adjusted according to the data volume of Data_new.

[0082] For example, suppose the initial business object B contains two data segments, Data1 (4096 bytes) and Data2 (2048 bytes), stored in internal objects agg1 and agg2 respectively. The initial valid data lengths of agg1 and agg2 in agg_shards are 4096 and 2048 bytes respectively. When the business data of B changes, Data1 is partially overwritten by the new Data1' (2048 bytes), while Data2 remains unchanged. The data volume corresponding to the changed business data is determined first: the data size of Data1' is 2048 bytes. The valid data length information in the initial metadata is then changed according to this data volume. Because Data1' replaces only a portion of the data in Data1, the valid data length of agg1 is reduced by the data volume of Data1', from 4096 to 2048 bytes. The valid data length of agg2 remains unchanged at 2048 bytes.

[0083] Furthermore, the system could also consider creating a new internal object `agg3` to store `Data1'` after it is written, and setting the effective length of the overwritten data segment in `agg1` to 0, indicating that this data is now considered garbage. Simultaneously, the effective data length information for `agg3` in `agg_shards` would be updated to 2048 bytes.
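The adjustment described in paragraphs [0081] to [0083] can be sketched as follows. This is a minimal illustration under stated assumptions: agg_shards is modeled as a dictionary, and the function name `apply_overwrite` and its parameters are hypothetical rather than taken from the application.

```python
def apply_overwrite(shards, old_obj, overwritten_bytes, new_obj, new_bytes):
    """Shrink the old object's valid length and register the new internal object.

    shards: dict mapping internal object name -> valid data length in bytes.
    """
    if overwritten_bytes > shards[old_obj]:
        raise ValueError("cannot overwrite more than the remaining valid data")
    shards[old_obj] -= overwritten_bytes                      # overwritten part becomes garbage
    shards[new_obj] = shards.get(new_obj, 0) + new_bytes      # new data lives in the new object
    return shards


# Partial overwrite: Data1' (2048 bytes) replaces half of Data1 in agg1.
shards = {"agg1": 4096, "agg2": 2048}
apply_overwrite(shards, "agg1", 2048, "agg3", 2048)
print(shards)  # {'agg1': 2048, 'agg2': 2048, 'agg3': 2048}

# Complete overwrite: the old object's valid length drops to 0.
full = {"agg1": 4096}
apply_overwrite(full, "agg1", 4096, "agg3", 4096)
print(full["agg1"])  # 0
```

Both cases use the same subtraction, so a complete overwrite is simply the case in which the overwritten volume equals the remaining valid length.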

[0084] Figure 3 is a schematic diagram illustrating the correspondence between business objects and internal objects according to embodiments of this application. As shown in Figure 3, the data fragments contained in the business objects (such as object 301, object 302, and object 303) can be aggregated into internal objects (such as 4M object 304) in the system. These internal objects can be stored in the data pool, and the relationship between the business objects and the internal objects is recorded through metadata to form forward and reverse relationship chains.

[0085] Optionally, as shown in Figure 3, the data fragments of each business object exist in both the cache pool 305 and the data pool 306, demonstrating the dual-pool storage characteristic of the data. Specifically, business data is written from the client to the cache pool 305, aggregated to form a 4M object 304, and then flushed to the data pool 306, ultimately achieving persistent data storage.

[0086] In summary, through the above implementation methods, the system can efficiently manage the effective data length information of internal objects, thereby enabling rapid adjustment of the storage state when business objects are updated. The process of modifying metadata occurs in memory, avoiding frequent read/write operations to the hard drive and improving storage system performance. At the same time, timely updated effective data length information provides accurate data for subsequent garbage collection, helping to optimize storage space utilization.

[0087] In one exemplary embodiment, determining a recycling strategy for historical internal objects based on target metadata includes: in response to a valid data length information being less than a first target value, determining the recycling strategy as recycling the historical internal objects.

[0088] In one exemplary embodiment, determining a recycling strategy for historical internal objects based on target metadata includes: in response to the valid data length information being greater than or equal to a first target value, determining the recycling strategy as not recycling the historical internal objects.

[0089] In this embodiment, the first target value can be a preset value, such as 0. The effective data length information can be used to characterize the amount of historical internal object data.

[0090] Optionally, in a distributed storage system, when business data (e.g., initial business data or target data) is updated, the system modifies the initial metadata corresponding to the business data to generate target metadata. This target metadata may contain the latest status of the effective data length information of internal objects. Therefore, based on the target metadata, the system can determine whether to reclaim historical internal objects, and whether to reclaim them depends on the comparison result between the effective data length information and a preset first target value.

[0091] Optionally, in response to the effective data length information being less than a first target value, the reclamation strategy is determined to reclaim historical internal objects. The first target value can be 0, indicating that the internal object contains no valid data. When the effective data length information in the target metadata is equal to or less than this first target value, the system determines that the internal object is obsolete or no longer used, thus determining the reclamation strategy to reclaim this historical internal object. This ensures effective management of storage space and avoids useless data consuming storage resources.

[0092] Optionally, in response to the effective data length information being greater than or equal to a first target value, the reclamation strategy can be determined as not requiring the reclamation of historical internal objects. That is, when the effective data length information of an internal object in the target metadata is greater than or equal to the first target value, the internal object still stores valid data and may still be referenced by one or more business objects. In this case, the system determines the reclamation strategy as not reclaiming the historical internal object, to maintain data integrity and business operation continuity.

[0093] For example, suppose the internal object `agg1` initially stores 4MB of data for business object `X`, with a valid data length of 4MB. Later, business object `X` is modified, and the new data overwrites part of the original data in `agg1`, leaving 2MB of valid data. After the write operation, the valid data length of `agg1` is updated to 2MB in the target metadata. When checking the target metadata of `agg1` and finding its valid data length to be 2MB, the system finds that this value is greater than the first target value of 0 and determines that `agg1` does not need to be reclaimed. However, if a later write operation for `X` overwrites all remaining data in `agg1`, so that its valid data length becomes 0MB, the system finds that this value is no longer greater than the first target value of 0 and determines that the reclamation strategy is to reclaim `agg1`.

[0094] In this embodiment, for internal objects that are shared by multiple business objects or only partially overwritten, the system tracks and updates the effective data length information at a finer granularity. For example, if the original data of internal object agg2 is 4MB, of which 2MB is overwritten by business object X while the remaining 2MB is still used by other business objects, then the effective data length in the updated target metadata of agg2 might be 2MB. In this case, the system determines that the reclamation strategy is not to reclaim agg2, but to retain its remaining effective data portion until all business objects referencing the internal object have completed their data updates and the effective data length information of agg2 has decreased to 0.
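The threshold decision of paragraphs [0087] to [0094] can be sketched as a single comparison. Note that the text words the reclaim condition both as "less than" and as "equal to or less than" the first target value; this sketch assumes the latter, so that a valid data length of 0 is reclaimable when the first target value is 0. The names `Strategy`, `FIRST_TARGET_VALUE`, and `recycling_strategy` are hypothetical.

```python
from enum import Enum


class Strategy(Enum):
    RECYCLE = "recycle"
    KEEP = "keep"


MB = 1024 * 1024
FIRST_TARGET_VALUE = 0  # preset value; 0 is the example given in the text


def recycling_strategy(effective_length, threshold=FIRST_TARGET_VALUE):
    """Decide whether a historical internal object should be reclaimed.

    Assumption: a length at or below the threshold means no valid data remains.
    """
    return Strategy.RECYCLE if effective_length <= threshold else Strategy.KEEP


print(recycling_strategy(2 * MB).name)  # KEEP: 2MB of valid data is still referenced
print(recycling_strategy(0).name)       # RECYCLE: nothing valid remains
```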

[0095] In summary, the recycling strategy decision based on the effective data length information in the target metadata not only improves the management efficiency of storage space and system performance, but also simplifies the logical processing of garbage collection. It is a key strategy for efficiently handling data changes and managing storage space in distributed storage systems.

[0096] In an exemplary embodiment, in response to an initial business object changing into a target business object, determining initial metadata corresponding to the initial business object from at least one first metadata stored in memory includes: in response to the initial business object changing into a target business object, determining a reverse relationship corresponding to the initial business object, wherein the reverse relationship is used to characterize the mapping relationship between historical internal objects and the initial business object; determining the historical internal object mapped by the initial business object based on the reverse relationship; determining the identity information of the historical internal object; and determining the first metadata containing the identity information from at least one first metadata as the initial metadata.

[0097] In this embodiment, the aforementioned reverse relationship can be used to reclaim internal objects to avoid wasting space. During business operations, business objects are modified and written. Each modified and written data is aggregated into a new internal object, while the old internal object data is discarded as old data, generating garbage data. The storage space occupied by this garbage data in the data pool needs to be reclaimed.

[0098] When modifying business objects, in addition to writing data, it's also necessary to update their forward and reverse relationships simultaneously. Updating the reverse relationship involves two parts: first, updating the valid data length in the old internal object; and second, inserting the reverse relationship into the new internal object. Updating the valid data length in the old internal object requires first reading the metadata of that internal object, modifying it, and then writing it back. This method of updating the reverse relationship of internal objects, requiring a read-then-write approach, has a significant impact on performance.

[0099] In this embodiment, if the initial business object is transformed into the target business object, the reverse relationship corresponding to the initial business object can be determined. Based on the reverse relationship, the historical internal object mapped by the initial business object can be determined. Based on the identity information corresponding to the historical internal object, the initial metadata in at least one first metadata can be determined.

[0100] Optionally, in a distributed storage system, when a business object is updated, the system needs to determine which internal objects are associated with that business object in order to update its initial metadata. This process involves finding and analyzing reverse relationships, determining the identity information of internal objects, and ultimately filtering out the metadata related to the changes in the business object from the metadata set in memory.

[0101] Optionally, when a business object (the initial business object) changes, the system first looks up the reverse metadata relationship of that business object, which is typically stored in `agg_shards`. The reverse relationship indicates which internal objects store the data of the business object, and the specific location of this data within the internal objects. By analyzing the reverse relationship, the system can identify all historical internal objects related to the initial business object. This step ensures that the system can find all internal object metadata that may need to be updated. Each internal object has a set of identity information, such as an internal object ID and its storage location. The system needs to determine this information so that it can accurately locate the correct internal object when updating metadata later. Finally, the system can filter out metadata containing the identity information of historical internal objects from all the initial metadata stored in memory. This metadata is the initial metadata directly related to the changes in the business object. After determining the initial metadata, the system can further update its effective data length information to reflect the actual state of the business object data.

[0102] For example, suppose the initial business object A is stored in internal objects agg_A1, agg_A2, and agg_A3. After an update operation, some data of A is overwritten by new data Data_new (2MB), and Data_new is aggregated into a new internal object agg_A4. At this point, it is necessary to determine which historical internal objects were modified and update their metadata. The initial metadata corresponding to this initial business object can be determined through the following steps. Find the reverse relationships of A to determine the association between agg_A1, agg_A2, and agg_A3 and A. Based on the reverse relationships, confirm the mapping relationship between A and agg_A1, agg_A2, and agg_A3, identifying these historical internal objects. Determine the identity information of the historical internal objects, such as agg_A1 having an ID of 0x1A2B3C4D and a storage location of /data/Pool1/agg_A1. Furthermore, from the first metadata stored in memory, metadata containing the identity information of agg_A1, agg_A2, and agg_A3 can be filtered out. This metadata is the initial metadata related to the change of A. For example, the initial metadata may record that the effective data length of agg_A1 is 4MB, the effective data length of agg_A2 is 2MB, and the effective data length of agg_A3 is 1MB.
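The lookup in paragraph [0102] can be sketched as follows. This is an illustration only: the dictionary layout of the reverse relationship, the `Metadata` class, the extra entries for a second business object "B", and the function name `find_initial_metadata` are all assumptions, not structures defined by the application.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass
class Metadata:
    object_id: str         # identity information of the internal object
    effective_length: int  # bytes of valid data still stored in it


# Hypothetical reverse relationship: business object -> historical internal objects.
reverse_relationship = {"A": ["agg_A1", "agg_A2", "agg_A3"], "B": ["agg_B1"]}

# First metadata held in memory, keyed by identity information.
first_metadata = {
    "agg_A1": Metadata("agg_A1", 4 * MB),
    "agg_A2": Metadata("agg_A2", 2 * MB),
    "agg_A3": Metadata("agg_A3", 1 * MB),
    "agg_B1": Metadata("agg_B1", 3 * MB),
}


def find_initial_metadata(business_object):
    """Filter the in-memory first metadata down to the changed object's entries."""
    ids = reverse_relationship.get(business_object, [])
    return [first_metadata[i] for i in ids if i in first_metadata]


for meta in find_initial_metadata("A"):
    print(meta.object_id, meta.effective_length // MB, "MB")
```

Metadata belonging to other business objects (here agg_B1) is left untouched, which is the filtering step the paragraph describes.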

[0103] In this embodiment, in a distributed storage environment, a business object may be associated with multiple internal objects, and these internal objects may be distributed across different storage nodes. Therefore, when determining the internal objects and metadata related to changes in the business object, the distribution and consistency of data in the distributed system need to be taken into account.

[0104] In summary, through the above steps, the distributed storage system can efficiently handle updates to business objects, ensure that the associated internal object metadata can be updated accurately and in a timely manner, and optimize the use of storage space and garbage collection mechanisms, thereby improving the overall performance and data management capabilities of the system.

[0105] As an optional embodiment, Figure 4 is a flowchart of an internal object recycling method according to an embodiment of this application. As shown in Figure 4, the method may include the following steps:

[0106] Step S402: Locate the internal objects that need garbage collection using agg_shards.

[0107] In this embodiment, the `agg_shards` data structure is used to store and manage metadata information for internal objects. When the system needs to perform garbage collection, it directly accesses the `agg_shards` in memory to find internal objects with a valid data length of zero. This is because when a business object is modified and written, the old internal object data is overwritten by the new internal object data, thus becoming garbage data. By checking the valid data length of internal objects in `agg_shards`, the system can quickly locate which internal objects no longer contain valid data, thereby determining the target internal objects that need to be garbage collected.

[0108] Step S404: Read the found internal objects and perform garbage collection.

[0109] In this embodiment, after identifying internal objects that require garbage collection, the system no longer needs to read the metadata information of these internal objects from the hard drive. This is because the metadata information is already resident in memory within `agg_shards`, containing key information such as the effective data length of the internal objects. By directly accessing this information in memory, the system can avoid hard drive read operations, reduce I/O burden, and improve garbage collection efficiency. Once it is confirmed that an internal object no longer contains valid data, the system will perform garbage collection, release the corresponding storage space, update the effective data length information of the internal objects in `agg_shards`, and synchronously write these updates to the key-value database to ensure persistent storage of metadata.
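Steps S402 and S404 can be sketched as follows. In this minimal illustration, `agg_shards`, the data pool, and the key-value database are all modeled as Python dictionaries, and updating the metadata of a reclaimed object is modeled as removing its entry; both choices are assumptions, as is the function name `collect_garbage`.

```python
def collect_garbage(agg_shards, data_pool, kv_db):
    """Reclaim internal objects whose valid data length is zero.

    All three arguments are dicts keyed by internal object name.
    """
    # S402: locate targets using only the in-memory agg_shards (no disk read).
    targets = [name for name, length in agg_shards.items() if length == 0]

    # S404: release the storage space, then update memory and persistent metadata.
    for name in targets:
        data_pool.pop(name, None)   # free the space held in the data pool
        del agg_shards[name]        # drop the in-memory metadata entry
        kv_db.pop(name, None)       # keep the key-value database in sync
    return targets


agg_shards = {"agg1": 0, "agg2": 2048, "agg3": 4096}
data_pool = {"agg1": b"stale", "agg2": b"x" * 2048, "agg3": b"y" * 4096}
kv_db = dict(agg_shards)

print(collect_garbage(agg_shards, data_pool, kv_db))  # ['agg1']
```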

[0110] In one exemplary embodiment, determining at least one first metadata that contains identity information as initial metadata includes: searching a database according to the identity information to obtain the initial metadata, wherein the database is used to store the first metadata.

[0111] In this embodiment, each shard can be written to the key-value database using the shard index as the key.

[0112] In one exemplary embodiment, the method may further include: after a restart occurs, loading the initial metadata into memory in response to a need to reclaim the historical internal objects or a change in the data storage status of the historical internal objects.

[0113] In this embodiment, after the service restarts, the effective data length information of the internal objects is loaded into memory during flushing or garbage collection, avoiding loading during service startup and thus improving recovery efficiency in fault scenarios. The effective data length of the internal objects in memory is written to the key-value database in shard format to reduce the amount of data written to disk each time.
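The shard-based persistence and on-demand loading of paragraphs [0110] to [0113] can be sketched as follows. The sharding rule (integer object IDs grouped by `SHARD_SIZE`), the JSON encoding, and the class and method names are assumptions for illustration; the application only states that each shard is keyed by its index and that loading is deferred until flushing or garbage collection needs the data.

```python
import json

SHARD_SIZE = 4  # hypothetical number of internal objects per shard


class AggShards:
    def __init__(self, kv_db):
        self.kv_db = kv_db   # dict standing in for the key-value database
        self.loaded = {}     # shard index -> {object id -> effective length}

    def _shard(self, object_id):
        index = object_id // SHARD_SIZE
        if index not in self.loaded:
            # Lazy load: a shard is read only when first needed, not at startup.
            raw = self.kv_db.get(str(index))
            self.loaded[index] = (
                {int(k): v for k, v in json.loads(raw).items()} if raw else {}
            )
        return index, self.loaded[index]

    def get(self, object_id):
        return self._shard(object_id)[1].get(object_id, 0)

    def set(self, object_id, length):
        index, shard = self._shard(object_id)
        shard[object_id] = length
        # Only the touched shard is written back, keeping each disk write small.
        self.kv_db[str(index)] = json.dumps(shard)


kv_db = {}
before = AggShards(kv_db)
before.set(5, 4096)
before.set(9, 2048)

after_restart = AggShards(kv_db)   # nothing is loaded at startup
print(after_restart.loaded)        # {}
print(after_restart.get(5))        # 4096
print(sorted(after_restart.loaded))  # [1]
```

After the simulated restart only the shard containing object 5 is brought into memory; the shard holding object 9 stays on disk until something touches it.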

[0114] In an exemplary embodiment, the method may further include: in response to a data storage instruction issued by a client, acquiring multiple data to be stored corresponding to the data storage instruction; aggregating the multiple data to be stored to obtain a first internal object; constructing second metadata of the first internal object based on the data volume of the data to be stored, wherein the second metadata is used at least to characterize the data storage status of the first internal object; constructing a mapping relationship between the data to be stored and the first internal object to obtain a first forward relationship; and storing the first forward relationship and the second metadata in memory.

[0115] In this embodiment, when the distributed storage system receives a data storage instruction from the client, it can perform a series of operations according to the instruction content to efficiently store data and maintain metadata relationships.

[0116] Optionally, after receiving instructions from the client, the instruction content can be parsed to extract multiple data blocks that need to be stored. These data blocks may come from different business objects, but they will be aggregated into the same internal object for storage. Each data block can contain multiple pieces of data to be stored, which can be business data. The acquired data blocks can be aggregated to create an internal object (the first internal object). The purpose of aggregation is to improve storage efficiency and data processing speed, especially for small I/O scenarios. Aggregation can transform multiple random small I/O operations into sequential large I/O operations, thereby increasing write bandwidth and reducing latency. Furthermore, based on the aggregated data volume, metadata information (the second metadata) of the first internal object can be constructed. The second metadata can at least contain information about the data storage of the internal object, such as the internal object ID, data size, and effective data length. It serves as the basis for subsequent data retrieval, garbage collection, and other operations. It should be noted that this is only an example, and there are no specific limitations on the content included in the second metadata.

[0117] Furthermore, a mapping relationship can be constructed between the data to be stored and the first internal object, resulting in a first forward relationship. In this embodiment, to ensure data retrievability, a mapping relationship (first forward relationship) from business objects to internal objects can be constructed. Thus, when business object data needs to be read, the system can quickly locate the corresponding internal object through the forward relationship. To improve data processing speed, the constructed first forward relationship and second metadata can be stored in memory. This allows subsequent read, update, and garbage collection operations to directly read metadata information from memory, avoiding frequent disk read/write operations and thereby improving system response speed and performance.

[0118] For example, suppose a client issues a storage instruction containing multiple small data blocks. These blocks originate from business objects Obj1, Obj2, and Obj3, with data sizes of 1KB, 2KB, and 3KB respectively. The 1KB data for Obj1, the 2KB data for Obj2, and the 3KB data for Obj3 can be parsed from the storage instruction. These data blocks are aggregated to create a first internal object, Agg_Obj, with a size of 6KB. Second metadata for Agg_Obj can be constructed, including its ID, a data size of 6KB, and an effective data length of 6KB. Furthermore, a mapping relationship can be established between Obj1, Obj2, Obj3, and Agg_Obj, resulting in a first forward relationship, such as Obj1->Agg_Obj (1KB), Obj2->Agg_Obj (2KB), and Obj3->Agg_Obj (3KB). The first forward relationship and the second metadata of Agg_Obj can be stored in memory to facilitate subsequent metadata management and data retrieval operations.
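The aggregation step of paragraphs [0114] to [0118] can be sketched as follows. The aggregated size is computed as the sum of the block sizes. The `Extent` class, the dictionary layout of the second metadata, and the function name `aggregate` are assumptions made for illustration.

```python
from dataclasses import dataclass

KB = 1024


@dataclass
class Extent:
    internal_object: str  # identity of the internal object holding the data
    offset: int           # where the block starts inside the internal object
    length: int           # size of the block in bytes


def aggregate(blocks, internal_id):
    """Pack (business object id, payload) pairs into one internal object.

    Returns the aggregated bytes, the second metadata, and the forward relationship.
    """
    forward = {}
    buffer = bytearray()
    for obj_id, payload in blocks:
        forward[obj_id] = Extent(internal_id, len(buffer), len(payload))
        buffer += payload  # small writes become one sequential buffer
    metadata = {
        "id": internal_id,
        "size": len(buffer),
        "effective_length": len(buffer),
    }
    return bytes(buffer), metadata, forward


blocks = [("Obj1", b"a" * KB), ("Obj2", b"b" * (2 * KB)), ("Obj3", b"c" * (3 * KB))]
data, second_metadata, first_forward = aggregate(blocks, "Agg_Obj")

print(second_metadata["size"] // KB, "KB")
print(first_forward["Obj2"])
```

Recording an offset alongside each length is a design choice of this sketch; it lets a later read slice the business object's data directly out of the internal object.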

[0119] Optionally, in the actual operation of a distributed storage system, the processes of data aggregation and metadata construction may be more complex. For example, the system may need to decide how to aggregate data and create internal objects based on factors such as data access patterns and storage space distribution. Furthermore, to ensure data consistency and reliability, the system may need to perform data verification and consistency checks before storing metadata in memory.

[0120] In summary, the processes of data aggregation and metadata construction help improve storage efficiency, especially for small I/O operations, significantly reducing disk read/write operations and increasing write speed. Storing metadata in memory accelerates data retrieval and updates, reduces reliance on disk resources, and improves overall system performance. By building forward relationships, the system can quickly locate the physical storage location of business object data, simplifying the data retrieval process and improving data access speed. In-memory metadata management mechanisms enable the system to respond to client read/write requests more quickly, improving system responsiveness and user experience. Efficient data aggregation and metadata management help avoid wasting storage space, optimize storage resource utilization, and reduce storage costs.

[0121] Optionally, through the above steps, the distributed storage system can efficiently process data storage instructions issued by the client, which not only improves the efficiency of storage operations, but also optimizes metadata management and data retrieval processes, enhances the system's responsiveness and user experience, while reducing storage costs and improving the utilization rate of storage resources.

[0122] In an exemplary embodiment, the above method may further include: obtaining a data acquisition request issued by a client, wherein the data acquisition request is used to acquire data to be acquired stored in a data pool; determining a first business object corresponding to the data acquisition request; acquiring a first forward relationship corresponding to the first business object from memory; determining the identity information of a first internal object mapped to the target business object based on the first forward relationship; and acquiring the data to be acquired from the first internal object deployed in the data pool according to the identity information.

[0123] In this embodiment, the memory may contain multiple forward relationships corresponding to business objects. These forward relationships may include a first forward relationship, which can be used to read data from the business objects.

[0124] For example, upon receiving a client request to read the range [offset1, length1] of a business object oid, the system retrieves the forward relationship of the object oid, obtains the internal object name agg_oid and the range [offset2, length2] within it that corresponds to the requested range, and then reads the data from the data pool based on the obtained forward relationship before returning it to the client.

[0125] Optionally, in a distributed storage system, a data retrieval request initiated by the client is the trigger point for the system to perform data read operations. When the system receives a data retrieval request, it executes a series of steps to efficiently retrieve and return the data required by the client. These steps make full use of the metadata information stored in memory, avoid unnecessary disk reads, and improve data read efficiency.

[0126] Optionally, the system can retrieve data retrieval requests from clients: Clients send data retrieval requests to the system, requesting specific data stored in the data pool. These requests can be parsed to extract key information, such as the identifier of the business object, the data offset, and its length. Based on the parsed request information, the system can determine the business object from which the client is requesting data. For example, the client might request data from the Obj1 business object, starting at offset1 and with a length of length1. Using the business object identifier, the system can quickly retrieve the forward relationship information related to that business object from memory. This forward relationship information records the storage location of the business object data within its internal objects, which is crucial for rapid data retrieval. By analyzing the forward relationship, the system can determine the identity information of the internal object storing the requested data, such as its internal object ID. This means the system has located the actual physical storage location of the data. Furthermore, based on the internal object ID, the system can directly read the corresponding data from the data pool and return it to the client. This process is highly efficient because it utilizes metadata information in memory.

[0127] For example, suppose a client sends a data retrieval request, requesting data from the business object ObjX, starting at offset1 and with a length of length1. The system receives the client's request and parses out information such as the ObjX identifier, offset1, and length1. The system determines that the client is requesting data from ObjX, and can then retrieve the forward relationship information of ObjX from memory, discovering that some data of ObjX is stored in the internal object agg_X1. Using this forward relationship information, the system can determine the actual storage location of the requested data from ObjX is agg_X1, and obtain the ID and other identity information of agg_X1. Further, based on the ID of agg_X1, the system reads the corresponding data from the data pool and returns this data to the client, fulfilling the client's data retrieval request. It should be noted that the above data processing method is only an example and is not intended to impose specific limitations.
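The read path of paragraphs [0122] to [0127] can be sketched as follows. The forward relationship is modeled here as a dictionary from business object ID to a tuple of internal object name, offset inside that object, and stored length; this layout, the sample data, and the function name `read` are assumptions, and the sketch covers only the case where a business object resides in a single internal object.

```python
def read(oid, offset, length, forward, data_pool):
    """Resolve a business object range to an internal object range and read it."""
    # Look up the forward relationship in memory (no metadata read from disk).
    agg_name, agg_offset, obj_length = forward[oid]
    if offset < 0 or length < 0 or offset + length > obj_length:
        raise ValueError("requested range is outside the business object")
    # Translate [offset, length] of the business object into the internal object.
    start = agg_offset + offset
    return data_pool[agg_name][start:start + length]


data_pool = {"agg_X1": b"....ObjX-data...."}
forward = {"ObjX": ("agg_X1", 4, 9)}  # ObjX occupies 9 bytes starting at byte 4

print(read("ObjX", 0, 4, forward, data_pool))  # b'ObjX'
print(read("ObjX", 5, 4, forward, data_pool))  # b'data'
```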

[0128] Optionally, in practical applications, business objects may contain a large amount of data, which is distributed across multiple internal objects. Therefore, the system needs to support querying and integrating data from multiple internal objects to satisfy the client's complete data retrieval request. Furthermore, to improve efficiency, the system may pre-cache some frequently accessed data, allowing data for frequently accessed business objects to be retrieved directly from the cache, further reducing data pool read operations.

[0129] In summary, by utilizing the forward relationship information stored in memory, the system can quickly locate the data storage location, avoiding unnecessary hard disk reads and significantly improving data access speed. The use of forward relationships reduces hard disk I/O operations, simplifies the logical processing of data retrieval, and makes the data retrieval and integration process simpler and more efficient.

[0130] Optionally, by utilizing the forward relationship information stored in memory, the distributed storage system can respond to the client's data retrieval requests efficiently and accurately. This not only improves data access speed and reduces I/O burden, but also optimizes data retrieval logic, enhances data consistency, and improves user experience. It is one of the key mechanisms for efficient data management in distributed storage systems.

[0131] As an optional embodiment, Figure 5 is a flowchart of the flushing method for an internal object according to an embodiment of this application. As shown in Figure 5, the method may include the following steps:

[0132] Step S502: Obtain the forward relationship of the business object.

[0133] In this embodiment, the business object [offset, length] is obtained. Based on the business object's identifier (e.g., ObjX) and its data offset and length, forward relationship information associated with the business object can be retrieved from memory or persistent storage. This forward relationship information records the storage location of the business data within the internal object, including the internal object's ID and the specific range of data within the internal object.

[0134] Step S504: Update agg_shards based on the forward relationship.

[0135] In this embodiment, after obtaining the forward relationship, this information can be read and analyzed to determine which internal objects' effective data lengths need to be updated. The system applies these update operations to the `agg_shards` data structure in memory. The information in `agg_shards` includes the ID of each internal object and its effective data length. By updating `agg_shards`, the system can ensure that the metadata information of the internal objects remains consistent with the actual state of the business object data.

[0136] Step S506: Write the updated agg_shards to disk.

[0137] In this embodiment, to persistently store updates to this metadata information, the system needs to write the updated agg_shards data structure content to the key-value database for disk persistence. In distributed storage systems, key-value databases are typically used to store metadata information, ensuring that the metadata information can still be correctly read and used even after a system restart or failure.
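Steps S502 to S506 can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not the implementation of this application: `ForwardExtent`, `ForwardIndex`, `KvStore`, `kShardCount`, `ShardOf`, and `FlushUpdate` are hypothetical names, a `std::map` stands in for the key-value database, and the sketch assumes that all old data of the business object is being replaced.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Same shape as agg_shards in this application:
// shard index -> (internal object ID -> effective data length).
using AggShards =
    std::unordered_map<uint32_t, std::unordered_map<uint64_t, uint32_t>>;

// Hypothetical forward-relationship entry: `length` bytes of a business
// object are stored in internal object `agg_id`.
struct ForwardExtent {
  uint64_t agg_id;
  uint32_t length;
};

// Hypothetical in-memory forward index: business object ID -> extents.
using ForwardIndex =
    std::unordered_map<std::string, std::vector<ForwardExtent>>;

// Stand-in for the key-value database: shard index -> persisted shard.
using KvStore = std::map<uint32_t, std::unordered_map<uint64_t, uint32_t>>;

constexpr uint32_t kShardCount = 16;  // assumed; not given in the text

inline uint32_t ShardOf(uint64_t agg_id) {
  return static_cast<uint32_t>(agg_id % kShardCount);
}

// Runs S502-S506 for one business object whose old data is being replaced.
// Returns the number of shards written to the store.
std::size_t FlushUpdate(const ForwardIndex& forward,
                        const std::string& business_obj,
                        AggShards& agg_shards, KvStore& kv) {
  // S502: obtain the forward relationship of the business object.
  auto fwd_it = forward.find(business_obj);
  if (fwd_it == forward.end()) return 0;

  // S504: subtract each old extent from its internal object's effective length.
  std::vector<uint32_t> dirty;
  for (const ForwardExtent& e : fwd_it->second) {
    const uint32_t shard = ShardOf(e.agg_id);
    auto shard_it = agg_shards.find(shard);
    if (shard_it == agg_shards.end()) continue;
    auto obj_it = shard_it->second.find(e.agg_id);
    if (obj_it == shard_it->second.end()) continue;
    obj_it->second -= std::min(obj_it->second, e.length);  // never below 0
    if (std::find(dirty.begin(), dirty.end(), shard) == dirty.end()) {
      dirty.push_back(shard);
    }
  }

  // S506: write only the modified shards to disk, keyed by shard index.
  for (uint32_t shard : dirty) {
    auto it = agg_shards.find(shard);
    assert(it != agg_shards.end());
    kv[shard] = it->second;
  }
  return dirty.size();
}
```

Tracking the modified shard indices keeps the persisted payload proportional to the number of touched shards rather than to the whole structure.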

[0138] In one exemplary embodiment, the method may further include: transmitting the data to be acquired to the client, and modifying the effective data length information of the third metadata corresponding to the first internal object in response to acquiring the data to be acquired from the first internal object, wherein the third metadata is stored in memory.

[0139] Optionally, in a distributed storage system, once data in the data pool is requested and successfully retrieved by a client, the system needs to update the metadata stored in memory to reflect changes in the data's valid state. This update process is crucial for maintaining the accuracy and integrity of the data.

[0140] In this embodiment, after successfully retrieving the data required by the client from the first internal object deployed in the data pool, the system transmits this data back to the client to fulfill the data retrieval request. Since the client has successfully retrieved the data, the state of this data within the internal object may have changed. For example, if the retrieved data is part of a business object, this part of the data may no longer be considered "pending" or "valid." Therefore, the system needs to update the metadata information (third metadata) stored in memory to reflect changes in the valid data length of the internal object. This helps with subsequent operations such as garbage collection, ensuring effective management of storage space.

[0141] For example, suppose a client requests data starting at offset1 with a length of length1 from the business object ObjA. This data is stored in the internal object agg_A1. The process can include the following steps: the system reads the data from agg_A1 and then transmits it back to the client over the network. Since the data has already been retrieved by the client, the system updates the third metadata corresponding to agg_A1 in memory, reducing its effective data length information. For example, if the original effective data length of agg_A1 was 10MB and the client retrieved 2MB of data, the updated effective data length information should be 8MB (10MB - 2MB).

[0142] In summary, by timely updating the effective data length information of the third metadata of the first internal object, the system can ensure the accuracy of the metadata, thereby making correct decisions in subsequent data retrieval, storage space management, and garbage collection operations. Updating the effective data length information helps the system accurately identify which data is invalid, thus freeing up storage space during garbage collection and optimizing the utilization of storage resources. In concurrent read scenarios, ensuring that the effective data length information of the metadata is updated promptly after each read operation helps improve data consistency and accuracy, avoiding errors caused by inconsistent data states.

[0143] Optionally, the effective data length information of the metadata is updated in memory, avoiding frequent disk read and write operations and improving the system's response speed and I/O efficiency.

[0144] In summary, by updating the effective data length information of the third metadata corresponding to the first internal object in memory after successful data acquisition, the distributed storage system can ensure the accuracy of metadata, optimize storage space management, and improve data consistency.

[0145] In an exemplary embodiment, the method may further include: in response to a change in business data, aggregating the changed business data to obtain a second internal object, wherein the second internal object is different from the historical internal object; constructing fourth metadata of the second internal object based on the data volume of the changed business data, wherein the fourth metadata is used at least to characterize the data storage status of the second internal object; constructing a mapping relationship between the changed business data and the second internal object to obtain a second forward relationship; and storing the second forward relationship and the fourth metadata in memory.

[0146] In this embodiment, when business data changes, the distributed storage system needs to perform a series of steps to process the new data and update metadata in order to maintain the correctness of the data and the consistency of the system.

[0147] Optionally, the modified parts of the business data are aggregated to form a new internal object (the second internal object). This new object differs in content from the original internal object (the historical internal object). The purpose of aggregation is to improve storage efficiency and data processing speed, especially in small I/O scenarios. Aggregation can transform multiple random small I/Os into sequential large I/Os, thereby increasing write bandwidth and reducing latency. Furthermore, metadata information (the fourth metadata) of the second internal object can be constructed based on the size of the changed business data. This fourth metadata can at least include information such as the internal object ID, data size, and effective data length, for subsequent data retrieval, storage space management, and other operations. A mapping relationship (the second forward relationship) can be established between the changed business data and the newly created internal object. In this way, when a client requests to read the changed business data, the system can quickly locate the correct internal object, ensuring data retrievability and consistency.

[0148] Optionally, to accelerate subsequent data retrieval and metadata update operations, the constructed second forward relationship and fourth metadata information can be stored in memory, which avoids frequent hard disk read and write operations and improves the system's response speed and performance.

[0149] For example, suppose a portion of the data of a business object ObjX is already stored in an internal object agg1, and this data in ObjX is modified, with the new data size being 2MB. The modified data can be aggregated to create a new internal object agg2 that stores the updated data. Metadata information (fourth metadata) for agg2 can be constructed, including agg2's ID, a data size of 2MB, and an effective data length of 2MB. Furthermore, a mapping relationship can be established between ObjX and agg2, resulting in a second forward relationship, such as ObjX->agg2 (2MB). This second forward relationship and the fourth metadata information of agg2 can be stored in memory for subsequent metadata management and data retrieval operations.
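The ObjX example above can be expressed as a short C++ sketch. It is illustrative only: `AggMeta`, `ForwardEntry`, `MemoryIndex`, and `RecordChangedData` are hypothetical names, and the field layout is an assumption based on the items the text lists (internal object ID, data size, effective data length).

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical layout of the "fourth metadata" of a new internal object.
struct AggMeta {
  uint64_t agg_id;         // internal object ID
  uint32_t data_size;      // bytes written into the internal object
  uint32_t effective_len;  // bytes still referenced by business objects
};

// Hypothetical forward-relationship record: business data -> internal object.
struct ForwardEntry {
  uint64_t agg_id;
  uint32_t length;
};

// Everything here lives in memory, as the embodiment describes.
struct MemoryIndex {
  std::unordered_map<uint64_t, AggMeta> meta;
  std::unordered_map<std::string, std::vector<ForwardEntry>> forward;
};

// Records that `changed_bytes` of `business_obj` were aggregated into the
// new internal object `new_agg_id`, and returns the metadata that was built.
const AggMeta& RecordChangedData(MemoryIndex& idx,
                                 const std::string& business_obj,
                                 uint64_t new_agg_id, uint32_t changed_bytes) {
  assert(idx.meta.count(new_agg_id) == 0 &&
         "the second internal object must be a new object");
  // A freshly written object is fully valid: effective length == data size.
  AggMeta m{new_agg_id, changed_bytes, changed_bytes};
  auto it = idx.meta.emplace(new_agg_id, m).first;
  // Second forward relationship, e.g. ObjX -> agg2 (2MB).
  idx.forward[business_obj].push_back(ForwardEntry{new_agg_id, changed_bytes});
  return it->second;
}
```

Calling `RecordChangedData(idx, "ObjX", 2, 2 * 1024 * 1024)` would correspond to the agg2 case in the example, with the numeric ID 2 standing in for agg2.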

[0150] In summary, by aggregating and processing changed business data, the system can optimize storage operations, improve write efficiency, and reduce latency; the effect is especially significant in small I/O scenarios. Storing the mapping relationship between changed business data and newly created internal objects, as well as the metadata information of the new internal objects, in memory simplifies metadata management and update processes, and improves data retrieval efficiency.

[0151] Optionally, through the above steps, the distributed storage system can effectively handle changes in business data, which not only improves data storage efficiency and processing speed and simplifies metadata management, but also ensures data consistency and accuracy, significantly improving user experience and storage space utilization efficiency. It is an important mechanism for data change processing and metadata management in distributed storage systems.

[0152] As an alternative implementation, Figure 6 is an example diagram of a write operation in a distributed storage system according to an embodiment of this application. As shown in Figure 6, a write operation in the distributed storage system can include the following: the client initiates a write request, sending the business data to be stored to the cache pool of the distributed storage system to complete the client write operation. Business objects can be encapsulated as write-ahead log (WAL) data and committed to disk.

[0153] Optionally, the cache pool can merge business object data into memory for caching.

[0154] Optionally, after receiving a write request from a client, the cache pool temporarily stores the data in memory for caching. This is to accelerate the write process and reduce direct disk operations. Memory caching is the starting point for data aggregation and processing.

[0155] Optionally, when the amount of business data in memory reaches a certain level or meets specific conditions, this business data can be aggregated to form an internal object. An internal object is the unit of data aggregation; it has its own metadata used to record the state and location information of the data.

[0156] Optionally, after aggregation, the system writes (flushes) the internal objects from memory to the data pool for persistent storage. Flushing is the process of moving data from unstable memory storage to a stable data pool.

[0157] Optionally, once the data is flushed to the data pool, the data in memory becomes aging data, which will no longer be used by the cache pool and will instead wait to be cleaned up or reclaimed.

[0158] Optionally, after the flush operation is completed, the system will trigger a callback mechanism to ensure that the data is correctly written to the data pool and that metadata and forward relationships are updated synchronously. The flush callback mechanism is a key step in the data writing process, used to ensure data integrity and consistency.

[0159] Optionally, a data pool is a component in a distributed storage system used for persistent data storage. Internal object data is stored in the data pool after being flushed, ensuring data security and durability.

[0160] In this embodiment, when a client sends a write request, the system first stores the data in a memory cache. This step reduces the frequency of direct writes to the hard drive and improves write efficiency. When the data in the memory cache reaches a certain threshold, the system merges this data into internal objects and performs a flush operation, writing the internal objects to the data pool for persistent storage. After the data flush, the aging of memory data and the aging of WAL objects are triggered. This typically involves cleaning up data in memory and updating metadata to ensure system storage efficiency and data consistency. After the flush operation is complete, the system uses a flush callback mechanism to confirm whether the data has been successfully written to the data pool and synchronously updates the metadata and forward relationships in memory to ensure that subsequent data reading and retrieval operations can be performed correctly. The data in the data pool is persistent and is used to store internal objects and ensure data reliability. The cache pool is a temporary storage area used to cache data and metadata, improving the efficiency of read and write operations.
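The write path described above (cache in memory, aggregate at a threshold, flush to the data pool, then run a flush callback) can be sketched in C++ as follows. This is a simplified illustration and not the implementation of this application: `SmallWrite`, `InternalObject`, and `CachePool` are hypothetical types, an in-memory vector stands in for the data pool, the threshold is an arbitrary byte count, and the WAL is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// One small client write against a business object.
struct SmallWrite {
  std::string business_obj;
  uint64_t offset;
  std::string data;
};

// An internal object: several small writes laid out sequentially.
struct InternalObject {
  uint64_t agg_id = 0;
  std::string payload;
  std::vector<SmallWrite> sources;  // lets the callback rebuild relationships
};

class CachePool {
 public:
  using FlushCallback = std::function<void(const InternalObject&)>;

  CachePool(std::size_t threshold_bytes, FlushCallback on_flush)
      : threshold_(threshold_bytes), on_flush_(std::move(on_flush)) {
    assert(on_flush_ && "a flush callback is required");
  }

  // Caches one write in memory. Returns true if this write triggered a flush.
  bool Write(SmallWrite w) {
    buffered_bytes_ += w.data.size();
    pending_.push_back(std::move(w));
    if (buffered_bytes_ < threshold_) return false;
    Flush();
    return true;
  }

  std::size_t buffered_bytes() const { return buffered_bytes_; }
  const std::vector<InternalObject>& data_pool() const { return data_pool_; }

 private:
  void Flush() {
    InternalObject obj;
    obj.agg_id = next_agg_id_++;
    // Aggregation: many random small I/Os become one sequential payload.
    for (const SmallWrite& w : pending_) obj.payload += w.data;
    obj.sources = std::move(pending_);
    pending_.clear();     // cached copies are now aging data
    buffered_bytes_ = 0;
    data_pool_.push_back(obj);     // stand-in for the persistent data pool
    on_flush_(data_pool_.back());  // flush callback: update metadata here
  }

  std::size_t threshold_;
  FlushCallback on_flush_;
  std::size_t buffered_bytes_ = 0;
  uint64_t next_agg_id_ = 1;
  std::vector<SmallWrite> pending_;
  std::vector<InternalObject> data_pool_;
};
```

In this sketch the callback runs only after the object has been appended to the stand-in data pool, mirroring the requirement that metadata and forward relationships are updated once the flush has completed.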

[0161] The entities that perform the above steps can be servers, terminals, etc., but are not limited to these.

[0162] To facilitate understanding of the implementation methods of this application, relevant scenarios are explained below, but these explanations do not limit the scope of this application.

[0163] With the continuous development of information technology, data, as a valuable resource, has gradually gained importance. How to quickly process data resources and obtain expected results has become a key issue in the transformation from resources to assets. Various activities in people's work and life generate data. Collecting this data and then analyzing and processing it can yield very useful information, realizing the transformation from resources to assets, thus catalyzing the rapid development of big data and high-performance computing. Enterprise operations, traffic management, and public security management also generate massive amounts of data. This type of data often has a specific storage period, within which it needs to be searchable and viewable at any time. Based on the generation of massive amounts of data, data storage, as one of the core elements of data resources, has also entered a period of rapid development.

[0164] Traditional network storage systems use centralized storage servers to store all data. These servers become the bottleneck for system performance and a focal point for reliability and security, failing to meet the needs of large-scale storage applications. Distributed network storage systems, with their scalable architecture, not only improve system reliability, availability, and access efficiency but also facilitate expansion, thus gaining increasing acceptance from enterprises. Distributed storage systems typically exist in the form of storage server clusters. A typical cluster contains 10 storage server nodes, while the largest clusters currently exist with 1024 nodes, providing high-performance, massive data storage. Distributed storage servers can be categorized based on their storage media: hard disk storage, hybrid flash storage, and all-flash storage. A suitable solution is generally chosen based on a trade-off between the customer's business model and storage purchase costs.

[0165] With the development of storage media, the proportion of all-flash storage is increasing, and it is gradually becoming the mainstream storage method. To fully utilize the performance of all-flash storage media, optimization is needed for small I/O scenarios. I/O aggregation is one such optimization method. By using techniques such as redirect-on-write, multiple random small I/Os are aggregated into sequential large I/Os to improve write bandwidth and reduce latency. This aggregation is usually implemented using caching, allowing small I/O write operations to return to the client immediately after completion in the cache, while I/O aggregation and other processing are performed in the background. Small I/Os of business objects are aggregated into internal objects in the cache (internal objects refer to objects formed by the aggregation of multiple small I/Os in a distributed storage system) and then written to the data pool. The business object maintains metadata pointing to the location of the internal object, which is called a forward relationship. The internal object also maintains metadata pointing to the location of the business object, which is called a reverse relationship.

[0166] In related technologies, updating reverse metadata usually involves first reading the metadata of the internal object and then modifying the metadata. However, the above method has the technical problem of low efficiency in recycling internal objects.

[0167] To address the problems encountered in the application of the aforementioned algorithms, this application proposes a distributed storage design method for reverse metadata management. This scheme designs a set of in-memory data structures to store the effective data length of internal objects with minimal memory resources, avoiding the loading of internal object metadata information from disk. Each time the data is flushed, the effective data length of the old internal object is directly updated in memory, and then the updated old internal object information is synchronously written to disk when the business object is written to disk. During garbage collection, the internal objects in the data pool are also garbage collected based on the effective data length of the internal objects in memory, while simultaneously updating the effective data length of the internal objects in memory and writing it to disk.

[0168] After the service restarts, the effective data length information of the internal objects is loaded into memory during flushing or garbage collection, avoiding loading during service startup and thus improving recovery efficiency in fault scenarios. The effective data length of the internal objects in memory is written to the key-value database and persisted to disk in shard format, reducing the amount of data persisted to disk each time.

[0169] Alternatively, this method designs a set of memory data structures to store the effective data length of internal objects with minimal memory resources, while facilitating fast lookup and access to internal objects.

[0170] Optionally, the internal object metadata information stored in memory is resident in memory, and it is written to disk at two times: first, after the flush is completed, it is written to disk along with the business object metadata; second, it is modified and written to disk during garbage collection.

[0171] Optionally, during the flushing process, the relevant old internal object metadata information can be read directly from memory and the memory updated. During garbage collection, the internal objects that need to be reclaimed can be retrieved directly from memory, avoiding reading from disk and then retrieving them.

[0172] Optionally, after the service restarts, the internal object metadata information is loaded from the disk into memory during flushing or garbage collection and remains resident in memory.

[0173] Optionally, internal object metadata information can be stored in a key-value database in shard form, with metadata corresponding to each shard updated each time, avoiding full updates.

[0174] In this embodiment, a bool type variable is added to each PG, which can be used to characterize whether the internal object metadata has been loaded into memory.

[0175] Optionally, a boolean variable can be added: Assume each storage server (PG) has a boolean variable named `is_loaded`. When a PG starts up, the initial value of `is_loaded` is `false`, indicating that the PG's internal object metadata has not yet been loaded into memory. When the metadata is loaded, the value of `is_loaded` becomes `true`.
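The `is_loaded` flag lends itself to a load-on-first-use pattern, sketched below in C++. This is a hedged illustration: `PgState`, `ShardLoader`, and `EnsureLoaded` are hypothetical names, and the loader callback merely stands in for reading the persisted shards from the key-value database.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>

using AggShards =
    std::unordered_map<uint32_t, std::unordered_map<uint64_t, uint32_t>>;

// Per-PG state: the flag starts as false after a restart.
struct PgState {
  bool is_loaded = false;
  AggShards agg_shards;
};

// Hypothetical loader that reads the persisted shards from disk.
using ShardLoader = std::function<AggShards()>;

// Called at the start of a flush or a garbage collection pass. The metadata
// is loaded on first use only, so service startup itself does no loading.
AggShards& EnsureLoaded(PgState& pg, const ShardLoader& load_from_disk) {
  if (!pg.is_loaded) {
    pg.agg_shards = load_from_disk();
    pg.is_loaded = true;  // metadata now stays resident in memory
  }
  assert(pg.is_loaded);
  return pg.agg_shards;
}
```

Deferring the load to the first flush or garbage collection is what keeps restart time independent of the amount of internal object metadata.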

[0176] In this embodiment, a data structure can be designed to cache metadata information (e.g., first metadata) of internal objects in memory:

[0177] unordered_map<uint32_t, unordered_map<uint64_t, uint32_t>> agg_shards;

[0178] Each shard is written to the key-value database using its index as the key.

[0179] Optionally, the `agg_shards` mentioned above can be a two-level hash table structure used to cache metadata information of internal objects in memory. The key of the first-level hash table is a 32-bit unsigned integer (the shard index), and its value is a second-level hash table. The key of the second-level hash table is a 64-bit unsigned integer (the internal object ID), and its value is a 32-bit unsigned integer (the effective data length of the internal object).
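A small C++ wrapper around this two-level table might look as follows. The text does not say how an internal object is assigned to a shard, so mapping by `agg_id % shard_count` is purely an assumption, as are the names `AggShardIndex`, `ShardOf`, `Set`, and `Find`.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

class AggShardIndex {
 public:
  explicit AggShardIndex(uint32_t shard_count) : shard_count_(shard_count) {
    assert(shard_count_ > 0);
  }

  // Assumed placement rule: internal object ID modulo the shard count.
  uint32_t ShardOf(uint64_t agg_id) const {
    return static_cast<uint32_t>(agg_id % shard_count_);
  }

  // Inserts or overwrites the effective data length of an internal object.
  void Set(uint64_t agg_id, uint32_t effective_len) {
    agg_shards_[ShardOf(agg_id)][agg_id] = effective_len;
  }

  // Two hash lookups; returns nullptr if the object is not tracked.
  const uint32_t* Find(uint64_t agg_id) const {
    auto shard_it = agg_shards_.find(ShardOf(agg_id));
    if (shard_it == agg_shards_.end()) return nullptr;
    auto obj_it = shard_it->second.find(agg_id);
    if (obj_it == shard_it->second.end()) return nullptr;
    return &obj_it->second;
  }

 private:
  uint32_t shard_count_;
  // First level: shard index. Second level: internal object ID -> length.
  std::unordered_map<uint32_t, std::unordered_map<uint64_t, uint32_t>>
      agg_shards_;
};
```

Each entry costs one 64-bit key and one 32-bit value plus hash table overhead, which is consistent with the stated goal of keeping only the effective data length in memory.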

[0180] In this embodiment, each time business object data is flushed, the effective data length of internal objects in agg_shards is updated by obtaining the old forward relationship.

[0181] Optionally, suppose business object `A` contains data fragments `offset1, length1`, which are aggregated to form an internal object `agg_oid1`. When business object `A` needs to be modified by writing new data fragments `offset2, length2` and aggregating them into a new internal object `agg_oid2`, the system will update the information about the effective data length of `agg_oid1` in `agg_shards`, and simultaneously insert information about `agg_oid2`. This is not data deletion, but rather updating metadata to reflect the data's state; that is, new data is aggregated into a new internal object, while the effective length of the old data is updated.
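The bookkeeping for this overwrite can be sketched in C++ as below. The text does not specify how much of `agg_oid1` is invalidated, so the sketch assumes that only the bytes where the new range overlaps the old range stop being valid; `Extent`, `OverlapBytes`, `EffectiveLen`, and `ApplyOverwrite` are hypothetical names, and a flat map replaces the sharded structure for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_map>

// A byte range [offset, offset + length) inside a business object.
struct Extent {
  uint64_t offset;
  uint32_t length;
};

// Number of bytes shared by two half-open ranges.
uint32_t OverlapBytes(const Extent& a, const Extent& b) {
  const uint64_t begin = std::max(a.offset, b.offset);
  const uint64_t end = std::min(a.offset + a.length, b.offset + b.length);
  return end > begin ? static_cast<uint32_t>(end - begin) : 0;
}

// Flat stand-in for agg_shards: internal object ID -> effective data length.
using EffectiveLen = std::unordered_map<uint64_t, uint32_t>;

// Metadata-only update: nothing is deleted from the data pool here.
void ApplyOverwrite(EffectiveLen& len, uint64_t old_agg, const Extent& old_ext,
                    uint64_t new_agg, const Extent& new_ext) {
  auto it = len.find(old_agg);
  assert(it != len.end() && "the old internal object must be tracked");
  // Overwritten bytes of the old internal object are no longer valid.
  const uint32_t lost = OverlapBytes(old_ext, new_ext);
  it->second -= std::min(it->second, lost);
  // The new internal object gains the newly written bytes.
  len[new_agg] += new_ext.length;
}
```

If the new range does not overlap the old one at all, `agg_oid1` keeps its full effective length and only the entry for `agg_oid2` is added.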

[0182] In this embodiment, the updated agg_shards can be written to disk at the same time as pginfo is updated.

[0183] Optionally, whenever pginfo is updated, such as after a storage server restart or when metadata needs to be refreshed, the system will synchronously update the data in `agg_shards` and write this data to persistent storage, such as a key-value database.

[0184] In this embodiment, during garbage collection, the internal objects that need to be reclaimed are checked directly through the agg_shards in memory, without having to list and retrieve the metadata of internal objects from the disk.

[0185] Optionally, during garbage collection, the effective data length information of internal objects in `agg_shards` can be checked. If the effective data length of an internal object is 0, it means that the object is no longer referenced by any business object and can be safely reclaimed. Since `agg_shards` is resident in memory, the system can directly search for unnecessary internal objects in memory without reading metadata from the hard drive, thus greatly speeding up garbage collection.
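The in-memory scan can be sketched in C++ as follows. The function name `ReclaimBelow` and its threshold parameter are assumptions: passing a threshold of 1 reproduces the rule above (reclaim when the effective data length is 0), while larger values correspond to comparing against a first target value. The sketch only removes metadata entries; reclaiming an object that still holds valid data would additionally require relocating that data, which is not shown.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

using AggShards =
    std::unordered_map<uint32_t, std::unordered_map<uint64_t, uint32_t>>;

// Scans agg_shards in memory and returns the IDs of internal objects whose
// effective data length is below `first_target_value`, removing their
// entries. No internal object metadata is read from disk.
std::vector<uint64_t> ReclaimBelow(AggShards& agg_shards,
                                   uint32_t first_target_value) {
  assert(first_target_value > 0 && "a threshold of 0 never reclaims anything");
  std::vector<uint64_t> reclaimed;
  for (auto& shard : agg_shards) {
    auto& objects = shard.second;
    for (auto it = objects.begin(); it != objects.end();) {
      if (it->second < first_target_value) {
        reclaimed.push_back(it->first);
        it = objects.erase(it);  // erase returns the next valid iterator
      } else {
        ++it;
      }
    }
  }
  return reclaimed;
}
```

The returned IDs would then be handed to whatever component deletes the objects from the data pool, and the modified shards would be written back to the key-value database.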

[0186] In this embodiment, first metadata is pre-stored in memory. This first metadata can be used to determine the data storage status of historical internal objects. Therefore, when the business data of the initial business object changes, the initial business object changes into the target business object. The initial metadata corresponding to the initial business object can then be determined and adjusted to obtain the target metadata. Furthermore, based on the target metadata, it can be determined whether to recycle the historical internal objects, thereby solving the technical problem of low efficiency in recycling internal objects and achieving the technical effect of improving the efficiency of recycling internal objects.

[0187] Through the above description of the embodiments, those skilled in the art can clearly understand that the methods according to the above embodiments can be implemented by means of software plus necessary general-purpose hardware platforms. Of course, they can also be implemented by hardware, but in many cases the former is a better implementation method. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes several instructions to cause a terminal device (which may be a mobile phone, computer, server, or network device, etc.) to execute the methods of the various embodiments of this application.

[0188] This embodiment also provides an internal object processing device for implementing the above embodiments and preferred embodiments; details already described will not be repeated. As used below, the term "module" can refer to a combination of software and/or hardware that implements a predetermined function. Although the device described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.

[0189] Figure 7 is a structural block diagram of an internal object processing apparatus according to an embodiment of this application. As shown in Figure 7, the device includes:

[0190] The first determining unit 72 is configured to, in response to the initial business object changing into the target business object, determine the initial metadata corresponding to the initial business object from at least one first metadata stored in memory, wherein the first metadata is used to at least characterize the data storage of historical internal objects associated with the initial business object, and the historical internal objects are used to store business data in the initial business object.

[0191] Modification unit 74 is used to modify the initial metadata to obtain the target metadata;

[0192] The second determining unit 76 is used to determine the recycling strategy for historical internal objects based on the target metadata, wherein the recycling strategy is used to represent the rules for whether to recycle historical internal objects.

[0193] With the above-mentioned device, when the business data of the initial business object changes, the initial business object changes into the target business object. The initial metadata corresponding to the initial business object can then be determined and adjusted to obtain the target metadata. Furthermore, based on the target metadata, it can be determined whether to recycle the historical internal objects, thereby solving the technical problem of low efficiency in recycling internal objects and achieving the technical effect of improving the efficiency of recycling internal objects.

[0194] It should be noted that the above modules can be implemented by software or hardware. For the latter, they can be implemented in the following ways, but are not limited to: all the above modules are located in the same processor; or, the above modules are located in different processors in any combination.

[0195] Embodiments of this application also provide a computer-readable storage medium storing a computer program, wherein the computer program is configured to execute the steps in any of the above method embodiments when run.

[0196] In one exemplary embodiment, the aforementioned computer-readable storage medium may include, but is not limited to, various media capable of storing computer programs, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), portable hard disk, magnetic disk, or optical disk.

[0197] Embodiments of this application also provide an electronic device, including a memory and a processor, wherein the memory stores a computer program and the processor is configured to run the computer program to perform the steps in any of the above method embodiments.

[0198] Optionally, Figure 8 is a block diagram of the computer system architecture of an electronic device according to an embodiment of this application. As shown in Figure 8, the computer system 800 includes a central processing unit (CPU) 801, which can perform various appropriate actions and processes based on programs stored in read-only memory (ROM) 802 or programs loaded from storage section 808 into random access memory (RAM) 803. The RAM 803 also stores various programs and data required for system operation. The CPU 801, ROM 802, and RAM 803 are interconnected via a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.

[0199] The following components are connected to the input/output interface 805: an input section 806 including a keyboard, mouse, etc.; an output section 807 including a cathode ray tube (CRT), liquid crystal display (LCD), etc., and speakers, etc.; a storage section 808 including a hard disk, etc.; and a communication section 809 including a network interface card such as a local area network card, modem, etc. The communication section 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the input/output interface 805 as needed. A removable medium 811, such as a disk, optical disk, magneto-optical disk, semiconductor memory, etc., is installed on the drive 810 as needed so that computer programs read from it can be installed into the storage section 808 as needed.

[0200] In one exemplary embodiment, the electronic device may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor and the input/output device is connected to the processor.

[0201] Embodiments of this application also provide a computer program product, which includes a computer program that, when executed by a processor, implements the steps in any of the above method embodiments.

[0202] Embodiments of this application also provide another computer program product, including a non-volatile computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps in any of the above method embodiments.

[0203] The embodiments described herein also provide a computer program that includes computer instructions stored in a computer-readable storage medium; a processor of a computer device reads the computer instructions from the computer-readable storage medium and executes the computer instructions, causing the computer device to perform the steps in any of the above method embodiments.

[0204] Specific examples in this embodiment can be found in the examples described in the above embodiments and exemplary implementations, and will not be repeated here.

[0205] Obviously, those skilled in the art should understand that the modules or steps of this application described above can be implemented using general-purpose computing devices. They can be centralized on a single computing device or distributed across a network of multiple computing devices. They can be implemented using computer-executable program code, and thus can be stored in a storage device for execution by a computing device. In some cases, the steps shown or described can be performed in a different order than those presented here, or they can be fabricated as separate integrated circuit modules, or multiple modules or steps can be fabricated as a single integrated circuit module. Thus, this application is not limited to any particular combination of hardware and software.

[0206] The above are merely preferred embodiments of this application and are not intended to limit this application. Various modifications and variations can be made to this application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the principles of this application should be included within the protection scope of this application.

Claims

1. A method for processing internal objects, characterized in that the method includes: In response to an initial business object changing into a target business object, initial metadata corresponding to the initial business object is determined from at least one first metadata stored in memory, wherein the first metadata is used to at least characterize the storage state of data in a historical internal object associated with the initial business object, and the historical internal object is used to store the business data of the initial business object; The initial metadata is modified to obtain the target metadata; Based on the target metadata, a recycling strategy for the historical internal objects is determined, wherein the recycling strategy is used to represent the rules for whether to recycle the historical internal objects; The step of determining the initial metadata corresponding to the initial business object from at least one first metadata stored in memory in response to the initial business object changing into the target business object includes: determining a reverse relationship corresponding to the initial business object in response to the initial business object changing into the target business object, wherein the reverse relationship is used to characterize the mapping relationship between the historical internal object and the initial business object; determining the historical internal object mapped by the initial business object based on the reverse relationship; determining the identity information of the historical internal object; and determining the first metadata containing the identity information from at least one of the first metadata as the initial metadata.

2. The method according to claim 1, characterized in that modifying the initial metadata to obtain the target metadata comprises: determining the data volume of the business data; and changing, according to the data volume, the effective data length information in the initial metadata, wherein the effective data length information is used to characterize the amount of data stored in the historical internal object.

3. The method according to claim 2, characterized in that determining the recycling strategy for the historical internal object based on the target metadata comprises: in response to the effective data length information being less than a first target value, determining that the recycling strategy is to recycle the historical internal object.

4. The method according to claim 3, characterized in that determining the recycling strategy for the historical internal object based on the target metadata comprises: in response to the effective data length information being greater than or equal to the first target value, determining that the recycling strategy is not to recycle the historical internal object.
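
As a non-limiting sketch of claims 2 to 4, the metadata change and the threshold test can be written as two small functions. The function names are hypothetical, and treating the change as a subtraction of the stale data volume (clamped at zero) is an assumption; the claims only state that the effective data length is changed according to the data volume.

```python
def update_valid_length(valid_length, changed_bytes):
    """Assumed rule: stale data no longer counts toward the effective length."""
    return max(valid_length - changed_bytes, 0)


def should_recycle(valid_length, first_target_value):
    """Claim 3: recycle when below the first target value. Claim 4: otherwise keep."""
    return valid_length < first_target_value
```

For example, under these assumptions an internal object holding 4096 valid bytes that loses 3072 bytes drops to 1024, which would be recycled against a first target value of 2048.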

5. The method according to claim 1, characterized in that determining, from the at least one piece of first metadata, the first metadata containing the identity information as the initial metadata comprises: searching a database according to the identity information to obtain the initial metadata, wherein the database is used to store the first metadata.

6. The method according to claim 5, characterized in that the method further comprises: when the memory restarts, loading the initial metadata into the memory in response to a need to recycle the historical internal object or a change in the data stored in the historical internal object.

7. The method according to claim 1, characterized in that the method further comprises: in response to a data storage instruction issued by a client, obtaining multiple data items to be stored corresponding to the data storage instruction; aggregating the multiple data items to be stored to obtain a first internal object; constructing second metadata of the first internal object based on the data volume of the data to be stored, wherein the second metadata is used to characterize the data storage state of the first internal object; constructing a mapping relationship between the data to be stored and the first internal object to obtain a first positive relationship; and storing the first positive relationship and the second metadata in the memory.

8. The method according to claim 7, characterized in that the method further comprises: obtaining a data acquisition request sent by the client, wherein the data acquisition request is used to acquire data to be acquired that is stored in a data pool; determining a first business object corresponding to the data acquisition request; retrieving the first positive relationship corresponding to the first business object from the memory; determining, based on the first positive relationship, the identity information of the first internal object mapped by the target business object; and acquiring, according to the identity information, the data to be acquired from the first internal object deployed in the data pool.
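
The read path of claim 8 can be sketched in the same hedged way: the positive relationship is read from memory, and only the final fetch goes to the data pool. The names and the (object id, offset, length) layout below are assumptions made for illustration.

```python
# Hypothetical state: one internal object holding two aggregated items.
data_pool = {"obj-1": b"helloworld!"}
positive_relationship = {
    "biz-A": ("obj-1", 0, 5),
    "biz-B": ("obj-1", 5, 6),
}


def retrieve(business_object_id):
    """Look up the internal object in memory, then slice the data out of the pool."""
    object_id, offset, length = positive_relationship[business_object_id]
    return data_pool[object_id][offset:offset + length]
```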

9. The method according to claim 8, characterized in that the method further comprises: transmitting the data to be acquired to the client, and, in response to acquiring the data to be acquired from the first internal object, modifying the effective data length information of third metadata corresponding to the first internal object, wherein the third metadata is stored in the memory.

10. The method according to claim 1, characterized in that the method further comprises: in response to a change in the business data, aggregating the changed business data to obtain a second internal object, wherein the second internal object is different from the historical internal object; constructing fourth metadata of the second internal object based on the data volume of the changed business data, wherein the fourth metadata is used to at least characterize the data storage state of the second internal object; constructing a mapping relationship between the changed business data and the second internal object to obtain a second positive relationship; and storing the second positive relationship and the fourth metadata in the memory.
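
A minimal sketch of the out-of-place update in claim 10, assuming hypothetical names and a caller-supplied identifier for the second internal object: the changed data is written to a new internal object, and the historical object is left in place for the recycling decision of claim 1.

```python
memory = {
    "positive": {"biz-A": "obj-1"},                # business object -> internal object
    "metadata": {"obj-1": {"valid_length": 100}},  # historical internal object
}


def apply_change(business_object_id, new_data, new_object_id):
    """Redirect a business object to a second internal object; return the old one."""
    old_object_id = memory["positive"][business_object_id]
    # Fourth metadata: effective data length of the second internal object.
    memory["metadata"][new_object_id] = {"valid_length": len(new_data)}
    # Second positive relationship replaces the mapping to the historical object.
    memory["positive"][business_object_id] = new_object_id
    return old_object_id
```

The returned identifier of the historical object is what a caller would then use to adjust that object's metadata and evaluate whether it should be recycled.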

11. A processing apparatus for internal objects, characterized in that it comprises: a first determining unit configured to, in response to an initial business object changing into a target business object, determine initial metadata corresponding to the initial business object from at least one piece of first metadata stored in memory, wherein the first metadata is used to at least characterize the data storage state of a historical internal object associated with the initial business object, and the historical internal object is used to store the business data of the initial business object; a modification unit configured to modify the initial metadata to obtain target metadata; and a second determining unit configured to determine a recycling strategy for the historical internal object based on the target metadata, wherein the recycling strategy is used to represent a rule for whether to recycle the historical internal object; wherein the first determining unit is further configured to: in response to the initial business object changing into the target business object, determine a reverse relationship corresponding to the initial business object, wherein the reverse relationship is used to characterize the mapping relationship between the historical internal object and the initial business object; determine, based on the reverse relationship, the historical internal object mapped by the initial business object; determine the identity information of the historical internal object; and determine, from the at least one piece of first metadata, the first metadata containing the identity information as the initial metadata.

12. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 10.

13. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 10.

14. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 10.