Repository replica management method and related device
An object-level repository replica management method creates replicas only for object data in the code repository that occupies a large amount of storage space and is updated infrequently. This avoids the high complexity and susceptibility to corruption of existing technologies, and improves concurrency and download performance.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- HUAWEI TECH CO LTD
- Filing Date
- 2025-07-18
- Publication Date
- 2026-07-02
AI Technical Summary
In existing technologies, code repository replica management relies on complex strong consistency algorithms, which leads to high algorithm complexity and a risk of repository corruption. In particular, storage I/O becomes saturated during high-concurrency access, which limits concurrency.
An object-level repository replica management method creates replicas only for object data that occupies a large amount of storage space and is updated infrequently, while reference data that occupies a small amount of storage space and is updated frequently remains shared. This simplifies the synchronization and consistency algorithms and avoids handling the complex relationships between reference data and object data.
The method simplifies synchronization and consistency algorithms, reduces storage resource consumption, improves concurrency, and meets the performance needs of application programming interface scenarios with strict latency requirements.
Description
A repository replica management method and related device
[0001] This application claims priority to Chinese Patent Application No. 202411981691.8, filed with the State Intellectual Property Office of China on December 26, 2024, entitled "A Repository Replica Management Method and Related Device", the entire contents of which are incorporated herein by reference.
Technical Field
[0002] This application relates to the field of cloud computing technology, and in particular to a repository replica management method, a repository hosting system, a computing device cluster, a computer-readable storage medium, and a computer program product.
Background Technology
[0003] As codebases grow larger, more and more developers are choosing to host their code in code repositories. A code repository is essentially a key-value file database. Object data in the repository is stored in the file system, and each object has a unique hash value as its key, with objects linked together through these keys.
[0004] The data in a code repository mainly consists of two parts: reference data and object data. Reference data acts as a pointer to a specific object. Each time a code commit is triggered, the reference data is updated to point to the new object. Files, directories, and commits in the repository are all stored as object data, each with a unique hash value. Once generated, object data remains unchanged. If a file changes, a new object is generated. As a result, the total size of a code repository's object data can reach gigabytes (GB). When a very large code repository is accessed with high concurrency, the volume of data read can saturate storage input/output (I/O). The main way to solve this problem is to create replicas of the code repository, which supports horizontal scaling of storage and thereby preserves the concurrency capabilities of the code repository.
[0005] Currently, the common industry practice for creating code repository replicas is to copy all the data in the code repository, both reference data and object data, and then use synchronization algorithms to maintain consistency between the replica repository and the primary repository. Strongly consistent replica repositories employ multi-write algorithms to ensure consistency between the replica and primary repositories; that is, when the repository is updated, the update is written to both the primary and replica repositories simultaneously, and during access, data is read from either the primary or a replica repository, selected at random according to a configured strategy. However, strongly consistent replica repositories rely on complex strong consistency algorithms to resolve repository forks caused by concurrent writes, resulting in high complexity and a vulnerability to repository corruption.
Summary of the Invention
[0006] This application provides a repository copy management method. This method creates copies only for object data that occupies a large amount of storage space and is updated infrequently, while keeping reference data that occupies a small amount of storage space and is updated frequently shared, thereby achieving object-level repository replication. Object-level repository replication does not require concern with the complex relationships between repository reference data and object data. The synchronization and consistency algorithms are simple, solving the problems of high algorithm complexity and susceptibility to repository corruption in related technologies. This application also provides a repository hosting system, computing device cluster, computer-readable storage medium, and computer program product corresponding to the above method.
[0007] In a first aspect, this application provides a repository replica management method. The method can be executed by a repository hosting system, which can also be called a repository hosting service, repository management system, or repository management service. The repository hosting system is used to manage a user's repository. It can be a software system, either standalone software with repository management functions or a plugin, component, functional module, or mini-program integrated into other software. The software system can be provided to users as a client tool (also simply called a client), supporting extended storage for local or cached repositories and improving their concurrency capabilities. The software system can also be provided to users as a cloud service. Specifically, the cloud service can provide configuration options to enhance the concurrency capabilities of ultra-large repositories. These configuration options allow users to customize the number of copies of compressed files in the repository or to enable memory acceleration. In some possible implementations, the repository hosting system can also include a hardware system. For example, the repository hosting system can include a cluster of computing devices with repository management capabilities, which executes the repository replica management method of this application when running.
[0008] Specifically, the repository hosting system can receive backup commands to back up the user's primary repository. The primary repository includes at least one object data item and reference data pointing to that object data item. The repository hosting system can then respond to the backup command by compressing at least one object data item in the primary repository to generate a compressed file. For example, the repository hosting system can compress loose data in the primary repository to generate a compressed file. Loose data refers to object data that has not yet been integrated into an efficient storage structure. In a code development scenario, loose data can include new commit objects, tree objects, and Binary Large Objects (BLOBs) created by the repository hosting system when the user performs code modifications and commits. These new commit objects, tree objects, and BLOBs are stored in a relatively independent, loosely organized manner in the object directory of the primary repository. The repository hosting system can also store copies of the compressed file in a replica storage medium corresponding to the primary repository. These copies share reference data with the compressed file in the primary repository.
[0009] This method creates replicas only for object data in the repository that occupy a large amount of storage space and are updated infrequently (e.g., are static), while keeping reference data that occupies a small amount of storage space and is updated frequently shared, thus achieving object-level repository replication. The object-level repository replication synchronization and consistency algorithm is simple, thus solving the problem of high algorithm complexity and susceptibility to repository corruption that arises from relying on complex strong consistency algorithms to address repository forks caused by concurrent writes.
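The backup flow described above can be illustrated with a minimal sketch. It is not the implementation of this application: the directory layout (`objects/`, `refs/`, `packs/`), the function name `backup_objects`, and the use of a gzip-compressed tar archive as the compressed file are all assumptions made for illustration.

```python
import hashlib
import shutil
import tarfile
from pathlib import Path


def backup_objects(primary: Path, replica_dirs: list[Path]) -> Path:
    """Pack loose objects into one compressed file and copy it to each replica.

    Hypothetical layout: loose objects live in primary/objects, reference
    data lives in primary/refs and is never copied.
    """
    loose = sorted(p for p in (primary / "objects").iterdir() if p.is_file())
    # Name the compressed file after the set of object names it contains.
    digest = hashlib.sha256("".join(p.name for p in loose).encode()).hexdigest()[:16]
    pack_dir = primary / "packs"
    pack_dir.mkdir(exist_ok=True)
    pack = pack_dir / f"pack-{digest}.tar.gz"
    with tarfile.open(pack, "w:gz") as tar:
        for obj in loose:
            tar.add(obj, arcname=obj.name)
    # Multi-write of object data only: every replica gets the same bytes.
    for replica in replica_dirs:
        replica.mkdir(parents=True, exist_ok=True)
        shutil.copy2(pack, replica / pack.name)
    return pack
```

Because only the compressed file is copied, the replicas hold no reference data; readers continue to resolve references against the primary repository.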
[0010] In some possible implementations, the repository hosting system can display a configuration interface to the user and then receive the object replica storage directory configured by the user through the configuration interface. The object replica storage directory indicates the path to the replica storage medium. Accordingly, the repository hosting system can write multiple copies of the compressed files to the replica storage medium according to the object replica storage directory.
[0011] In this method, the object-level repository copy does not need to concern itself with the complex relationships between repository reference data and object data. Repository data is split into reference data and object data, and object data is stored in the replica storage medium using a multi-write approach, ensuring consistency between the primary repository and the object-level copy. Furthermore, this method supports custom object replica storage directories, offering high flexibility.
[0012] In some possible implementations, the replica storage medium includes persistent media, cloud storage, or distributed caching (such as distributed memory caching). This approach allows users to scale the object-level replicas of the repository themselves and leverage technologies such as distributed memory caching to address storage bottlenecks and improve concurrency.
[0013] In some possible implementations, the backup command includes a system task-triggered backup command or a user-triggered backup command. The system task may include, but is not limited to, a scheduled task, such as a periodic backup task. A scheduled task can periodically package and back up loose object data in the repository. A user-triggered backup command may be a backup command manually triggered by the user for specified object data.
[0014] In some possible implementations, the repository hosting system may also receive a read command and then, in response to the read command, read a copy of the compressed file from the copy storage medium.
[0015] Compared to weakly consistent replica repositories, which have some synchronization latency, this method has no download synchronization latency, thus improving download performance. Furthermore, the low latency of object-level repository replicas meets the requirements of application programming interface (API) scenarios with strict latency requirements.
[0016] In some possible implementations, the repository hosting system can also read the reference data from the primary repository. Accordingly, the repository hosting system can read a copy of the compressed file from the copy storage medium based on the reference data.
[0017] In this method, the repository hosting system only needs to read reference data from the main repository, without having to read the object data, which consumes a lot of bandwidth. Instead, it reads the object data from the replica storage medium, which greatly reduces the resource consumption of the main repository and frees up its capacity for other work.
[0018] In some possible implementations, when the compressed file is not present in the copy storage medium, the repository hosting system reads the compressed file from the main repository.
[0019] This method first reads the copy storage medium. If there is no copy of the compressed file in the copy storage medium, then it reads the compressed file from the main repository. This approach minimizes the resource consumption of the main repository while ensuring that data can be read.
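The replica-first read order with fallback to the main repository can be sketched as follows. The function name `read_pack`, the `packs/` directory, and the returned source label are hypothetical and serve only to make the lookup order visible.

```python
from pathlib import Path


def read_pack(pack_name: str, primary: Path, replica_dirs: list[Path]) -> tuple[bytes, str]:
    """Return the compressed file's bytes and where they were read from."""
    # Try every replica storage medium first to spare the main repository.
    for replica in replica_dirs:
        candidate = replica / pack_name
        if candidate.is_file():
            return candidate.read_bytes(), "replica"
    # No replica holds the file: fall back to the main repository.
    return (primary / "packs" / pack_name).read_bytes(), "primary"
```

In this sketch a missing file in the main repository raises `FileNotFoundError`; how a real system reports that case is outside what the text describes.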
[0020] In some possible implementations, the repository hosting system may also receive a delete command, and then, in response to the delete command, delete the compressed file in the main repository and delete a copy of the compressed file in the copy storage medium.
[0021] The delete command can be generated when a compressed file or a copy of a compressed file is successfully written. The repository hosting system can respond to this command by deleting historical versions of compressed files in the primary repository and copies of those historical versions in the replica storage media. Alternatively, the delete command can be used to delete specified object data, such as a specified compressed file. The repository hosting system can respond to this command by deleting the specified compressed file in the primary repository and copies of that compressed file in the replica storage media. It's important to note that in scenarios requiring strong consistency, delete operations can be performed on both the primary repository and the replicas to maintain consistency. For example, the repository hosting system can simultaneously delete compressed files in the primary repository and copies of those compressed files in the replica storage media to ensure data consistency between the primary and replicas. In scenarios with weak consistency, if deletion on the primary repository fails, the repository hosting system does not need to perform a rollback operation on the replicas.
[0022] In some possible implementations, the repository hosting system can also receive the number of replicas configured by the user through a configuration interface. The number of replicas is n, where n is a positive integer. Accordingly, the repository hosting system can store copies of the compressed file in n replica storage media corresponding to the primary repository. This method supports the creation of multiple object-level repository replicas, thereby improving concurrency.
[0023] In some possible implementations, the repository hosting system may store a copy of the compressed file in n replica storage media corresponding to the main repository, with each of the n replica storage media storing a portion of the compressed file. Alternatively, the repository hosting system may store n copies of the compressed file in n replica storage media corresponding to the main repository, with each of the n replica storage media storing a copy of the compressed file.
[0024] When each of the n replica storage media stores a portion of the compressed file, storage is distributed. On one hand, if one portion is damaged, it can be recovered from the remaining portions, ensuring data safety. On the other hand, this approach protects the copy with less storage space, significantly reducing costs. When each of the n replica storage media instead stores one complete copy of the compressed file, data safety is ensured through multiple complete copies. Furthermore, data can be read concurrently from multiple replica storage media, improving read performance.
[0025] In a second aspect, this application provides a repository hosting system. The repository hosting system is used to manage a user's repository, and the repository hosting system includes:
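The two placement options can be compared with a small sketch. The function names are hypothetical, and the striped variant below simply splits the bytes into n contiguous portions; it carries no redundancy, so recovering a damaged portion would require an additional coding scheme that the text does not specify and that is not shown here.

```python
def place_full_copies(data: bytes, n: int) -> list[bytes]:
    """Each of the n replica media holds one complete copy."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return [data for _ in range(n)]


def place_striped(data: bytes, n: int) -> list[bytes]:
    """Each of the n replica media holds one contiguous portion."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    size = -(-len(data) // n)  # ceiling division
    return [data[i * size:(i + 1) * size] for i in range(n)]


def reassemble(portions: list[bytes]) -> bytes:
    """Rebuild the compressed file from its portions, in order."""
    return b"".join(portions)
```

Full copies cost n times the file size but let any single medium serve a read; striping costs roughly one file size in total but needs every portion to rebuild the file.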
[0026] A communication module is used to receive a backup command, which is used to back up the user's main repository. The main repository includes at least one object data and reference data of the at least one object data, and the reference data is used to point to the at least one object data.
[0027] A copy creation module is used to compress at least one object data in the main repository in response to the backup command, and generate a compressed file;
[0028] The copy creation module is also used to store a copy of the compressed file in the copy storage medium corresponding to the main repository;
[0029] The copy shares the reference data with the compressed file in the main repository.
[0030] In some possible implementations, the system further includes:
[0031] The display module is used to display the configuration interface to the user;
[0032] The communication module is further configured to receive the object replica storage directory configured by the user through the configuration interface, wherein the object replica storage directory is used to indicate the path of the replica storage medium;
[0033] The copy creation module is specifically used for:
[0034] According to the object copy storage directory, a copy of the compressed file is written to the copy storage medium.
[0035] In some possible implementations, the replica storage medium includes persistent media, cloud storage, or distributed cache.
[0036] In some possible implementations, the backup command includes a backup command triggered by a system task or a backup command triggered by a user.
[0037] In some possible implementations, the communication module is further used for:
[0038] Receive read command;
[0039] The system also includes:
[0040] A read module is configured to read a copy of the compressed file from the copy storage medium in response to the read command.
[0041] In some possible implementations, the reading module is further configured to:
[0042] Read the reference data from the main repository;
[0043] The reading module is specifically used for:
[0044] A copy of the compressed file is read from the copy storage medium according to the reference data.
[0045] In some possible implementations, the reading module is further configured to:
[0046] If the compressed file is not present in the copy storage medium, the compressed file is read from the main repository.
[0047] In some possible implementations, the communication module is further used for:
[0048] Receive a delete command;
[0049] The system also includes:
[0050] A deletion module is configured to, in response to the deletion command, delete the compressed file in the main repository and delete a copy of the compressed file in the copy storage medium.
[0051] In some possible implementations, the communication module is further used for:
[0052] Receive the number of replicas configured by the user through the configuration interface, where the number of replicas is n, and n is a positive integer.
[0053] The copy creation module is specifically used for:
[0054] Store copies of the compressed file in the n replica storage media corresponding to the main repository.
[0055] In some possible implementations, the replica creation module is specifically used for:
[0056] Store one copy of the compressed file across the n replica storage media corresponding to the main repository, with each of the n replica storage media storing a portion of the compressed file; or
[0057] Store n copies of the compressed file in the n replica storage media corresponding to the main repository, with each of the n replica storage media storing one copy of the compressed file.
[0058] In a third aspect, this application provides a computing device cluster. The computing device cluster includes at least one computing device, and the at least one computing device includes at least one processor and at least one memory. The at least one processor and the at least one memory communicate with each other. The at least one processor is used to execute instructions stored in the at least one memory to cause the computing device or the computing device cluster to perform the repository replica management method as described in the first aspect or any implementation thereof.
[0059] In a fourth aspect, this application provides a computer-readable storage medium storing instructions that instruct a computing device or a cluster of computing devices to perform the repository replica management method described in the first aspect or any implementation thereof.
[0060] In a fifth aspect, this application provides a computer program product containing instructions that, when run on a computing device or a cluster of computing devices, cause the computing device or cluster of computing devices to perform the repository replica management method described in the first aspect or any implementation thereof.
[0061] The implementations provided in the above aspects can be further combined in this application to provide additional implementations.
Brief Description of the Drawings
[0062] To more clearly illustrate the technical methods of this application, the accompanying drawings used will be briefly described below.
[0063] Figure 1 is a schematic diagram of creating a replica of a repository according to this application;
[0064] Figure 2A is a schematic diagram of replica synchronization in a weakly consistent replica repository provided in this application;
[0065] Figure 2B is a schematic diagram of replica synchronization in a strongly consistent replica repository provided in this application;
[0066] Figure 3 is a schematic diagram of the architecture of a repository hosting system provided in this application;
[0067] Figure 4 is a flowchart of a repository replica management method provided in this application;
[0068] Figure 5 is a schematic diagram of data distribution in a primary repository and a replica provided in this application;
[0069] Figure 6 is a schematic diagram of a repository replica management method provided in this application;
[0070] Figure 7 is a flowchart of another repository replica management method provided in this application;
[0071] Figure 8 is a schematic diagram of the structure of a computing device provided in this application;
[0072] Figure 9 is a schematic diagram of the structure of a computing device cluster provided in this application;
[0073] Figure 10 is a schematic diagram of another computing device cluster provided in this application;
[0074] Figure 11 is a schematic diagram of another computing device cluster provided in this application.
Detailed Implementation
[0075] The terms "first" and "second" used in the embodiments of this application are for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of indicated technical features. Therefore, a feature defined with "first" or "second" may explicitly or implicitly include one or more of that feature.
[0076] First, some technical terms involved in the embodiments of this application will be introduced.
[0077] A repository is a project database managed by version control system tools. A project database can include a historically traceable collection of files. In practice, a repository can use a key-value (KV) file database for data storage. For example, a repository can be a code repository, used to store various code files of a software project through a KV file database.
[0078] A repository consists of object data and reference data. Object data is a data type used to store various key-value sets or more complex entities. Object data is typically stored in the file system. Files, directories, or commit data in a repository can all be stored as object data. Object data in a repository usually has a unique hash value, which can be used as a key, allowing objects in the repository to be associated with each other. It's important to note that once object data is generated, its hash value remains unchanged. When a file changes, new object data is generated. Reference data is used to point to object data. Each reference data is like a pointer to a specific object data; each time a commit operation is triggered, the reference data can be updated to point to the new object data.
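The relationship between immutable, hash-keyed object data and mutable reference data can be sketched as a tiny in-memory key-value store. The class name `TinyRepo` and the choice of SHA-256 over the raw content are assumptions for illustration; a real version control system may hash differently.

```python
import hashlib


class TinyRepo:
    """Toy key-value repository: immutable objects plus mutable references."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}  # hash -> content, never rewritten
        self.refs: dict[str, str] = {}       # reference name -> object hash

    def put_object(self, content: bytes) -> str:
        # The key is derived from the content, so changed content yields a
        # new object instead of modifying an existing one.
        key = hashlib.sha256(content).hexdigest()
        self.objects.setdefault(key, content)
        return key

    def update_ref(self, name: str, key: str) -> None:
        if key not in self.objects:
            raise KeyError(f"unknown object {key}")
        self.refs[name] = key  # small, frequently updated pointer
```

Committing a changed file adds one object and moves one reference; existing objects stay byte-for-byte identical, which is what makes them safe to replicate independently.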
[0079] As software projects iterate and object data is continuously updated, the size of the repository and its object data also increases. For example, the total size of object data can reach the GB level. When a large number of requests access a large-scale repository, storage I/O may be saturated by the large amount of data read. To address this, replicas of the repository, also known as replica repositories, can be created to support horizontal scaling of storage, thereby ensuring the repository's concurrency capabilities.
[0080] Referring to Figure 1, which illustrates a method for creating a replica of a repository, the industry practice is to completely copy all the data in the repository, including reference data and object data, and then maintain consistency between the replica repository and the primary repository through a synchronization algorithm. Consistency refers to maintaining data consistency across multiple replicas.
[0081] Based on the timeliness of repository synchronization, replica repositories can be divided into strongly consistent replica repositories and weakly consistent replica repositories. These will be explained below with reference to the accompanying diagram.
[0082] First, referring to Figure 2A, which illustrates replica synchronization in a weakly consistent replica repository, weakly consistent replica repositories experience a certain synchronization delay. To ensure eventual consistency with the primary repository, when accessing the repository, the replica repository first checks if it is consistent with the primary repository. If they are inconsistent, the replica repository first synchronizes data from the primary repository before responding to user requests and providing the corresponding services. For example, when the replica repository receives a replica download request, it first checks if the primary and replica repositories are consistent. If not, the replica repository synchronizes with the primary repository. After synchronization is complete and the data in the replica is confirmed to be consistent with the primary repository, it returns the repository-level replica to the client.
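The check-then-synchronize behavior of a weakly consistent replica repository can be sketched as below. The `Repo` class, the `serve_download` function, and the use of reference equality as the consistency check are assumptions for illustration; the text does not state how consistency is detected.

```python
class Repo:
    """Toy repository holding reference data and object data in memory."""

    def __init__(self) -> None:
        self.refs: dict[str, str] = {}
        self.objects: dict[str, bytes] = {}


def serve_download(primary: Repo, replica: Repo) -> tuple[dict[str, bytes], bool]:
    """Serve a download from the replica, synchronizing first if it is stale."""
    synced = False
    if replica.refs != primary.refs:
        # The replica lags behind: pull data before answering the request.
        replica.objects.update(primary.objects)
        replica.refs = dict(primary.refs)
        synced = True
    return dict(replica.objects), synced
```

The synchronization sits on the request path, so the first read after any update pays its cost; this is the latency the following paragraph refers to.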
[0083] Weakly consistent repository-level replicas suffer from synchronization latency, and synchronizing during access degrades performance. Their use cases are therefore limited: they mainly provide download capabilities and cannot be used in application programming interface (API) scenarios with strict latency requirements.
[0084] Next, referring to Figure 2B, which illustrates replica synchronization in a strongly consistent replica repository, the strongly consistent replica repository uses a multi-write algorithm to ensure consistency between the replica repository and the primary repository. In other words, when the repository is updated, the updated data is written to both the primary and replica repositories simultaneously. When accessing the repository, data can be randomly read from either the primary or replica repository according to the configured strategy.
[0085] Strongly consistent repository-level replicas rely on complex strong consistency algorithms to resolve repository forks caused by concurrent writes. This approach is highly complex and prone to repository corruption.
[0086] In view of this, this application provides a repository replica management method. This method can be applied to a repository hosting system, which can also be called a repository hosting service, repository management system, or repository management service. The repository hosting system is used to manage a user's repository. The repository hosting system can be a software system, which can be standalone software with repository management functions, or integrated into other software as a plug-in, component, functional module, or mini-program.
[0087] The software system can be provided to users as a client tool (or simply a client), supporting extended storage for local or cached repositories and improving their concurrency capabilities. Specifically, the client can provide repository acceleration based on the object-level replication of this application, while the reference data in the repository remains shared. When executing backup commands (such as repack or backup), multiple copies of compressed files (such as packfiles) are created and stored on different storage media. When data is needed, read operations are performed from multiple storage media, thereby improving the concurrency capabilities of local or cached repositories.
[0088] The software system can also be provided to users as a cloud service. Specifically, the cloud service can provide configuration options to enhance the concurrency capabilities of a large repository. These options allow users to customize the number of copies of compressed files in the repository or to provide support for memory acceleration. High-speed downloading of the repository can be achieved through object-level replication as described in this application.
[0089] In some possible implementations, the repository hosting system may also include a hardware system. For example, the repository hosting system may include a cluster of computing devices with repository management capabilities, which, when running, executes the repository replica management method of this application.
[0090] Specifically, the repository hosting system can receive a backup command to back up the user's primary repository. The primary repository includes at least one object data item and reference data pointing to that object data item. The backup command can be used to package and back up loose data in the primary repository (e.g., commit objects, tree objects, etc. generated by code commits). The repository hosting system can then respond to the backup command by compressing at least one object data item in the primary repository, generating a compressed file. The repository hosting system can then store a copy of the compressed file on a corresponding replica storage medium for the primary repository. The copy shares the reference data with the compressed file in the primary repository.
[0091] This method achieves object-level repository replication by creating copies only for object data that occupies a large amount of storage space and is updated infrequently (e.g., is static), while keeping reference data that occupies a small amount of storage space and is updated frequently shared. Object-level repository replication does not need to concern itself with the complex relationships between reference data and object data. It splits repository data into reference data and object data, and object data can be stored in fixed storage or cache through multiple write operations. When a read is needed, reference data can be read from the main repository, and object data can be read from the replica.
[0092] Users can extend the object-level replicas of the repository themselves and use technologies such as distributed memory caching to remove storage bottlenecks and improve concurrency. Furthermore, the synchronization and consistency algorithms for object-level repository replicas are simple and introduce no download synchronization latency, which can improve download performance. The low latency of object-level repository replicas can meet the needs of API scenarios with strict latency requirements.
[0093] To make the technical solution of this application clearer and easier to understand, the system architecture of the repository hosting system of this application is described below with reference to the accompanying drawings.
[0094] Referring to Figure 3, which illustrates the architecture of a repository hosting system 30, the system manages user repositories. The repository hosting system 30 includes a communication module 302 and a replica creation module 304. The communication module 302 and replica creation module 304 can be modules within a server, such as modules within a repository hosting service. The repository hosting system 30 may also include a display module 306. The display module 306 can be a module within a user device, such as a module in a client (e.g., a Git client) or a module in a browser. Furthermore, the repository hosting system 30 may also include a read module 301, a delete module 303, and a task management module 305.
[0095] The following is a detailed introduction to each module in the repository hosting system 30.
[0096] Communication module 302 is used to receive backup commands. A backup command can be generated by triggering a cleanup operation on the main repository. For example, backup commands can include the `repack` command, which compresses loose object data in the main repository and creates copies to achieve backup. In some examples, a backup command can also be generated by triggering a backup operation on the main repository; for example, backup commands can include `backup`. The main repository includes at least one piece of object data and reference data that points to that object data. Backup commands can be used to create copies of specified object data in the main repository, thereby achieving object-level replication.
[0097] The communication module 302 can be a module within the front-end service. In some possible implementations, the communication module 302 can receive backup commands triggered by a user. In other possible implementations, the communication module 302 can also receive backup commands triggered by a system task. The system task can be created or generated by the task management module 305. In some examples, the system task can include a scheduled task or a periodic backup task. The time or backup cycle of the scheduled task can be set by the user based on experience or be a system default setting.
[0098] The replica creation module 304, in response to a backup command, compresses at least one object data in the primary repository to generate a compressed file. The replica creation module 304 also stores a copy of the compressed file in a replica storage medium corresponding to the primary repository. The copy shares reference data with the compressed file in the primary repository. The replica storage medium includes, but is not limited to, persistent media, cloud storage, or distributed cache. It should be noted that the replica creation module 304 can be a module within a backend service.
[0099] The display module 306 is used to display the configuration interface to the user. Correspondingly, the communication module 302 is also used to receive the object replica storage directory configured by the user through the configuration interface. This object replica storage directory indicates the path to the replica storage medium. The replica creation module 304 can write copies of the compressed file to the replica storage medium according to the object replica storage directory. The writes can use a multiwriter mechanism, which writes data to multiple output targets, such as multiple storage media, at the same time. Specifically, the multiwriter mechanism uses multiple write executors ("writers") and broadcasts each write operation to all output targets. When data is written, the mechanism ensures that all writers receive the same data; if any one write fails, an error is returned. This ensures consistency between the primary repository and the replicas.
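The multiwriter behaviour described above can be illustrated with a minimal sketch. The class name `MultiWriter` and the use of in-memory `io.BytesIO` buffers in place of real storage media are assumptions made for illustration; the application does not specify an implementation.

```python
import io


class MultiWriter:
    """Hypothetical helper: broadcast each write to every target."""

    def __init__(self, *targets):
        self.targets = list(targets)

    def write(self, data: bytes) -> int:
        for target in self.targets:
            written = target.write(data)
            # A short write on any target is reported as an error, so the
            # caller never treats a partially replicated write as a success.
            if written != len(data):
                raise IOError("short write to one of the output targets")
        return len(data)


primary = io.BytesIO()  # stands in for the primary repository's medium
replica = io.BytesIO()  # stands in for one replica storage medium
writer = MultiWriter(primary, replica)
writer.write(b"PACK-bytes")
```

An exception raised by any single target propagates to the caller, which corresponds to the "if any one write fails, an error is returned" rule.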
[0100] To improve concurrency, this application also supports the creation of multiple object-level repository copies. Specifically, the communication module 302 is also used to receive the number of copies configured by the user through the configuration interface. Here, the number of copies is n, where n is a positive integer (n > 0). Correspondingly, the copy creation module 304 is specifically used to store copies of the compressed file in n copy storage media corresponding to the main repository.
[0101] Specifically, the copy creation module 304 can store one copy of the compressed file across the n copy storage media corresponding to the main repository, in which case each of the n copy storage media stores a portion of the compressed file. Alternatively, the copy creation module 304 can store n copies of the compressed file in the n copy storage media corresponding to the main repository, in which case each of the n copy storage media stores one complete copy of the compressed file.
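The two placement options can be contrasted with a short sketch. The function names `shard_across` and `replicate_to` are hypothetical, and splitting into equal contiguous portions is only one possible way to divide a file; the application does not say how portions are chosen.

```python
def shard_across(data: bytes, n: int) -> list[bytes]:
    """Option 1: one logical copy, split into n contiguous portions."""
    size = -(-len(data) // n)  # ceiling division
    return [data[i * size:(i + 1) * size] for i in range(n)]


def replicate_to(data: bytes, n: int) -> list[bytes]:
    """Option 2: n complete copies, one per replica storage medium."""
    return [data for _ in range(n)]
```

Under the first option the storage cost is roughly one copy but a read must gather all portions; under the second, any single medium can serve a read on its own.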
[0102] The above section introduced data backup. The following section introduces data reading and data deletion.
[0103] The communication module 302 is also used to receive read commands. Correspondingly, the read module 301 is used to read a copy of the compressed file from the replica storage medium in response to the read command. The read module 301 can be a module in the backend service. In some possible implementations, the backend service may also include a communication module, which can receive commands from the frontend service or client, such as a read command from the client. The read module 301 is also used to read reference data from the main repository. Accordingly, the read module 301 can read a copy of the compressed file from the replica storage medium based on the reference data. The read module 301 can read a copy of the compressed file from the replica storage medium according to a configured polling algorithm when reading object data from the repository. It should be noted that when the compressed file does not exist in the replica storage medium, the read module 301 can read the compressed file from the main repository. For example, if a copy of the compressed file is deleted from the replica storage medium or a copy of the compressed file has not yet been successfully written to the replica storage medium, the read module 301 can read the compressed file from the main repository.
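As a rough illustration of the read path, the sketch below assumes that the "polling algorithm" is a round-robin rotation over the replica storage media, and it models each medium as a dictionary keyed by the compressed file's key. The class `ReplicaReader` and these modelling choices are assumptions, not details given in the application.

```python
import itertools


class ReplicaReader:
    """Hypothetical reader: rotate over replicas, fall back to the main repository."""

    def __init__(self, primary: dict, replicas: list):
        self.primary = primary
        self.replicas = replicas
        # Round-robin index generator; None when no replica is configured.
        self._next = itertools.cycle(range(len(replicas))) if replicas else None

    def read(self, key: str):
        if self._next is not None:
            index = next(self._next)
            pack = self.replicas[index].get(key)
            if pack is not None:
                return f"replica-{index}", pack
        # The copy is missing (deleted, or not yet written): read the main repository.
        return "primary", self.primary[key]
```

Spreading reads over the replicas in turn is what relieves I/O pressure on any single medium, while the fallback keeps reads correct when a replica lags behind.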
[0104] The communication module 302 is also used to receive deletion commands. Accordingly, the deletion module 303 is used to delete the compressed file in the main repository and delete a copy of the compressed file in the copy storage medium in response to the deletion command.
[0105] In some possible implementations, the delete command can be generated when a compressed file or a copy of a compressed file is successfully written. The delete module 303 can then respond to the delete command by deleting the historical version of the compressed file in the primary repository and deleting a copy of the historical version of the compressed file in the replica storage medium. In other possible implementations, the delete command can be used to delete specified object data, such as a specified compressed file. The delete module 303 can then respond to the delete command by deleting the specified compressed file in the primary repository and deleting a copy of the specified compressed file in the replica storage medium. It should be noted that in scenarios requiring strong consistency, the delete operation can be performed on both the primary repository and the replica to maintain consistency. For example, the delete module 303 can simultaneously delete the compressed file in the primary repository and the copy of the compressed file in the replica storage medium to ensure data consistency between the primary repository and the replica. In scenarios with weak consistency, if deletion on the primary repository fails, the repository hosting system does not need to perform a rollback operation on the replica.
[0106] The structure of the repository hosting system 30 shown in Figure 3 is merely an illustrative division. In actual applications, the repository hosting system 30 can also have other structures, and this application does not impose any restrictions on this.
[0107] Based on the repository hosting system 30 shown in Figure 3, this application also provides a repository copy management method. The repository copy management method of this application will be described in detail below with reference to specific embodiments.
[0108] Referring to Figure 4, a flowchart of a repository copy management method is shown. This method can be executed by the repository hosting system 30, which is used to manage users' repositories, for example by creating copies. The method specifically includes the following steps:
[0109] S402, Repository hosting system 30 receives a backup command.
[0110] The backup command is used to back up the user's main repository. The main repository includes at least one piece of object data and reference data that points to that object data.
[0111] Specifically, the repository hosting system 30 can receive backup commands triggered by users. For example, a user can trigger a pack-and-backup operation on a specified object, thereby triggering a backup command. The methods for triggering the backup operation can include, but are not limited to, clicking (e.g., mouse click, stylus click), voice triggering, or touch control. Figure 4 illustrates an example of a user-triggered backup command. In other possible implementations of this application, the backup command can also be triggered by a system task. The system task can include, but is not limited to, scheduled tasks, such as periodic backup tasks. Scheduled tasks can periodically pack and back up loose object data in the repository.
[0112] S404, In response to the backup command, the repository hosting system 30 compresses at least one object data in the main repository to generate a compressed file.
[0113] Specifically, the repository hosting system 30 can compress loose data in the main repository to generate compressed files. Loose data refers to object data that has not yet been integrated into an efficient storage structure. In a code development scenario, loose data can include new commit objects, tree objects, and binary large objects (BLOBs) created by the repository hosting system 30 when users perform code modifications and commit operations. These new commit objects, tree objects, and BLOBs are stored in a relatively independent and loosely structured manner in the object directory of the main repository.
[0114] In some possible implementations, the backup command can specify the object data to be backed up; for example, the backup command can include the key-value pair of the object data. Accordingly, the repository hosting system 30 can compress the object data identified by the key-value pair in the main repository based on the key-value pair of the backup command, thereby generating a compressed file in the main repository. The repository hosting system 30 can then write the compressed file to the main repository.
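The idea of compressing only the object data named in the backup command can be sketched as follows. This is a simplified illustration and not Git's actual pack format: the loose objects are modelled as a dictionary, the function names are hypothetical, and Python's `zlib` module (which implements DEFLATE) stands in for whichever compression algorithm is configured.

```python
import hashlib
import zlib


def pack_objects(loose: dict, keys: list):
    """Compress the selected loose objects into one blob plus an offset index."""
    body = bytearray()
    index = {}
    for key in keys:
        compressed = zlib.compress(loose[key])
        index[key] = (len(body), len(compressed))  # (offset, length) in the blob
        body += compressed
    # Name the compressed file after a hash of its content (illustrative choice).
    name = "pack-" + hashlib.sha1(bytes(body)).hexdigest()[:10]
    return name, bytes(body), index


def read_object(pack: bytes, index: dict, key: str) -> bytes:
    """Recover one object from the compressed file by its key."""
    offset, length = index[key]
    return zlib.decompress(pack[offset:offset + length])
```

Because the compressed file is immutable once written, the same bytes can be written to the main repository and to every replica without any later reconciliation.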
[0115] During compression, the repository hosting system 30 can compress at least one object data indicated by the backup command according to the compression algorithm configured by the user or the system default compression algorithm. The compression algorithm can be a high compression ratio algorithm or a lossless compression algorithm. For example, lossless compression algorithms may include LZ77, LZR, LZSS, LZMA, LZMA2, or DEFLATE algorithms. The LZ77 algorithm, as the basis for many lossless compression algorithms, uses a "sliding window" for data compression. Furthermore, the LZ77 algorithm manages a dictionary using triples consisting of an offset, a match length, and the next character. The dictionary is updated as the input is parsed, so that it always reflects the most recently processed data. The LZR, LZSS, LZMA, and LZMA2 algorithms mentioned above are derivatives of the LZ77 algorithm and will not be elaborated further here. It should be noted that the above compression algorithms are static compression algorithms. In practical applications, the repository hosting system 30 can also adopt dynamic compression algorithms based on deep learning, including compression algorithms based on multi-layer perceptron (MLP), compression algorithms based on convolutional neural networks (CNN), or compression algorithms based on generative adversarial networks (GAN).
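To make the sliding-window triples concrete, here is a toy LZ77 encoder and decoder. It is a teaching sketch only: the window size, the brute-force match search, and the function names are illustrative choices, and production compressors are far more elaborate.

```python
def lz77_encode(data: str, window: int = 16) -> list:
    """Encode data as (offset, match length, next character) triples."""
    out = []
    i = 0
    while i < len(data):
        best_off, best_len = 0, 0
        for j in range(max(0, i - window), i):
            length = 0
            # Stop one short of the end so a literal next character always remains.
            while (i + length < len(data) - 1
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_off, best_len = i - j, length
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out


def lz77_decode(triples: list) -> str:
    """Rebuild the text by copying from already-decoded output."""
    out = []
    for offset, length, char in triples:
        for _ in range(length):
            out.append(out[-offset])  # copying one by one handles overlaps
        out.append(char)
    return "".join(out)
```

For example, `lz77_encode("aaaa")` yields `[(0, 0, "a"), (1, 2, "a")]`: a literal `a`, then "go back 1, copy 2, then emit `a`".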
[0116] S406, Repository hosting system 30 stores a copy of the compressed file in the copy storage medium corresponding to the main repository.
[0117] Specifically, the repository hosting system 30 can write a copy of the compressed file to the replica storage medium corresponding to the primary repository, thereby storing the copy of the compressed file in the replica storage medium. To ensure data consistency between the primary repository and the replica, the repository hosting system 30 can use multi-write to write the compressed file to the primary repository and its copy to the replica storage medium at the same time. The replica storage medium may include persistent media, cloud storage, or a distributed cache.
[0118] In some possible implementations, the repository hosting system 30 can display a configuration interface to the user, receiving the object copy storage directory configured by the user through the configuration interface. The object copy storage directory is used to indicate the path to the copy storage medium. The repository hosting system 30 can write multiple copies of the compressed files to the copy storage medium according to the object copy storage directory.
[0119] Furthermore, the repository hosting system 30 can receive the number of copies configured by the user through a configuration interface. Here, the number of copies is n, where n is a positive integer. Correspondingly, the repository hosting system 30 can store copies of the compressed file in n copy storage media corresponding to the main repository. In some possible implementations, the repository hosting system 30 can store one copy of the compressed file across the n copy storage media corresponding to the main repository, with each of the n copy storage media storing a portion of the compressed file. In other possible implementations, the repository hosting system 30 can store n copies of the compressed file in the n copy storage media corresponding to the main repository, with each of the n copy storage media storing one complete copy of the compressed file. In these implementations, the copies share reference data with the compressed file in the main repository. As shown in Figure 5, the primary repository includes reference data (refs), which points to at least one object data, specifically pack-54367f30cf.idx and pack-54367f30cf.pack. The replicas also contain the object data pack-54367f30cf.idx and pack-54367f30cf.pack. The object data pack-54367f30cf.idx and pack-54367f30cf.pack in the primary repository and the same object data in the replica storage media share the same reference data.
[0120] For ease of understanding, the following example illustrates a replica storage medium including a distributed cache (e.g., a distributed memory cache) and a persistent medium. Referring to Figure 6, a schematic diagram of a repository replica management method, users can request data from the repository or write data to the repository via an API request through a browser page. Alternatively, users can upload or download data through a client. Upload or download requests triggered by the user through the client are forwarded to the corresponding server (e.g., gitserver) through a proxy. During backup, compared to completely copying the repository's object data (e.g., xxx.pack) and reference data (e.g., packed-refs), in this application's solution, the repository hosting system 30 can write reference data and object data to the main repository. The object data can be a compressed file, and copies of the compressed file are written to the distributed memory cache and persistent medium via multiple writes. In this example, the medium where the main repository is located can be EFS01, and the medium where the replicas are located can be EFS02.
[0121] S408, Repository hosting system 30 receives a write success notification.
[0122] Specifically, when a copy of the compressed file is successfully written to the copy storage medium, the copy storage medium can return a write success notification to the repository hosting system 30.
[0123] It should be noted that S408 above is an optional step in the embodiments of this application, and the repository copy management method of this application may also be performed without executing S408.
[0124] S410, Repository hosting system 30 receives the delete command.
[0125] The delete command is used to delete object data. Similar to the backup command, the delete command can be triggered by a user or by a system task. Figure 4 illustrates an example of a delete command triggered by a system task. For instance, a system task could delete a previous version of object data (such as a previous version of a compressed file) when a new version of the object data (such as a new version of a compressed file) is successfully written. A delete command triggered by a system task could be a command that deletes a previous version of a compressed file upon receiving a write success notification from the replica storage medium.
[0126] It should be noted that S410 is an optional step, and the repository copy management method of this application may also be performed without executing S410.
[0127] S412, the repository hosting system 30 responds to the delete command by deleting the compressed file of the historical version in the main repository.
[0128] The delete command is used to delete specified object data. The delete command can include the key value of the object data to be deleted. The repository hosting system 30 can delete the compressed file uniquely identified by the key value of the object data indicated by the delete command in the main repository. For example, the repository hosting system 30 can delete historical versions of compressed files.
[0129] S414, Repository hosting system 30 deletes copies of historical versions of compressed files from the copy storage medium.
[0130] The repository hosting system 30 can synchronously delete copies of historical versions of compressed files on the replica storage media to ensure data consistency between the primary repository and the replicas. For example, the repository hosting system 30 can use transactions to ensure that deleting historical versions of compressed files in the primary repository and deleting copies of historical versions of compressed files on the replica storage media either succeeds simultaneously or fails simultaneously.
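The "succeed together or fail together" behaviour, and its weak-consistency counterpart without rollback, can be sketched with a simple undo log. The application does not specify the transaction mechanism, so the function `delete_everywhere`, the dictionary-based stores, and the use of a missing key to simulate a failed delete are all assumptions made for illustration.

```python
def delete_everywhere(key, primary: dict, replicas: list, strong: bool = True) -> bool:
    """Delete key from the replicas and the primary repository.

    strong=True: on any failure, restore what was already deleted.
    strong=False: on failure, leave earlier deletions in place (no rollback).
    """
    undo = []  # (store, value) pairs recorded for a possible rollback
    try:
        for store in [*replicas, primary]:
            undo.append((store, store.pop(key)))  # KeyError simulates a failed delete
    except KeyError:
        if strong:
            for store, value in undo:
                store[key] = value
        return False
    return True
```

With `strong=False`, a failure on the primary repository leaves the replica copy deleted, which mirrors the weak-consistency case in which no rollback is performed on the replica.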
[0131] It should be noted that S412 is an optional step in the embodiments of this application, and the method of this application may also omit the execution of S412. For example, in a scenario requiring strong consistency, the repository hosting system 30 may execute S412 and S414. As another example, in a scenario requiring weak consistency, the repository hosting system 30 may execute S414 without having to execute S412.
[0132] S416, Repository hosting system 30 receives a deletion success notification.
[0133] Specifically, when the replica storage medium successfully deletes the replica, the replica storage medium can return a deletion success notification to the repository hosting system 30.
[0134] It should be noted that S410 to S416 above illustrate the case in which object data is updated or new object data is written, the specified object data is compressed to generate a new version of the compressed file, and the historical version of the compressed file is then deleted. In other possible implementations of the embodiments of this application, the user can also actively trigger deletion of the compressed file; correspondingly, the delete command can be a user-triggered delete command. The repository hosting system 30 can respond to the delete command by deleting the compressed file in the main repository and deleting a copy of the compressed file in the replica storage medium.
[0135] S418, Repository hosting system 30 returns a success message to the user.
[0136] The steps S408 to S418 described above are optional steps in the embodiments of this application. The repository copy management method of this application may also omit the steps S408 to S418.
[0137] Based on the above description, this application provides a repository copy management method. This method creates copies only for object data in the repository that occupies a large amount of storage space and remains unchanged, while keeping shared access to reference data that occupies a small amount of storage space and is frequently updated, thereby achieving object-level repository copying. This allows for the resolution of storage bottlenecks and the improvement of concurrency capabilities by expanding the object-level copies of the repository and utilizing technologies such as distributed memory caching.
[0138] The embodiment shown in Figure 4 describes the process of creating a copy. The process of reading a copy is described below with reference to the accompanying drawings.
[0139] Referring to Figure 7, a flowchart of a repository copy management method is shown. This method may include the following steps:
[0140] S702, Repository hosting system 30 receives a read command.
[0141] Read commands are used to read object data. A read command can include the key value of the object data to be read, indicating which object data should be read. Read commands can be user-triggered.
[0142] S704, Repository hosting system 30 checks whether the compressed file requested by the read command exists in the copy storage medium. If so, S706 is executed; if not, S708 is executed.
[0143] The repository hosting system 30 can check whether the compressed file requested by the read command exists in the replica storage medium by comparing the key value of the object data carried in the read command with the key values of the data stored in the replica storage medium. If the key value carried in the read command is matched in the replica storage medium, the requested compressed file exists in the replica storage medium, and S706 can be executed to read a copy of the compressed file from the replica storage medium. If the key value carried in the read command is not matched in the replica storage medium, the requested compressed file does not exist in the replica storage medium, and S708 can be executed to read the compressed file from the main repository.
[0144] S706, Repository hosting system 30 responds to the read command by reading a copy of the compressed file from the copy storage medium.
[0145] In practice, the repository hosting system 30 can read reference data from the main repository according to the read command, and can then read a copy of the compressed file from the copy storage medium according to the reference data.
[0146] S707, Repository hosting system 30 returns a copy of the compressed file to the user.
[0147] S707 is an optional step in the embodiments of this application. The repository copy management method of this application may also omit S707.
[0148] S708, Repository hosting system 30 reads the compressed file from the main repository.
[0149] S709, Repository hosting system 30 returns the compressed file to the user.
[0150] In particular, step S709 of this embodiment is optional; the repository copy management method of this application may not require execution of step S709.
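Steps S702 to S709 can be condensed into a small sketch that emphasises the shared reference data: the reference is always resolved in the main repository, and only the object data is looked up in the replica. The function `handle_read`, the dictionary stores, and the ref name used in the usage note are hypothetical, and mapping a ref directly to a pack key is a simplification that follows Figure 5.

```python
def handle_read(ref_name: str, main_refs: dict, main_packs: dict, replica_packs: dict):
    """Resolve a ref in the main repository, then prefer the replica for object data."""
    pack_key = main_refs[ref_name]           # reference data is never replicated
    if pack_key in replica_packs:            # S704: does the copy exist?
        return "replica", replica_packs[pack_key]   # S706
    return "main", main_packs[pack_key]      # S708: fall back to the main repository
```

For instance, with `main_refs = {"refs/heads/main": "pack-54367f30cf"}` the call reads from the replica when it holds `pack-54367f30cf` and from the main repository otherwise, without any change to the reference data.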
[0151] Based on the above description, this application provides a repository replica management method. This method constructs object-level repository replicas, which do not need to track the complex relationships between repository reference data and object data. Repository data is split into reference data and object data, and the object data can be written to persistent storage or a cache by multi-write. On a read, the reference data is read from the main repository and the object data is read from the replica. The synchronization and consistency algorithm for object-level repository replicas is simple and adds no synchronization latency to downloads, which can improve download performance. Furthermore, the low latency of object-level repository replicas can serve API scenarios with strict latency requirements.
[0152] Based on the aforementioned repository copy management method, this application also provides a repository hosting system 30. The repository hosting system 30 is described below from the perspective of functional modularization.
[0153] Referring to Figure 3, a schematic diagram of a repository hosting system 30 is shown. The repository hosting system 30 is used to manage users' repositories and includes:
[0154] Communication module 302 is used to receive a backup command, the backup command being used to back up the user's main repository, the main repository including at least one object data and reference data of the at least one object data, the reference data being used to point to the at least one object data;
[0155] The copy creation module 304 is used to compress at least one object data in the main repository in response to the backup command, and generate a compressed file;
[0156] The copy creation module 304 is also used to store a copy of the compressed file in the copy storage medium corresponding to the main repository;
[0157] The copy shares the reference data with the compressed file in the main repository.
[0158] For example, the communication module 302 and the copy creation module 304 described above can be implemented in hardware or in software.
[0159] When implemented in software, the communication module 302 and the replica creation module 304 can be applications running on the computing device. For example, the replica creation module 304 can be a computing engine running on the computing device. These applications can also be virtualized and provided to users as virtualization services. Virtualization services can include virtual machine (VM) services, bare metal server (BMS) services, or container services. Specifically, a VM service can be a service that uses virtualization technology to create a pool of virtual machine (VM) resources on multiple physical hosts, providing VMs to users on demand. A BMS service is a service that uses virtualization technology to create a pool of BMS resources on multiple physical hosts, providing BMS services to users on demand. A container service is a service that uses virtualization technology to create a pool of container resources on multiple physical hosts, providing containers to users on demand. A VM is a simulated virtual computer, that is, a logical computer. A BMS is a scalable, high-performance computing service with computing performance indistinguishable from traditional physical machines, and features secure physical isolation. Containers are a kernel virtualization technology that provides lightweight virtualization to isolate user space, processes, and resources. It should be understood that the VM service, BMS service, and container service mentioned above are merely specific examples. In practical applications, virtualization services can also include other lightweight or heavyweight virtualization services, which are not specifically limited here.
[0160] When implemented in hardware, the communication module 302 and the copy creation module 304 may include at least one computing device, such as a server. Alternatively, the communication module 302 and the copy creation module 304 may also be devices implemented using application-specific integrated circuits (ASICs) or programmable logic devices (PLDs). The PLD may be a complex programmable logical device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
[0161] In some possible implementations, the repository hosting system 30 also includes:
[0162] Display module 306 is used to display the configuration interface to the user;
[0163] The communication module 302 is also used to receive the object copy storage directory configured by the user through the configuration interface, wherein the object copy storage directory is used to indicate the path of the copy storage medium;
[0164] The copy creation module 304 is specifically used for:
[0165] According to the object copy storage directory, a copy of the compressed file is written to the copy storage medium.
[0166] Similarly, the aforementioned display module 306 can be implemented in hardware or software. The display module 306 can be a module on the user device side. When the display module 306 is implemented in software, it can be a module in a client or browser. When the display module 306 is implemented in hardware, it can be a display.
[0167] In some possible implementations, the replica storage medium includes persistent media, cloud storage, or distributed cache.
[0168] In some possible implementations, the backup command includes a backup command triggered by a system task or a backup command triggered by a user.
[0169] In some possible implementations, the communication module 302 is further configured to:
[0170] Receive a read command;
[0171] The repository hosting system 30 also includes:
[0172] The reading module 301 is configured to read a copy of the compressed file from the copy storage medium in response to the reading command.
[0173] Similarly, the aforementioned reading module 301 can be implemented in hardware or in software.
[0174] When implemented in software, the reading module 301 can be an application running on a computing device. This application can also be virtualized and provided to users as virtualization services such as BMS, VM, or container. When implemented in hardware, the reading module 301 can include at least one computing device, such as a server. Alternatively, the reading module 301 can also be a device implemented using an ASIC or a PLD.
[0175] In some possible implementations, the reading module 301 is further configured to:
[0176] Read the reference data from the main repository;
[0177] The reading module 301 is specifically used for:
[0178] A copy of the compressed file is read from the copy storage medium according to the reference data.
[0179] In some possible implementations, the reading module 301 is further configured to:
[0180] If the compressed file is not present in the copy storage medium, the compressed file is read from the main repository.
[0181] In some possible implementations, the communication module 302 is further configured to:
[0182] Receive a delete command;
[0183] The repository hosting system 30 also includes:
[0184] The deletion module 303 is configured to, in response to the deletion command, delete the compressed file in the main repository and delete a copy of the compressed file in the copy storage medium.
[0185] Similarly, the deletion module 303 described above can be implemented in hardware or in software.
[0186] When implemented in software, the deletion module 303 can be an application running on a computing device. This application can also be virtualized and provided to users as virtualization services such as BMS, VM, or container. When implemented in hardware, the deletion module 303 can include at least one computing device, such as a server. Alternatively, the deletion module 303 can also be a device implemented using an ASIC or PLD.
[0187] In some possible implementations, the communication module 302 is further configured to:
[0188] Receive the number of replicas configured by the user through the configuration interface, where the number of replicas is n, and n is a positive integer;
[0189] The copy creation module 304 is specifically used for:
[0190] Copies of the compressed file are stored in n copy storage media corresponding to the main repository.
[0191] In some possible implementations, the replica creation module 304 is specifically used for:
[0192] One copy of the compressed file is stored across the n replica storage media corresponding to the main repository, and each of the n replica storage media stores a portion of the compressed file; or
[0193] n copies of the compressed file are stored in the n replica storage media corresponding to the main repository, and each of the n replica storage media stores one copy of the compressed file.
[0194] This application also provides a computing device 800. As shown in Figure 8, the computing device 800 includes: a bus 802, a processor 804, a memory 806, and a communication interface 808. The processor 804, the memory 806, and the communication interface 808 communicate with each other via the bus 802. The computing device 800 may be a server or a terminal device. It should be understood that this application does not limit the number of processors and memories in the computing device 800.
[0195] Bus 802 can be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc. Buses can be categorized as address buses, data buses, control buses, etc. For ease of illustration, only one line is used in Figure 8, but this does not imply that there is only one bus or one type of bus. Bus 802 can include pathways for transmitting information between various components of computing device 800 (e.g., memory 806, processor 804, communication interface 808).
[0196] Processor 804 may include any one or more processors such as a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP), or a digital signal processor (DSP).
[0197] The memory 806 may include volatile memory, such as random access memory (RAM). The memory 806 may also include non-volatile memory, such as read-only memory (ROM), flash memory, hard disk drive (HDD), or solid-state drive (SSD). The memory 806 stores executable program code, which the processor 804 executes to implement the aforementioned warehouse copy management method. Specifically, the memory 806 stores instructions for the warehouse hosting system 30 to execute the warehouse copy management method. For example, the memory 806 may store instructions for implementing the functions of the communication module 302 and the copy creation module 304. Furthermore, the memory may also store instructions for implementing the functions of the display module 306, the reading module 301, the deletion module 303, and the task management module 305.
[0198] The communication interface 808 uses transceiver modules such as, but not limited to, network interface cards and transceivers to enable communication between the computing device 800 and other devices or communication networks.
[0199] This application also provides a computing device cluster. The computing device cluster includes at least one computing device. The computing device can be a server, such as a central server, an edge server, or a local server in a local data center. In some embodiments, the computing device can also be a terminal device such as a desktop computer, a laptop computer, or a smartphone.
[0200] As shown in Figure 9, the computing device cluster includes at least one computing device 800. The memory 806 of one or more computing devices 800 in the computing device cluster may store the same instructions of the warehouse hosting system 30 for performing the warehouse copy management method.
[0201] In some possible implementations, one or more computing devices 800 in the computing device cluster can also be used to execute some of the instructions of the warehouse hosting system 30 for performing the warehouse copy management method. In other words, a combination of one or more computing devices 800 can jointly execute the instructions of the warehouse hosting system 30 for performing the warehouse copy management method.
[0202] It should be noted that the memory 806 in different computing devices 800 in the computing device cluster can store different instructions for executing some functions of the warehouse hosting system 30.
[0203] Figure 10 illustrates one possible implementation. As shown in Figure 10, two computing devices 800A and 800B are connected via a communication interface 808. The memory in computing device 800A stores instructions for executing the functions of the communication module 302 and the copy creation module 304. The memory in computing device 800B stores instructions for executing the functions of the display module 306. Furthermore, the memory in computing device 800A may also store instructions for executing the functions of the read module 301, the delete module 303, and the task management module 305. In other words, the memory 806 of computing devices 800A and 800B jointly stores the instructions used by the warehouse hosting system 30 to execute the warehouse copy management method.
[0204] The connection arrangement shown in Figure 10 takes into account that the warehouse copy management method provided in this application supports visual configuration of the object copy storage directory. The functions implemented by the display module 306 can therefore be delegated to an independent computing device, such as computing device 800B. Computing device 800B can be a terminal or other device with display capabilities.
[0205] It should be understood that the functions of computing device 800A shown in Figure 10 can also be performed by multiple computing devices 800. Similarly, the functions of computing device 800B can also be performed by multiple computing devices 800.
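The division of modules between computing devices in Figure 10 can be written down as a simple placement table. The following is only an illustrative sketch: the string labels, the dictionary layout, and the helper `missing_modules` are hypothetical, and the reading, deletion, and task management modules are placed on computing device 800A because the description states that its memory may optionally store them.

```python
from typing import Dict, Set

# Labels combine a module name with its reference numeral in this application.
ALL_MODULES: Set[str] = {
    "reading_301",
    "communication_302",
    "deletion_303",
    "copy_creation_304",
    "task_management_305",
    "display_306",
}

# Placement corresponding to Figure 10 (optional modules assumed to be on 800A).
FIGURE_10_PLACEMENT: Dict[str, Set[str]] = {
    "800A": {
        "communication_302",
        "copy_creation_304",
        "reading_301",
        "deletion_303",
        "task_management_305",
    },
    "800B": {"display_306"},
}


def missing_modules(placement: Dict[str, Set[str]]) -> Set[str]:
    """Return the modules that no computing device in the placement stores."""
    stored = set().union(*placement.values())
    return ALL_MODULES - stored
```

A check of this kind only confirms that the devices jointly store instructions for every module; it says nothing about how the devices are connected.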
[0206] In some possible implementations, one or more computing devices in a computing device cluster can be connected via a network. This network can be a wide area network (WAN) or a local area network (LAN), etc. Figure 11 illustrates one possible implementation. As shown in Figure 11, two computing devices 800C and 800D are connected via a network. Specifically, they are connected to the network through the communication interface in each computing device. In this type of possible implementation, the memory 806 in computing device 800C stores instructions for executing the functions of the communication module 302 and the copy creation module 304. Simultaneously, the memory 806 in computing device 800D stores instructions for executing the functions of the display module 306. Optionally, the memory in computing device 800C may also store instructions for executing the functions of the read module 301, the delete module 303, and the task management module 305.
[0207] The connection arrangement shown in Figure 11 takes into account that the warehouse copy management method provided in this application supports visual configuration of the object copy storage directory. The functions implemented by the display module 306 can therefore be executed by an independent computing device, such as computing device 800D.
[0208] It should be understood that the functions of the computing device 800C shown in Figure 11 can also be performed by multiple computing devices 800. Similarly, the functions of the computing device 800D can also be performed by multiple computing devices 800.
[0209] This application also provides a computer-readable storage medium. The computer-readable storage medium can be any available medium on which a computing device can store data, or a data storage device, such as a data center, containing one or more available media. The available medium can be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid-state drive). The computer-readable storage medium includes instructions that instruct the computing device to execute the aforementioned warehouse copy management method applied to the warehouse hosting system 30.
[0210] This application also provides a computer program product containing instructions. The computer program product may be a software or program product containing instructions, capable of running on a computing device or stored on any usable medium. When the computer program product is run on at least one computing device, it causes the at least one computing device to perform the aforementioned warehouse copy management method.
[0211] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the protection scope of the technical solutions of the embodiments of the present invention.
Claims
1. A warehouse copy management method, characterized in that, The method is applied to a warehouse hosting system for managing users' warehouses, and the method includes: Receive a backup command, the backup command being used to back up the user's main repository, the main repository including at least one object data and reference data of the at least one object data, the reference data being used to point to the at least one object data; In response to the backup command, the at least one object data in the main repository is compressed to generate a compressed file; A copy of the compressed file is stored in a copy storage medium corresponding to the main repository; The copy shares the reference data with the compressed file in the main repository.
2. The method of claim 1, wherein, The method further includes: Display the configuration interface to the user; Receive the object replica storage directory configured by the user through the configuration interface, wherein the object replica storage directory is used to indicate the path of the replica storage medium; Storing a copy of the compressed file in the copy storage medium corresponding to the main repository includes: According to the object copy storage directory, a copy of the compressed file is written to the copy storage medium.
3. The method according to claim 1 or 2, characterized in that, The replica storage medium includes persistent media, cloud storage, or distributed cache.
4. The method according to any one of claims 1 to 3, characterized in that, The backup commands include backup commands triggered by system tasks or backup commands triggered by users.
5. The method according to any one of claims 1 to 4, characterized in that, The method further includes: Receive a read command; In response to the read command, a copy of the compressed file is read from the copy storage medium.
6. The method of claim 5, wherein, The method further includes: Read the reference data from the main repository; The step of reading a copy of the compressed file from the copy storage medium includes: A copy of the compressed file is read from the copy storage medium according to the reference data.
7. The method of claim 5, wherein, The method further includes: If the compressed file is not present in the copy storage medium, the compressed file is read from the main repository.
8. The method according to any one of claims 1 to 7, characterized in that, The method further includes: Receive a delete command; In response to the delete command, the compressed file is deleted in the main repository, and the copy of the compressed file is deleted in the copy storage medium.
9. The method according to any one of claims 1 to 8, characterized in that, The method further includes: Receive the number of replicas configured by the user through the configuration interface, where the number of replicas is n, and n is a positive integer; The step of storing a copy of the compressed file in the copy storage medium corresponding to the main repository includes: The compressed file is stored in n copy storage media corresponding to the main repository.
10. The method of claim 9, wherein, The step of storing the compressed file in the n copy storage media corresponding to the main repository includes: One copy of the compressed file is stored across the n replica storage media corresponding to the main repository, with each of the n replica storage media storing a portion of the compressed file; or, n copies of the compressed file are stored in the n replica storage media corresponding to the main repository, with each of the n replica storage media storing one copy of the compressed file.
11. A warehouse hosting system, characterized in that, The warehouse hosting system is used to manage users' warehouses, and the warehouse hosting system includes: A communication module, used to receive a backup command, the backup command being used to back up the user's main repository, the main repository including at least one object data and reference data of the at least one object data, and the reference data being used to point to the at least one object data; A copy creation module, used to compress the at least one object data in the main repository in response to the backup command, and generate a compressed file; The copy creation module is also used to store a copy of the compressed file in a copy storage medium corresponding to the main repository; The copy shares the reference data with the compressed file in the main repository.
12. The system of claim 11, wherein, The system also includes: The display module is used to display the configuration interface to the user; The communication module is further configured to receive the object replica storage directory configured by the user through the configuration interface, wherein the object replica storage directory is used to indicate the path of the replica storage medium; The copy creation module is specifically used for: According to the object copy storage directory, a copy of the compressed file is written to the copy storage medium.
13. The system of claim 11 or 12, wherein, The replica storage medium includes persistent media, cloud storage, or distributed cache.
14. The system of any one of claims 11 to 13, wherein, The backup commands include backup commands triggered by system tasks or backup commands triggered by users.
15. The system of any one of claims 11 to 14, wherein, The communication module is also used for: Receive a read command; The system also includes: A read module, configured to read a copy of the compressed file from the copy storage medium in response to the read command.
16. The system of claim 15, wherein, The reading module is also used for: Read the reference data from the main repository; The reading module is specifically used for: A copy of the compressed file is read from the copy storage medium according to the reference data.
17. The system of claim 15, wherein, The reading module is also used for: If the compressed file is not present in the copy storage medium, the compressed file is read from the main repository.
18. The system of any one of claims 11 to 17, wherein, The communication module is also used for: Receive a delete command; The system also includes: A deletion module, configured to, in response to the delete command, delete the compressed file in the main repository and delete the copy of the compressed file in the copy storage medium.
19. The system of any one of claims 11 to 18, wherein, The communication module is also used for: Receive the number of replicas configured by the user through the configuration interface, where the number of replicas is n, and n is a positive integer; The copy creation module is specifically used for: The compressed file is stored in n copy storage media corresponding to the main repository.
20. The system of claim 19, wherein, The copy creation module is specifically used for: One copy of the compressed file is stored across the n replica storage media corresponding to the main repository, with each of the n replica storage media storing a portion of the compressed file; or, n copies of the compressed file are stored in the n replica storage media corresponding to the main repository, with each of the n replica storage media storing one copy of the compressed file.
21. A computing device cluster, characterized in that, The computing device cluster includes at least one computing device, the at least one computing device including at least one processor and at least one memory, the at least one memory storing computer-readable instructions; the at least one processor executes the computer-readable instructions to cause the computing device cluster to perform the warehouse copy management method as described in any one of claims 1 to 10.
22. A computer-readable storage medium, characterized in that, Includes computer-readable instructions; the computer-readable instructions are used to implement the warehouse copy management method according to any one of claims 1 to 10.
23. A computer program product, characterized in that, Includes computer-readable instructions; the computer-readable instructions are used to implement the warehouse copy management method according to any one of claims 1 to 10.