Hybrid encoding-based binary large object storage repair method and apparatus

By employing a hybrid coding strategy and grouped parallel repair technology, this method addresses the performance bottlenecks in repairing small blobs and the low efficiency in repairing large blobs in existing erasure coding methods. It achieves efficient binary large object storage repair and optimizes repair bandwidth and read performance.

WO2026129695A1 · PCT designated stage · Publication Date: 2026-06-25 · HUAZHONG UNIV OF SCI & TECH

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
HUAZHONG UNIV OF SCI & TECH
Filing Date
2025-08-19
Publication Date
2026-06-25

AI Technical Summary

Technical Problem

Existing erasure coding methods suffer from performance bottlenecks in repairing small blobs, insufficient repair bandwidth, and low efficiency in parallel repair when handling large binary objects of different sizes. In particular, they have failed to effectively solve the problems of uneven repair bandwidth and parallelism in high-density distributed storage systems.

Method used

A hybrid coding strategy is adopted, which performs non-systematic minimum storage regeneration (MSR) coding on small blobs and systematic MSR coding on large blobs. Furthermore, by using grouped parallel repair technology and dynamic load balancing strategy, the repair bandwidth and read efficiency are optimized, and the load among storage nodes is balanced.

Benefits of technology

It improves the repair bandwidth for small blobs and the reading efficiency for large blobs, enhances repair efficiency through parallel repair technology, avoids disk I/O overload, and balances the load among storage nodes.


Abstract

The present application relates to the technical field of computer storage, and specifically discloses a hybrid encoding-based binary large object (BLOB) storage repair method and apparatus. The method comprises: acquiring a plurality of BLOBs, performing non-systematic minimum storage regeneration (MSR) encoding on a small BLOB among the plurality of BLOBs, and performing systematic MSR encoding on a large BLOB among the plurality of BLOBs; storing the plurality of encoded BLOBs in a plurality of storage nodes in a balanced manner; and on the basis of the type of a BLOB to be repaired, determining a corresponding repair strategy, and on the basis of the repair strategy and the plurality of encoded BLOBs in the plurality of storage nodes, using a group-based parallel repair technique to repair the BLOB to be repaired. The method can improve BLOB repair efficiency.

Description

Binary Large Object Storage Repair Method and Apparatus Based on Hybrid Encoding

[Technical Field]

[0001] This application belongs to the field of computer storage technology, and more specifically, relates to a binary large object storage repair method and apparatus based on hybrid encoding.

[Background Technology]

[0002] With the rapid development of cloud computing, big data, and distributed storage technologies, distributed storage systems have been widely used to store massive amounts of binary large objects (Blobs). These Blobs require erasure coding techniques during storage to improve the system's fault tolerance and data reliability. However, existing erasure coding methods still face some performance bottlenecks during repair and retrieval, especially when processing Blob data of different sizes.

[0003] 1) Performance bottleneck in small blob repair: In existing erasure coding repair, the traditional systematic minimum storage regeneration (MSR) coding is inefficient for repairing small blobs, especially since the repair process requires frequent non-continuous I / O access, resulting in a significant decrease in repair bandwidth and read performance.

[0004] 2) Bottleneck in repairing large blobs: For large blobs, although systematic MSR encoding can ensure efficient regular reading performance, the repair efficiency is low due to its high repair bandwidth and node I / O operation overhead.

[0005] 3) Parallel repair challenges: Existing repair methods fail to fully utilize the parallel processing capabilities of storage systems, resulting in uneven disk I / O load during the repair process, and the parallelism and efficiency of the repair process cannot be fully improved.

[0006] Therefore, existing technologies fail to effectively solve the problems of uneven repair bandwidth and limited repair parallelism in high-density distributed storage systems, especially in scenarios where storage is dominated by small blobs.

[Summary of the Invention]

[0007] In view of the shortcomings of the prior art, the purpose of this application is to provide a binary large object storage repair method and apparatus based on hybrid encoding, which aims to solve the problems of insufficient bandwidth and low repair efficiency of existing repair schemes for small blobs.

[0008] To achieve the above objectives, in a first aspect, this application provides a binary large object storage repair method based on hybrid encoding, comprising:

[0009] Obtaining multiple binary large objects (Blobs), performing non-systematic minimum storage regeneration (MSR) encoding on the small Blobs among the multiple Blobs, and performing systematic MSR encoding on the large Blobs among the multiple Blobs;

[0010] The encoded blobs are evenly distributed across multiple storage nodes;

[0011] Based on the type of the Blob to be repaired, a corresponding repair strategy is determined, and based on the repair strategy and the encoded Blobs in the multiple storage nodes, a grouped parallel repair technique is used to repair the Blob to be repaired.

[0012] This application employs a hybrid encoding strategy based on blob size. For small blobs, non-systematic MSR encoding is used to optimize repair bandwidth and reduce I / O amplification effects, while for large blobs, systematic MSR encoding is used to ensure read efficiency. Parallel repair technology is also used to improve repair efficiency.

[0013] According to the binary large object storage repair method based on hybrid encoding provided in this application, the non-systematic minimum storage regeneration (MSR) encoding of small blobs among multiple blobs includes:

[0014] For small blobs with inter-blob locality, a merge-split-encode scheme is used for MSR encoding;

[0015] For small blobs with intra-blob locality, a split-merge-encode scheme is used for MSR encoding.

[0016] This application splits and re-merges small blob data blocks based on internal locality to form optimized encoded stripes. In this way, reading any blob requires decoding only as many bytes from the parity blocks as the blob itself contains, so there is no read amplification.

[0017] According to the binary large object storage repair method based on hybrid encoding provided in this application, the systematic MSR encoding of large blobs among multiple blobs includes:

[0018] The large blob is divided into multiple blocks of fixed size;

[0019] The multiple fixed-size blocks are systematically MSR encoded.

[0020] This application divides large blobs into blocks of fixed size and performs systematic MSR encoding, so that the original data blocks and redundant data blocks in each encoded stripe are evenly distributed across multiple nodes.

[0021] According to the binary large object storage repair method based on hybrid encoding provided in this application, the step of determining the corresponding repair strategy based on the type of the Blob to be repaired includes:

[0022] If the Blob to be repaired is a small Blob, the repair strategy is determined to be a rotational selection strategy, which selects data from the least used sub-blocks for repair.

[0023] If the blob to be repaired is a large blob, the repair strategy is to select the least cost path for repair based on the repair rules of the systematic MSR encoding.

[0024] This application utilizes a dynamic load balancing strategy during the repair process, employing rotational selection and grouped parallel repair methods to balance the load among storage nodes and avoid disk I / O overload.

[0025] According to the binary large object storage repair method based on hybrid encoding provided in this application, after the encoded multiple blobs are evenly stored in multiple storage nodes, the method further includes:

[0026] When reading small blobs, the reading path is optimized based on locality analysis;

[0027] When reading large blobs, data blocks are accessed directly through systematic MSR encoding.

[0028] When reading small blobs, this application optimizes the reading path based on locality analysis, directly accesses relevant data blocks, and reduces the reading of redundant data; when reading large blobs, it directly accesses data blocks through systematic MSR encoding to ensure efficient reading performance.

[0029] According to the binary large object storage repair method based on hybrid encoding provided in this application, the sub-block fragmentation technology is used for parallel decoding when reading the Blob.

[0030] This application employs sub-block fragmentation technology for parallel decoding during reading to balance node load and improve reading performance.

[0031] In a second aspect, this application provides a binary large object storage repair device based on hybrid encoding, comprising:

[0032] The encoding module is used to obtain multiple binary large object blobs, and to perform non-systematic minimum storage regeneration (MSR) encoding on the smaller blobs among the multiple blobs, and systematic MSR encoding on the larger blobs among the multiple blobs.

[0033] The storage module is used to evenly store the encoded blobs across multiple storage nodes;

[0034] The repair module is used to determine the corresponding repair strategy based on the type of the Blob to be repaired, and to repair the Blob to be repaired using a grouped parallel repair technique based on the repair strategy and the encoded Blobs in the multiple storage nodes.

[0035] In a third aspect, this application provides an electronic device, comprising: at least one memory for storing a program; and at least one processor for executing the program stored in the memory, wherein when the program stored in the memory is executed, the processor is configured to execute the binary large object storage repair method based on hybrid encoding described in the first aspect or any possible implementation thereof.

[0036] In a fourth aspect, this application provides a computer-readable storage medium storing a computer program that, when run on a processor, causes the processor to execute the binary large object storage repair method based on hybrid encoding as described in the first aspect or any possible implementation of the first aspect.

[0037] In a fifth aspect, this application provides a computer program product that, when run on a processor, causes the processor to execute the binary large object storage repair method based on hybrid encoding described in the first aspect or any possible implementation of the first aspect.

[0038] It is understood that the beneficial effects of the second to fifth aspects mentioned above can be found in the relevant descriptions of the first aspect, and will not be repeated here.

[0039] Overall, the technical solutions conceived in this application have the following beneficial effects compared with the prior art:

[0040] (1) Based on the size of the Blob, a hybrid encoding strategy is adopted. On small blobs, non-systematic MSR encoding is used to optimize the repair bandwidth and reduce the I / O amplification effect, while on large blobs, systematic MSR encoding is used to ensure reading efficiency. Parallel repair technology is used to improve the repair efficiency.

[0041] (2) During the repair process, a dynamic load balancing strategy is used to balance the load between storage nodes by using rotational selection and grouped parallel repair methods, thereby avoiding disk I / O overload.

[Brief Description of the Drawings]

[0042] To more clearly illustrate the technical solutions in this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0043] Figure 1 is a flowchart illustrating the binary large object storage repair method based on hybrid encoding provided in an embodiment of this application;

[0044] Figure 2 is a schematic diagram of the merging-splitting-encoding scheme provided in an embodiment of this application;

[0045] Figure 3 is a schematic diagram of the split-merge-encoding scheme provided in the embodiments of this application;

[0046] Figure 4 is a schematic diagram of the grouped parallel repair provided in an embodiment of this application;

[0047] Figure 5 is a schematic diagram of the structure of the binary large object storage repair device based on hybrid encoding provided in an embodiment of this application;

[0048] Figure 6 is a schematic diagram of the structure of the electronic device provided in an embodiment of this application.

Detailed Implementation Methods

[0049] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.

[0050] In this article, the term "and / or" describes the relationship between related objects, indicating that three relationships can exist. For example, A and / or B can represent: A existing alone, A and B existing simultaneously, or B existing alone. The symbol " / " in this article indicates that the related objects are in an "or" relationship; for example, A / B means A or B.

[0051] In the embodiments of this application, the terms "exemplary" or "for example" are used to indicate that something is an example, illustration, or description. Any embodiment or design that is described as "exemplary" or "for example" in the embodiments of this application should not be construed as being more preferred or advantageous than other embodiments or design. Specifically, the use of the terms "exemplary" or "for example" is intended to present the relevant concepts in a specific manner.

[0052] In the description of the embodiments of this application, unless otherwise stated, "multiple" means two or more, for example, multiple processing units means two or more processing units, multiple elements means two or more elements, etc.

[0053] Next, the binary large object storage repair method based on hybrid encoding provided in the embodiments of this application will be introduced with reference to Figures 1-5.

[0054] Figure 1 is a flowchart illustrating the binary large object storage repair method based on hybrid encoding provided in an embodiment of this application. As shown in Figure 1, the method includes the following steps:

[0055] Step 100: Obtain multiple binary large object blobs, and perform non-systematic minimum storage regeneration MSR encoding on the small blobs among the multiple blobs, and perform systematic MSR encoding on the large blobs among the multiple blobs.

[0056] An original data block is a data unit segmented directly from the file, while a parity block is a redundant data block generated by an encoding algorithm. The parity blocks allow the system to reconstruct lost data even if some data blocks are lost.

[0057] In a high-density server based on erasure coding RS(n,k), a file is divided into k original data blocks. These original data blocks are encoded into n blocks in total using an encoding matrix, where n = m + k and m is the number of parity blocks. The set of n blocks is called a stripe. The n blocks are distributed across n disks, tolerating any n-k disk failures. When any one of the n blocks fails, data must be read from the surviving disks to repair it. Specifically, k blocks need to be read; these can be either original data blocks or parity blocks. The erasure coding decoding algorithm then uses these k blocks to repair the failed block.
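The parameter relationships in this paragraph can be checked with a short calculation. The following is a minimal sketch only; the function name, the (14,10) parameters, and the 64 KiB block size are illustrative assumptions and do not come from the application.

```python
def rs_stripe_properties(n: int, k: int, block_size: int) -> dict:
    """Summarize an RS(n, k) stripe using the relationships stated in the text."""
    if not 0 < k < n:
        raise ValueError("require 0 < k < n")
    m = n - k  # number of parity blocks, since n = m + k
    return {
        "parity_blocks": m,
        "tolerated_failures": m,               # any n - k disk failures
        "repair_read_bytes": k * block_size,   # k surviving blocks are read
        "storage_overhead": n / k,             # stored bytes per original byte
    }


# Hypothetical example parameters.
props = rs_stripe_properties(n=14, k=10, block_size=64 * 1024)
print(props)
```

The point of the sketch is that conventional RS repair of a single block reads k full blocks, which is the cost that MSR-style codes aim to reduce.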

[0058] This application first receives Blob data of different sizes and determines whether it is a small Blob or a large Blob based on its size. Then, for small Blobs, non-systematic MSR encoding is used to optimize repair bandwidth based on the principle of locality of access. For large Blobs, systematic MSR encoding is used to maintain efficient regular reading performance.

[0059] MSR is a type of erasure coding scheme, primarily used in data storage systems to improve data reliability and fault tolerance. Erasure coding divides data into multiple segments and adds redundant segments, allowing the original data to be recovered from the remaining segments even if some data segments are lost or corrupted.

[0060] Optionally, a Blob can be classified as a small Blob or a large Blob based on a preset threshold and the size of the Blob. If the size of the Blob is greater than the preset threshold, it is considered a large Blob; if the size of the Blob is less than or equal to the preset threshold, it is considered a small Blob.
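The threshold rule above amounts to a single comparison. The sketch below assumes a hypothetical threshold value and hypothetical names (`BlobKind`, `classify_blob`); the application does not fix a concrete threshold at this point.

```python
from enum import Enum


class BlobKind(Enum):
    SMALL = "non-systematic MSR"  # encoding applied to small Blobs
    LARGE = "systematic MSR"      # encoding applied to large Blobs


def classify_blob(size_bytes: int, threshold_bytes: int) -> BlobKind:
    """Large if strictly greater than the threshold, otherwise small."""
    return BlobKind.LARGE if size_bytes > threshold_bytes else BlobKind.SMALL


THRESHOLD = 4 * 1024 * 1024  # hypothetical preset threshold (assumption)
for size in (512 * 1024, THRESHOLD, THRESHOLD + 1):
    print(size, classify_blob(size, THRESHOLD).name)
```

Note that a Blob whose size equals the threshold is treated as small, matching the "less than or equal to" wording of the paragraph.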

[0061] Optionally, the preset threshold is determined by analyzing the single-block repair time under Clay encoding, which includes network transmission time, disk read and seek time, and computation time. This analysis shows that disk seek time dominates when the Blob size is small, and the Blob size at which seek time begins to dominate is taken as the preset threshold.

[0062] Specifically, the single-block repair time under Clay encoding can be modeled as $T = T_N + T_D + T_S + T_C$, where $T_N$ is the data transmission time on the network, $T_D$ is the disk read time, $T_S$ is the disk seek time, and $T_C$ is the computation time.

[0063] For the data transmission time, in $(n, k)$ Clay encoding each of the $n-1$ helper nodes transmits $1/(n-k)$ of a block to the requester. Taking $B$ as the Blob size, so that each block holds $B/k$, the repair bandwidth is $\frac{(n-1)B}{(n-k)k}$. With $w$ denoting the network bandwidth, the data transmission time is therefore $T_N = \frac{(n-1)B}{(n-k)kw}$. For the disk read and seek time, assuming the I/O throughput of sequentially reading data from the disk is $\theta$ and the average disk seek time is $s$, we obtain $T_D = \frac{(n-1)B}{(n-k)k\theta}$ and $T_S = \lambda s$, where $\lambda$ is the number of I/O operations. Finally, the single-block repair time is

$$T = \frac{(n-1)B}{(n-k)kw} + \frac{(n-1)B}{(n-k)k\theta} + \lambda s + T_C.$$

[0064] Blob size refers to the storage space occupied by the blob. $T$ is said to be dominated by $T_S$ when the disk seek time $T_S$ is the largest component of the total repair time $T$. The blob size can therefore be decreased step by step from large to small until $T$ for the corresponding blob size starts to be dominated by $T_S$, and that size is taken as the threshold.
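The threshold search described in the preceding paragraphs can be sketched as follows. This is an illustration under stated assumptions: the hardware numbers (network bandwidth, disk throughput, seek time, number of I/O operations), the candidate sizes, and the function names are hypothetical, the computation time is set to zero, and the number of I/O operations is treated as a constant.

```python
def repair_time_components(B, n, k, w, theta, s, lam, t_c=0.0):
    """Components of T = T_N + T_D + T_S + T_C for a Blob of size B bytes."""
    traffic = (n - 1) * B / ((n - k) * k)  # repair bandwidth in bytes
    return {
        "T_N": traffic / w,      # network transmission time
        "T_D": traffic / theta,  # sequential disk read time
        "T_S": lam * s,          # disk seek time
        "T_C": t_c,              # computation time (assumed given)
    }


def seek_dominated(components):
    """True when T_S is the largest component of the repair time."""
    return components["T_S"] >= max(
        components["T_N"], components["T_D"], components["T_C"]
    )


def find_threshold(sizes_desc, **params):
    """Scan sizes from large to small; return the first seek-dominated size."""
    for B in sizes_desc:
        if seek_dominated(repair_time_components(B, **params)):
            return B
    return None


# Hypothetical parameters: 10 Gb/s network, 200 MB/s disk, 5 ms seek, 8 I/Os.
params = dict(n=14, k=10, w=1.25e9, theta=200e6, s=0.005, lam=8)
sizes = [mib * 2**20 for mib in (64, 32, 16, 8, 4)]
threshold = find_threshold(sizes, **params)
print(threshold)
```

With other hardware parameters the returned size changes, which is why the threshold is described as preset per deployment rather than fixed.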

[0065] Optionally, the systematic MSR encoding may be, for example, Clay encoding.

[0066] Step 110: Distribute the encoded blobs evenly across multiple storage nodes;

[0067] After encoding the Blob, the encoded data is distributed across multiple storage nodes in stripe form to ensure balanced data distribution and improve storage reliability and repair efficiency.
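One simple way to obtain a balanced stripe layout is to rotate the starting node for each stripe. The application does not specify the placement rule, so the rotation below, the function name `place_stripe`, and the node counts are assumptions made purely for illustration.

```python
from collections import Counter


def place_stripe(stripe_id: int, n: int, num_nodes: int) -> list[int]:
    """Return the node index for each of the n blocks of one stripe."""
    if n > num_nodes:
        raise ValueError("a stripe needs n distinct nodes")
    start = stripe_id % num_nodes  # rotate the first node per stripe
    return [(start + i) % num_nodes for i in range(n)]


# Hypothetical cluster: 8 nodes, stripes of n = 6 blocks.
load = Counter()
for stripe_id in range(8):
    load.update(place_stripe(stripe_id, n=6, num_nodes=8))
print(dict(load))
```

Each block of a stripe lands on a different node, and over a full rotation every node stores the same number of blocks.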

[0068] Step 120: Based on the type of the Blob to be repaired, determine the corresponding repair strategy, and based on the repair strategy and the encoded Blobs in the multiple storage nodes, use the grouped parallel repair technique to repair the Blob to be repaired.

[0069] When data corruption occurs in the system, a repair operation is triggered. Different repair strategies are adopted for different types of blobs, and the grouped parallel repair technology is used to execute repair tasks in parallel across multiple storage nodes to improve repair efficiency.

[0070] The parallel repair technique uses grouping, which decomposes the block to be repaired into multiple sub-blocks. By leveraging the parallel capabilities of storage nodes, multiple sub-blocks are repaired simultaneously across multiple nodes, and the repair efficiency is improved through a dynamic task allocation algorithm.

[0071] This application provides a binary large object storage repair method based on hybrid encoding. It adopts a hybrid encoding strategy based on the size of the blob. For small blobs, non-systematic MSR encoding is used to optimize repair bandwidth and reduce I / O amplification effect, while for large blobs, systematic MSR encoding is used to ensure read efficiency. Furthermore, parallel repair technology is used to improve repair efficiency.

[0072] In some embodiments, step 100, which involves performing non-systematic minimum storage regeneration (MSR) encoding on the small blobs among the multiple blobs, specifically includes:

[0073] For small blobs with inter-blob locality, a merge-split-encode scheme is used for MSR encoding;

[0074] For small blobs with intra-blob locality, a split-merge-encode scheme is used for MSR encoding.

[0075] For small blobs, non-systematic MSR encoding is used, which splits and re-merges blob data blocks based on internal locality to form optimized encoding stripes.

[0076] Specifically, different non-systematic MSR encoding methods are used to address intra-block locality and inter-block locality. Intra-block locality means that a small blob itself is typically accessed as a whole, like a photograph. In other words, if any byte of the blob is read, the rest should be prefetched. Inter-block locality means that multiple blobs are typically accessed together. Specifically, these blobs often share the same attributes in their metadata, such as user ID or application ID.

[0077] Figure 2 is a schematic diagram of the merge-split-encode scheme provided in an embodiment of this application. As shown in Figure 2, for data groups with strong inter-blob locality, multiple blobs are merged and then encoded to reduce the performance overhead caused by dispersed access. The merge-split-encode scheme first merges multiple small blobs with inter-blob locality into a fixed-size group (for example, 4 MB by default), then splits the group into k data blocks, and finally encodes the k data blocks into n parity blocks using (n,k) non-systematic MSR encoding. To read any blob in the group, NCBlob must read k parity blocks from disk into dynamic random access memory (DRAM) and reconstruct all blobs in the group. Because of inter-blob locality, however, the remaining blobs in the group are expected to be read shortly afterwards, and they can then be served directly from DRAM instead of from disk, which reduces read amplification.
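The merge and split steps of this scheme can be sketched as a pure data-layout operation. The MSR encoding step itself is omitted here; zero padding, the function name `merge_split`, and the tiny group size are assumptions chosen for illustration.

```python
def merge_split(blobs: list[bytes], k: int, group_size: int) -> list[bytes]:
    """Merge blobs into one fixed-size group, then split it into k data blocks."""
    merged = b"".join(blobs)
    if len(merged) > group_size:
        raise ValueError("blobs do not fit into one group")
    if group_size % k:
        raise ValueError("group_size must be divisible by k")
    merged = merged.ljust(group_size, b"\0")  # zero-pad to the group size (assumption)
    block = group_size // k
    return [merged[i * block:(i + 1) * block] for i in range(k)]


# Toy example: a 12-byte group instead of the 4 MB default.
blobs = [b"aaa", b"bbbbb", b"cc"]
data_blocks = merge_split(blobs, k=4, group_size=12)
print(data_blocks)
```

The k data blocks produced here would then be passed to the (n,k) non-systematic MSR encoder. Because a single blob may straddle several data blocks, reading it means reconstructing the whole group, which is acceptable only when the other blobs in the group are also likely to be read.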

[0078] Figure 3 is a schematic diagram of the split-merge-encode scheme provided in an embodiment of this application. As shown in Figure 3, the split-merge-encode scheme first splits each small blob into k sub-blobs of equal size, then merges the sub-blobs of multiple blobs into k data blocks, and finally encodes the k data blocks into n parity blocks using (n,k) non-systematic MSR encoding. In this way, reading any blob requires decoding only as many bytes from the parity blocks as the blob itself contains, so there is no read amplification.

[0079] Specifically, the three blobs (Blob1, Blob2, and Blob3) are each first divided into k = 4 sub-blobs, and these sub-blobs are then merged into data blocks. For example, data block 1 is formed by merging the first sub-blobs of all three blobs. Finally, the four data blocks are encoded into six parity blocks using a (6,4) non-systematic MSR code. Reading Blob1 only requires reading the first byte segment from each of the parity blocks used for decoding, where the segment size equals the size of one sub-blob of Blob1. Thus, reading Blob1 transfers only as many bytes as Blob1 contains and causes no read amplification.
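The layout produced by the split and merge steps can be sketched as follows. The MSR encoding is again omitted: the sketch reads segments from the unencoded data blocks as a stand-in for decoding the same byte range from parity blocks, which relies on the byte-level, same-offset encoding mentioned elsewhere in the description. The function names, zero padding, and sample blobs are assumptions.

```python
def split_merge(blobs: list[bytes], k: int):
    """Split each blob into k sub-blobs and append sub-blob j to data block j."""
    data_blocks = [bytearray() for _ in range(k)]
    segments = []  # (offset, length) of each blob's segment in every data block
    for blob in blobs:
        sub = -(-len(blob) // k)              # ceil(len / k)
        padded = blob.ljust(sub * k, b"\0")   # zero-pad to a multiple of k (assumption)
        segments.append((len(data_blocks[0]), sub))
        for j in range(k):
            data_blocks[j] += padded[j * sub:(j + 1) * sub]
    return [bytes(b) for b in data_blocks], segments


def read_blob(data_blocks, segment, length):
    """Gather one blob from the same byte range of all k blocks."""
    off, sub = segment
    return b"".join(b[off:off + sub] for b in data_blocks)[:length]


blobs = [b"ABCDEFGH", b"ijkl", b"0123456789ab"]
data_blocks, segments = split_merge(blobs, k=4)
restored = [read_blob(data_blocks, seg, len(b)) for seg, b in zip(segments, blobs)]
print(data_blocks[0], segments)
```

For the first blob, the bytes touched are k segments of 2 bytes each, which equals its 8-byte size; that is the sense in which the scheme avoids read amplification.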

[0080] Preferably, the merge-split-encode scheme is first used to encode the small blobs that have inter-blob locality, and the split-merge-encode scheme is then used for the remaining small blobs, which usually have intra-blob locality.

[0081] In some embodiments, step 100, which involves systematically encoding the large Blob among multiple Blobs, specifically includes:

[0082] The large blob is divided into multiple blocks of fixed size;

[0083] The multiple fixed-size blocks are systematically MSR encoded.

[0084] Large blobs are divided into blocks of fixed size and encoded using systematic MSR encoding, so that the original data blocks and redundant data blocks in each encoded stripe are evenly distributed across multiple nodes.

[0085] In some embodiments, step 120, determining the corresponding repair strategy based on the type of the Blob to be repaired, specifically includes:

[0086] If the Blob to be repaired is a small Blob, the repair strategy is determined to be a rotational selection strategy, which selects data from the least used sub-blocks for repair.

[0087] If the blob to be repaired is a large blob, the repair strategy is to select the least cost path for repair based on the repair rules of the systematic MSR encoding.

[0088] For both large and small blobs, a generalizable rotation-based sub-block selection repair scheme can be chosen. If the blob to be repaired is a small blob, the repair strategy is determined to be a rotation-based selection strategy, which selects data from the least used sub-blocks for repair. If the blob to be repaired is a large blob, the repair strategy is determined to be a repair strategy based on the repair rules of the systematic MSR encoding, which selects the minimum cost path for repair.

[0089] The storage nodes take turns selecting different sub-blocks so that repairs remain sustainable over time. A two-stage check ensures that, after the repair, the data blocks still satisfy the maximum distance separable (MDS) property and the repair MDS property.
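The rotational selection idea, choosing the least used sub-blocks so that repair reads spread out over time, can be sketched with a usage counter. This illustrates only the load-spreading aspect; the two-stage MDS check is not modeled, and the function name, tie-breaking rule, and sub-block identifiers are assumptions.

```python
def select_least_used(usage: dict, count: int) -> list:
    """Pick the `count` least-used sub-blocks and record that they were used."""
    # Ties are broken by sub-block identifier so the choice is deterministic.
    chosen = sorted(usage, key=lambda sb: (usage[sb], sb))[:count]
    for sb in chosen:
        usage[sb] += 1
    return chosen


# Hypothetical sub-blocks identified as (node, sub-block index).
usage = {(node, sub): 0 for node in ("N1", "N2") for sub in (0, 1)}
first = select_least_used(usage, 2)
second = select_least_used(usage, 2)
print(first, second, usage)
```

Two consecutive repairs pick disjoint sub-blocks, so after both rounds every sub-block has been read exactly once.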

[0090] Optionally, the repair scheme can parallelize the repair of a single block. NCBlob implements a group-based parallel repair scheme based on the concept of partially parallel repair, decomposing the repair operation of non-systematic MSR encoding into partial sub-operations that run in parallel.

[0091] Figure 4 is a schematic diagram of the grouped parallel repair provided in an embodiment of this application. As shown on the left of Figure 4, a (5,3) NCBlob has five nodes {N1, N2, N3, N4, N5}, each storing a block consisting of n-k=2 sub-blocks. SPR first divides each sub-block into n=5 slices. It then forms groups Gi (1≤i≤n) in a cyclic manner, each group collecting k(n-k)=6 slices from all n-k=2 sub-blocks of k=3 nodes (for example, group G1 contains 6 slices from all 6 sub-blocks of the 3 nodes {N1, N2, N3}). Decoding each group yields the original data at the same offset in the k=3 data blocks (for example, G1 decodes the first slice of each of the six sub-blocks), because the contents at the same offset in a stripe are encoded and decoded together at the byte level. In this way, all five nodes, each providing six slices across three groups, are used evenly for parallel reading.
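The cyclic grouping in this example can be reproduced with a few lines of index arithmetic. The sketch assumes that group Gi uses k consecutive nodes starting at Ni and wrapping around, which matches the stated example G1 = {N1, N2, N3} but is otherwise an assumption; nodes are 0-indexed internally.

```python
from collections import Counter


def cyclic_groups(n: int, k: int) -> list[list[int]]:
    """Group i uses k consecutive nodes starting at node i, wrapping around."""
    return [[(i + j) % n for j in range(k)] for i in range(n)]


n, k = 5, 3
groups = cyclic_groups(n, k)
slices_per_group = k * (n - k)  # one slice from each sub-block of k nodes
groups_per_node = Counter(node for g in groups for node in g)
slices_per_node = {node: c * (n - k) for node, c in groups_per_node.items()}

for i, g in enumerate(groups, start=1):
    print(f"G{i}:", [f"N{node + 1}" for node in g])
print(slices_per_node)
```

Every node appears in the same number of groups, which is what makes the read load even across the five nodes.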

[0092] Single-block repair parallelization: as noted above, NCBlob's group-based parallel repair (GPR) scheme decomposes the repair operation of non-systematic MSR encoding into partial sub-operations that run in parallel.

[0093] Figure 4 (right) illustrates a (7,4) NCBlob using GPR. Repairing the missing block can be broken down into repairing three sub-blocks $p'_{1,1}$, $p'_{1,2}$, and $p'_{1,3}$. First, all six helpers are divided into two groups. For each group, a collector is selected to collect the chosen sub-blocks for repair (e.g., N5 and N7 are collectors). Then, helpers responsible for decoding the missing sub-blocks to be functionally repaired are selected. For example, N5 collects $\{p_{2,1}, p_{3,1}, p_{4,1}, p_{5,1}\}$, which can provide linear combinations for the repaired sub-blocks (i.e., $a_{i,2}p_{2,1} + a_{i,3}p_{3,1} + a_{i,4}p_{4,1} + a_{i,5}p_{5,1}$), and N7 collects $\{p_{6,1}, p_{7,1}\}$, which can provide $a_{i,6}p_{6,1} + a_{i,7}p_{7,1}$. Finally, the required sub-blocks $p'_{1,1}$, $p'_{1,2}$, and $p'_{1,3}$ are sent to the requester R to repair the missing block, where $p'_{1,i} = a_{i,2}p_{2,1} + a_{i,3}p_{3,1} + \dots + a_{i,7}p_{7,1}$. In this way, N5 and N7 each transmit a partial linear combination for every sub-block, which ultimately allows the three sub-blocks $p'_{1,1}$, $p'_{1,2}$, and $p'_{1,3}$ to be repaired in parallel.
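The key property used here is that a linear combination over all helpers can be computed as the sum of partial combinations formed at each collector. The sketch below checks this numerically. It uses integer arithmetic modulo a small prime as a stand-in for the finite field of the actual code, and the coefficients and sub-block values are made-up numbers; both are assumptions for illustration only.

```python
P = 257  # small prime; stand-in for the code's finite field (assumption)


def partial_combination(coeffs, subblocks, members):
    """Partial linear combination computed by one collector over its group."""
    return sum(coeffs[j] * subblocks[j] for j in members) % P


def gpr_repair(coeffs_by_target, subblocks, collector_groups):
    """Combine the collectors' partial results into each repaired sub-block."""
    repaired = {}
    for i, coeffs in coeffs_by_target.items():
        partials = [
            partial_combination(coeffs, subblocks, members)
            for members in collector_groups.values()
        ]
        repaired[i] = sum(partials) % P
    return repaired


helpers = range(2, 8)                                   # N2..N7
subblocks = {j: (37 * j + 11) % P for j in helpers}     # made-up values for p_{j,1}
coeffs_by_target = {                                    # made-up coefficients a_{i,j}
    i: {j: (i * j + 3) % P for j in helpers} for i in (1, 2, 3)
}
collector_groups = {5: [2, 3, 4, 5], 7: [6, 7]}         # collectors N5 and N7

repaired = gpr_repair(coeffs_by_target, subblocks, collector_groups)
direct = {
    i: sum(c[j] * subblocks[j] for j in helpers) % P
    for i, c in coeffs_by_target.items()
}
print(repaired == direct)
```

Because each collector forwards only one combined value per target sub-block, the requester receives two partial results per sub-block instead of six raw sub-blocks, and the three targets can be processed independently.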

[0094] In some embodiments, after step 110, the method further includes:

[0095] Step 111: When reading small blobs, optimize the reading path based on locality analysis;

[0096] Step 112: When reading a large blob, the data block is accessed directly through systematic MSR encoding.

[0097] When reading blobs, the read path optimization for small blobs is based on locality analysis, directly accessing relevant data blocks and reducing the reading of redundant data; for large blobs, systematic coding is used to directly access data blocks, ensuring efficient read performance.

[0098] In some embodiments, sub-block fragmentation is used for parallel decoding when reading a Blob.

[0099] When reading a Blob, a sub-block fragmentation technique is used for parallel decoding to balance node load and improve read performance.

[0100] Figure 5 is a schematic diagram of the binary large object storage repair device based on hybrid encoding provided in an embodiment of this application. As shown in Figure 5, the device includes an encoding module 510, a storage module 520, and a repair module 530, wherein:

[0101] Encoding module 510 is used to obtain multiple binary large object blobs, and to perform non-systematic minimum storage regeneration (MSR) encoding on the small blobs among the multiple blobs, and to perform systematic MSR encoding on the large blobs among the multiple blobs.

[0102] Storage module 520 is used to evenly store multiple encoded blobs across multiple storage nodes;

[0103] Repair module 530 is used to determine the corresponding repair strategy based on the type of the Blob to be repaired, and to repair the Blob to be repaired using grouped parallel repair technology based on the repair strategy and the encoded Blobs in multiple storage nodes.

[0104] It should be understood that the above-described device is used to execute the methods in the above embodiments. The implementation principle and technical effect of the corresponding program modules in the device are similar to those described in the above methods. The working process of the device can be referred to the corresponding process in the above methods, and will not be repeated here.

[0105] Based on the methods in the above embodiments, Figure 6 illustrates a schematic diagram of the physical structure of an electronic device. As shown in Figure 6, this application embodiment provides an electronic device that may include: a processor 610, a communication interface 620, a memory 630, and a communication bus 640. The processor 610, communication interface 620, and memory 630 communicate with each other via the communication bus 640. The processor 610 can call logical instructions in the memory 630 to execute the binary large object storage repair method based on hybrid encoding in the above embodiments.

[0106] Furthermore, the logical instructions in the aforementioned memory 630 can be implemented as software functional units and, when sold or used as independent products, can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or a portion of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the binary large object storage repair method based on hybrid encoding described in the various embodiments of this application.

[0107] Based on the methods in the above embodiments, this application provides a computer-readable storage medium storing a computer program. When the computer program runs on a processor, it causes the processor to execute the binary large object storage repair method based on hybrid encoding in the above embodiments.

[0108] Based on the methods in the above embodiments, this application provides a computer program product that, when running on a processor, causes the processor to execute the binary large object storage repair method based on hybrid encoding in the above embodiments.

[0109] It is understood that the processor in the embodiments of this application can be a central processing unit (CPU), or other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, transistor logic devices, hardware components, or any combination thereof. A general-purpose processor can be a microprocessor or any conventional processor.

[0110] The method steps in the embodiments of this application can be implemented in hardware or by a processor executing software instructions. The software instructions can consist of corresponding software modules, which can be stored in random access memory (RAM), flash memory, read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, hard disks, removable hard disks, CD-ROMs, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor, enabling the processor to read information from and write information to the storage medium. The storage medium can also be a component of the processor. The processor and the storage medium can reside in an ASIC.

[0111] The above embodiments can be implemented entirely or partially through software, hardware, firmware, or any combination thereof. When implemented using software, they can be implemented entirely or partially as a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of this application are produced. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium can be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium can be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid-state disk (SSD)).

[0112] It is understood that the various numerical designations used in the embodiments of this application are merely for the convenience of description and are not intended to limit the scope of the embodiments of this application.

[0113] Those skilled in the art will readily understand that the above description is merely a preferred embodiment of this application and is not intended to limit this application. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of this application should be included within the scope of protection of this application.

Claims

1. A hybrid encoding-based binary large object storage repair method, characterized in that it comprises: acquiring a plurality of binary large objects (Blobs), performing non-systematic minimum storage regeneration (MSR) encoding on small Blobs among the plurality of Blobs, and performing systematic MSR encoding on large Blobs among the plurality of Blobs; storing the encoded Blobs evenly across multiple storage nodes; and determining a corresponding repair strategy based on the type of the Blob to be repaired, and repairing the Blob to be repaired using a grouped parallel repair technique based on the repair strategy and the encoded Blobs in the multiple storage nodes.

2. The hybrid encoding-based binary large object storage repair method of claim 1, wherein performing non-systematic minimum storage regeneration (MSR) encoding on the small Blobs among the plurality of Blobs comprises: for small blocks with inter-block locality within a small Blob, performing MSR encoding using a merge-split-encode scheme; and for small blocks with intra-block locality within a small Blob, performing MSR encoding using a split-merge-encode scheme.

3. The hybrid encoding-based binary large object storage repair method of claim 1, wherein performing systematic MSR encoding on the large Blobs among the plurality of Blobs comprises: dividing the large Blob into multiple blocks of fixed size; and performing systematic MSR encoding on the multiple fixed-size blocks.

4. The hybrid encoding-based binary large object storage repair method of claim 1, wherein determining the corresponding repair strategy based on the type of the Blob to be repaired comprises: if the Blob to be repaired is a small Blob, determining that the repair strategy is a rotational selection strategy, which selects data from the least-used sub-blocks for repair; and if the Blob to be repaired is a large Blob, determining that the repair strategy is to select the least-cost path for repair based on the repair rules of the systematic MSR encoding.

5. The hybrid encoding-based binary large object storage repair method of claim 1, wherein, after the encoded Blobs are stored evenly across the multiple storage nodes, the method further comprises: when reading a small Blob, optimizing the read path based on locality analysis; and when reading a large Blob, accessing data blocks directly through the systematic MSR encoding.

6. The hybrid encoding-based binary large object storage repair method of claim 5, wherein, when reading a Blob, a sub-block fragmentation technique is used for parallel decoding.

7. A hybrid encoding-based binary large object storage repair apparatus, characterized in that it comprises: an encoding module, used to acquire a plurality of binary large objects (Blobs), to perform non-systematic minimum storage regeneration (MSR) encoding on small Blobs among the plurality of Blobs, and to perform systematic MSR encoding on large Blobs among the plurality of Blobs; a storage module, used to store the encoded Blobs evenly across multiple storage nodes; and a repair module, used to determine a corresponding repair strategy based on the type of the Blob to be repaired, and to repair the Blob to be repaired using a grouped parallel repair technique based on the repair strategy and the encoded Blobs in the multiple storage nodes.

8. An electronic device, comprising: at least one memory for storing a computer program; and at least one processor configured to execute the program stored in the memory, wherein, when the program stored in the memory is executed, the processor performs the hybrid encoding-based binary large object storage repair method as described in any one of claims 1-6.

9. A computer-readable storage medium storing a computer program, wherein, when the computer program is run on a processor, it causes the processor to perform the hybrid encoding-based binary large object storage repair method as described in any one of claims 1-6.

10. A computer program product, characterized in that, when the computer program product is run on a processor, it causes the processor to perform the hybrid encoding-based binary large object storage repair method as described in any one of claims 1-6.
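
Two of the claimed steps lend themselves to a short illustration: dividing a large Blob into fixed-size blocks (claim 3) and the rotational selection of the least-used sub-blocks for small Blob repair (claim 4). The Python sketch below is one possible reading under stated assumptions, not the claimed implementation. The function names, the zero padding of the final block, the per-sub-block usage counter, and the tie-break by identifier are all hypothetical. The MSR encoding itself is not shown.

```python
def split_into_fixed_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Divide a large Blob into blocks of exactly block_size bytes.

    Assumption: the final block is padded with zero bytes so that every
    block has the same length before systematic MSR encoding.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    if blocks and len(blocks[-1]) < block_size:
        blocks[-1] = blocks[-1].ljust(block_size, b"\x00")
    return blocks


def select_least_used(usage: dict[str, int], needed: int) -> list[str]:
    """Pick the `needed` sub-blocks with the lowest usage count.

    Each chosen sub-block has its counter incremented, so repeated calls
    rotate through the available sub-blocks instead of always reading the
    same ones. Ties are broken by identifier (an assumption).
    """
    if needed > len(usage):
        raise ValueError("not enough sub-blocks available")
    chosen = sorted(usage, key=lambda sid: (usage[sid], sid))[:needed]
    for sid in chosen:
        usage[sid] += 1
    return chosen
```

With four sub-blocks that all start at a usage count of zero and two sub-blocks needed per repair, successive calls to `select_least_used` alternate between the first pair and the second pair, which is the load-spreading effect the rotational selection strategy aims for.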