Data storage method and terminal based on consistent hash algorithm

A data storage technology based on a hash algorithm, applied in the fields of electrical digital data processing, digital data information retrieval, special data processing applications, etc. It addresses problems such as the uneven distribution of nodes and the difficulty of keeping the distribution uniform, and achieves enhanced redundancy and security, improved usage efficiency, and improved search efficiency.

Active Publication Date: 2021-01-15
JINQIANMAO TECH CO LTD
8 Cites 1 Cited by

AI-Extracted Technical Summary

Problems solved by technology

[0003] However, the traditional distributed hash algorithm suffers from uneven distribution of nodes. Especially after nodes are added dynamically, even if the original distribution was uniform, it is difficult to ensure that it will c...

Method used

As can be seen from the above description, the file to be stored is divided into blocks by erasure coding to obtain a file block set, with the file name serving as a unique identifier. The file is stored across multiple local disks of the server in erasure-coded form, so the file blocks contain redundant data: even if some data is damaged, the original data can be restored in time, and data damage can be tolerated to a certain extent, which improves the robustness of the file storage system itself. Moreover, after the first physical disk is found and the first file block is stored on it, the remaining file blocks are not located through virtual nodes on the hash ring; they are stored directly according to the physical disk sequence table constructed in advance. Accessing a file therefore does not require multiple mappings between virtual nodes and physical disks, which speeds up file access. When disks are added or removed, only the data on the changed disks needs to be migrated, which shortens data reconstruction time and ensures the availability of the system.
As can be seen from the foregoing description, the beneficial effects of the present invention are as follows. A physical disk is mapped to a plurality of virtual nodes, and the virtual nodes are identified by hash values. When a file is stored, the hash value of its file name identifies the file: the corresponding virtual node is determined from that hash value, and the file is stored on the physical disk corresponding to the virtual node. Because a single physical disk is mapped to multiple virtual nodes, the number of nodes on the hash ring increases, the spacing between nodes on the hash ring becomes more even, and the probability that a file-name hash falls on each virtual node becomes more even, which to a certain extent relieves the overloading of a single physical disk. By using the file name as the key value marking t...

Abstract

The invention provides a data storage method and terminal based on a consistent hash algorithm. The method comprises the steps of: mapping a physical disk to more than one virtual node and calculating a first hash value of each virtual node; arranging all virtual nodes in order of their first hash values to form a hash ring; receiving a file storage request, wherein the file storage request comprises a to-be-stored file and a file name; and calculating a second hash value corresponding to the file name, finding the first hash value closest to the second hash value on the hash ring in a preset direction, and storing the to-be-stored file corresponding to the file name on the physical disk where the virtual node corresponding to that first hash value is located. Because a single physical disk is mapped to a plurality of virtual nodes and the file name is used as the key value identifying the to-be-stored file, file search efficiency is improved.

Application Domain

Input/output to record carriers; File access structures

Technology Topic

Magnetic disks; Algorithm


Examples


Example Embodiment

[0088] Please refer to Figure 1 and Figure 3. The first embodiment of the present invention is:
[0089] A data storage method based on a consistent hash algorithm, including the steps:
[0090] S1. Construct a physical disk sequence table, and map each physical disk in the physical disk sequence table to more than one virtual node;
[0091] S2. Arrange all the virtual nodes corresponding to all the physical disks in the physical disk sequence table in order of their first hash values to form a hash ring;
[0092] In an optional implementation manner, the hash ring, that is, the logical topology of the virtual nodes, is a chord ring;
[0093] S3. Receive a file storage request, where the file storage request includes the file to be stored and its file name, and the file name is used as the storage key (key value);
[0094] S4. Calculate the second hash value corresponding to the file name, find the first hash value closest to the second hash value in a preset direction on the hash ring, and store the file to be stored corresponding to the file name on the physical disk where the virtual node corresponding to that first hash value is located. At this point, the file to be stored becomes a stored file;
[0095] Please refer to Figure 4. In an optional implementation manner, all the virtual nodes are arranged clockwise in order of increasing first hash value to form a hash ring. The second hash value corresponding to the file name is calculated, the first hash value closest to the second hash value is found in the clockwise direction, and the file to be stored corresponding to the file name is stored on the physical disk where the virtual node corresponding to that first hash value is located. For example, if the second hash value is calculated as 52, the file to be stored is stored on the physical disk corresponding to the virtual node with the hash value of 80;
[0096] Storing the file corresponding to the file name on the physical disk where the virtual node corresponding to the first hash value is located specifically includes: obtaining the disk name of the physical disk; taking a modulus of the second hash value to obtain a first identifier; taking a modulus of the first identifier to obtain a second identifier; generating a file storage path according to the disk name, the first identifier, and the second identifier; and, according to the file storage path, storing the file to be stored corresponding to the file name on the physical disk where the virtual node corresponding to the first hash value is located. For example, a file storage path is dataX/first/second/file name, where dataX is the physical disk name; the first directory name is obtained by taking the second hash value calculated from the file name modulo 256 and converting the result into a hexadecimal number; and the second directory name is obtained by taking the first directory name modulo 256 again and converting the result into hexadecimal;
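The ring construction and clockwise lookup of steps S1 to S4 can be sketched in Python as follows. This is only an illustrative sketch: the function names, the use of truncated MD5 as the "preset hash algorithm", the virtual node naming scheme, and the hand-written ring values other than 52 and 80 are all assumptions, not details taken from the patent.

```python
import bisect
import hashlib


def preset_hash(key: str) -> int:
    """Assumed stand-in for the preset hash algorithm: MD5 truncated to 32 bits."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def build_ring(disks, vnodes_per_disk=10):
    """S1 + S2: map each disk to virtual nodes and sort them by hash value."""
    ring = [(preset_hash(f"{disk}-{i}"), disk)
            for disk in disks
            for i in range(1, vnodes_per_disk + 1)]
    ring.sort()
    return ring


def locate(ring, file_hash):
    """S4: walk clockwise to the first virtual node whose hash is >= file_hash."""
    hashes = [h for h, _ in ring]
    # The modulo wraps a hash above the largest node back to the smallest node.
    index = bisect.bisect_left(hashes, file_hash) % len(ring)
    return ring[index]


# Hand-written ring for illustration; only the value 80 comes from the text.
example_ring = [(20, "A"), (80, "B"), (120, "A"), (500, "C")]
node_hash, disk = locate(example_ring, 52)  # a file hash of 52 lands on node 80
```

With the hand-written ring, a file whose hash is 52 resolves to the virtual node with hash 80, matching the example in paragraph [0095]. Treating a file hash equal to a node hash as belonging to that node is a choice made here for the sketch.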
[0097] S4 is specifically:
[0098] S41. Calculate a second hash value corresponding to the file name using a preset hash algorithm; both the second hash value and the first hash value are calculated using the same preset hash algorithm;
[0099] S42. Use erasure coding to divide the file to be stored into blocks to obtain a file block set, where the file block set includes a plurality of file blocks arranged in sequence;
[0100] In an optional implementation manner, an RS erasure code is used to divide the file to be stored into k+m blocks, where k is the number of original data blocks and m is the number of check blocks;
[0101] S43. Find the first hash value closest to the second hash value in the preset direction on the hash ring, and store the first file block of the file block set corresponding to the file name on the first physical disk, where the virtual node corresponding to that first hash value is located;
[0102] S44. Obtain the position of the first physical disk in the physical disk sequence table, and store the remaining N file blocks, other than the first file block, in order on the N physical disks that follow the first physical disk in the physical disk sequence table.
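The recovery property behind step S42, that any k of the k+m blocks suffice to rebuild the data, can be illustrated with a small Reed-Solomon-style sketch. It deliberately swaps in a simpler formulation than a production RS code: symbols are integers modulo the prime 257 and encoding uses polynomial interpolation, whereas real implementations normally work in GF(2^8) with a generator matrix. The names `encode` and `decode` and the sample symbols are hypothetical.

```python
P = 257  # prime modulus; an assumed stand-in for the GF(2^8) arithmetic of real RS codes


def _interpolate_at(points, x):
    """Evaluate at x the unique polynomial through points [(xi, yi), ...], modulo P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data, m):
    """Return the k data symbols followed by m check symbols."""
    k = len(data)
    points = list(enumerate(data))  # data block i is the polynomial value at x = i
    checks = [_interpolate_at(points, x) for x in range(k, k + m)]
    return list(data) + checks


def decode(survivors, k):
    """Rebuild the k data symbols from any k surviving blocks {index: symbol}."""
    points = list(survivors.items())[:k]
    return [_interpolate_at(points, x) for x in range(k)]


blocks = encode([10, 20, 30, 40], m=2)             # k = 4 data symbols, m = 2 check symbols
survivors = {i: blocks[i] for i in (1, 2, 4, 5)}   # blocks 0 and 3 are lost
recovered = decode(survivors, k=4)                 # equals [10, 20, 30, 40]
```

A file would be processed one symbol position at a time across its k data blocks. Note that a check symbol modulo 257 can take the value 256 and so does not fit in one byte, which is one reason practical codes use GF(2^8) instead.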

Example Embodiment

[0103] Please refer to Figure 7. The second embodiment of the present invention is:
[0104] A data storage method based on a consistent hash algorithm, which differs from the first embodiment in that:
[0105] It also includes expansion:
[0106] Add a third physical disk to the physical disk sequence table, map the third physical disk to more than one child node, and calculate the third hash value of each child node;
[0107] Place each of the child nodes into the hash ring according to its third hash value;
[0108] Acquire the adjacent virtual node next to the child node, and acquire the adjacent hash value corresponding to that adjacent virtual node;
[0109] Calculate the second hash value of all the files that have been stored, and store the stored files whose second hash value lies in the interval between the third hash value and the adjacent hash value on the third physical disk;
[0110] It also includes deleting physical disks:
[0111] Delete the fourth physical disk from the physical disk sequence table, delete the child nodes mapped from the fourth physical disk from the hash ring, and obtain the fourth hash value corresponding to each child node;
[0112] Acquire a first adjacent virtual node and a second adjacent virtual node next to the child node, and acquire a first adjacent hash value corresponding to the first adjacent virtual node;
[0113] Calculate the second hash value of all the files that have been stored, and store the stored files whose second hash value lies in the interval between the first adjacent hash value and the fourth hash value on the physical disk of the second adjacent virtual node;
[0114] Please refer to Figure 7. For example, when adding a new disk labeled X, first add the new disk X to the physical disk sequence table, then allocate new child nodes X0, X1...X10 for disk X, generate the hash values corresponding to these child nodes, and update them to the chord ring. Data newly written after the disk is added is processed as described above, while previously stored data requires reconstruction.
[0115] Please refer to Figure 7. The hash values of the child nodes of the newly added disk X are 90, 1000 and 5000. Because the second hash value determined by the file name is matched clockwise on the chord ring (hash ring) to the first hash value of the closest virtual node (child node), the hash value intervals of the file names that need to be reconstructed are [80-90], [500-1000]...[3000-5000];
[0116] For the files distributed in the above ranges, the initial storage node is updated to disk X, the remaining k+m-1 disks are then found in the physical disk relationship table starting from disk X, and finally the file data is stored on these k+m disks in erasure-coded form;
[0117] Please refer to Figure 7. If disk X is removed, the chord ring is updated first: the hash values corresponding to the child nodes of the disk are deleted from the chord ring. The data stored on disk X before its removal requires reconstruction, and the hash value ranges of the file names requiring reconstruction are [80-90], [500-1000]...[3000-5000];
[0118] Taking the reconstruction of files whose file-name hash value lies in the interval [80-90] as an example: these files are stored on disk A, which corresponds to the virtual node with the hash value of 120; the initial storage node of each file is updated to disk A, the remaining k+m-1 disks are then found in the physical disk relationship table starting from disk A, and finally the file data is stored on these k+m disks in erasure-coded form;
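The reconstruction intervals in the examples above can be derived mechanically from the ring. The following Python sketch assumes a clockwise ring of integer hashes; the function names are hypothetical, and the existing node value 6000 is invented so that the ring has a node beyond 5000. The values 80, 90, 120, 500, 1000, 3000 and 5000 are the ones used in the text.

```python
import bisect


def rebuild_intervals(existing_hashes, changed_hashes):
    """For each added (or removed) child node, return (predecessor, node).

    Files whose hash h satisfies predecessor < h <= node change owner.
    """
    ring = sorted(set(existing_hashes) | set(changed_hashes))
    intervals = []
    for node in sorted(changed_hashes):
        i = ring.index(node)
        intervals.append((ring[i - 1], node))  # index -1 wraps to the last node
    return intervals


def in_interval(file_hash, low, high):
    """True if file_hash lies in (low, high], allowing for wrap past zero."""
    if low < high:
        return low < file_hash <= high
    return file_hash > low or file_hash <= high


def next_owner(remaining_hashes, removed_hash):
    """After a child node is removed, its files move to the next node clockwise."""
    ring = sorted(remaining_hashes)
    return ring[bisect.bisect_right(ring, removed_hash) % len(ring)]


existing = [80, 120, 500, 3000, 6000]   # 6000 is a hypothetical extra node
disk_x_nodes = [90, 1000, 5000]         # child nodes of disk X from the text
intervals = rebuild_intervals(existing, disk_x_nodes)
```

With these inputs the intervals come out as (80, 90), (500, 1000) and (3000, 5000), and `next_owner(existing, 90)` gives 120, the node of disk A in paragraph [0118]. The same intervals apply whether disk X is being added or removed; only the direction of migration differs.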
[0119] In an optional implementation manner, a version number identifies the mapping relationship table between the physical disks and their child nodes, and the latest version of the mapping relationship table is accessed first; if the data cannot be obtained, the previous version of the mapping relationship table is used. The mapping relationship table is associated with the physical disk sequence table: when a change signal for the physical disk sequence table is received, the mapping relationship table is updated accordingly and its version number is updated;
[0120] Specifically, to ensure the availability of the disks during data reconstruction, multiple versions are kept of the correspondence between each physical disk and its child nodes. When a client reads a file, the latest version of the table of physical disks and their child nodes is accessed first. If the data is found through this table, the correct data is returned; if the data is not obtained, the data is still being reconstructed, and the old version of the table is used to return the actual data of the file. In addition, during data reconstruction, when re-storing the stored files whose second hash value lies in the interval between the third hash value and the adjacent hash value, the list of disks the file should occupy after reconstruction is compared with its previous storage list. Only the blocks on changed disks are re-stored, and the unchanged ones are kept as they are, which shortens the time for data reconstruction.
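The read fallback across table versions and the comparison of disk lists can be sketched as follows. This is a sketch under stated assumptions rather than the patent's implementation: the tables are plain sorted lists of (hash, disk) pairs, storage is an in-memory dictionary, disk lists are compared position by position, and all names and sample values are hypothetical.

```python
import bisect


def locate(table, file_hash):
    """Return the disk of the first node clockwise from file_hash in one table version."""
    hashes = [h for h, _ in table]
    return table[bisect.bisect_left(hashes, file_hash) % len(table)][1]


def read_file(versions, storage, file_name, file_hash):
    """Try the newest mapping table first and fall back to older versions.

    versions: list of tables, oldest first.
    storage: dict mapping disk name -> {file name: data}.
    """
    for table in reversed(versions):
        disk = locate(table, file_hash)
        data = storage.get(disk, {}).get(file_name)
        if data is not None:
            return data
    return None


def disks_to_rewrite(old_disks, new_disks):
    """Return (block index, new disk) only for positions whose disk changed."""
    return [(i, new) for i, (old, new) in enumerate(zip(old_disks, new_disks))
            if old != new]


old_table = [(80, "B"), (120, "A")]
new_table = [(80, "B"), (90, "X"), (120, "A")]    # disk X added with a child node at 90
storage = {"A": {"F": b"payload"}}                # file F (hash 85) not yet migrated to X
data = read_file([old_table, new_table], storage, "F", 85)
```

Here the newest table points at disk X, which does not hold the file yet, so the read falls back to the older table and finds the data on disk A. Comparing lists position by position is only one possible reading of "only the changed disks are re-stored".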

Example Embodiment

[0121] Please refer to Figure 3 to Figure 6. The third embodiment of the present invention is:
[0122] The above data storage method based on a consistent hash algorithm is applied to an actual scenario:
[0123] (1) As shown in Figure 3, construct a local disk sequence table of N disks in total: A, B...N;
[0124] (2) Map each disk to several disk child nodes (virtual nodes), for example 10 virtual nodes: the virtual nodes of disk A are A1, A2...A10, the virtual nodes of disk B are B1, B2...B10, and the virtual nodes of disk N are N1, N2...N10. Calculate the hash values of all the above virtual nodes, and place each virtual node on a consistent hash structure such as a chord ring according to its hash value. When a data query resolves to a virtual node, the data is stored on the physical node (disk) corresponding to that virtual node; for example, if the queried storage location is AX, the data is stored on disk A;
[0125] (3) When storing a file, on receiving the client's write request, first take the file name as the key value, and use the same hash algorithm that was used to build the chord ring to calculate the hash value of the file name and determine its position on the chord ring. Assuming the file is named F1 and its hash value is 980, "walk" clockwise along the chord ring; the first node encountered has the hash value 1100, so the first virtual node for storing the file is determined to be B1;
[0126] (4) According to the correspondence between physical disks and virtual nodes, virtual node B1 corresponds to disk B, so disk B is the first storage node selected for the file;
[0127] (5) Apply RS erasure coding to the file to obtain m chunks (file blocks) and k redundant blocks. Please refer to Figure 5, where m=4 and k=2 (here m counts the data blocks and k the redundant blocks, the reverse of the notation in the first embodiment). The original file data D can be regarded as a vector composed of 4 small data blocks D1, D2, D3, and D4, and matrix B is a Vandermonde matrix. After matrix multiplication, 6 data blocks D1, D2, D3, D4, C1 and C2 are obtained, of which C1 and C2 are redundant blocks. According to the erasure code rules, the original file data D can be recovered from any 4 of these 6 blocks. From the physical disk sequence table A to Z, starting from disk B, select 6 physical disks B, C, D, E, F, and G to store the file F1;
[0128] (6) The file F1 is stored on each disk under a path of the form /dataX/first/second/file name. The hash value of F1 is 980; dividing 980 by 256 gives a quotient of 3 and a remainder of 212. Converting both to hexadecimal gives first = 3 and second = d4, so the storage paths of file F1 on the disks are: /dataB/3/d4/F1, /dataC/3/d4/F1.../dataG/3/d4/F1;
[0129] (7) Store the k+m blocks on the corresponding physical disks according to the paths determined in (6).
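Steps (5) to (7) can be traced with a short Python sketch that selects the disks and builds the paths for F1. It follows the worked example, taking the quotient and remainder of 980 divided by 256; the function names are hypothetical, and wrapping around at the end of the sequence table is an assumption, since the text does not say what happens when fewer disks remain after the first one.

```python
import string


def select_disks(sequence_table, first_disk, count):
    """Pick `count` consecutive disks from the sequence table, starting at first_disk.

    Wrapping around at the end of the table is an assumption of this sketch.
    """
    start = sequence_table.index(first_disk)
    return [sequence_table[(start + i) % len(sequence_table)] for i in range(count)]


def storage_path(disk, file_name, file_hash):
    """Build /dataX/first/second/file name from the quotient and remainder by 256."""
    first, second = divmod(file_hash, 256)
    return f"/data{disk}/{first:x}/{second:x}/{file_name}"


sequence_table = list(string.ascii_uppercase)      # disks A to Z
disks = select_disks(sequence_table, "B", 4 + 2)   # m + k = 6 blocks, first disk is B
paths = [storage_path(d, "F1", 980) for d in disks]
```

For the hash 980 this yields the disks B to G and paths from /dataB/3/d4/F1 to /dataG/3/d4/F1, as in step (6).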
