Intelligent hash data layout method, cluster storage system and cluster storage method

A data layout and hashing technology, applied in electrical digital data processing, special-purpose data processing applications, instruments, etc. It addresses performance bottlenecks, single points of failure, data-consistency problems, and the need to migrate and redistribute data, all of which affect system performance and scalability. Its effects are to avoid performance bottlenecks, eliminate dependencies, and improve scalability.

Active Publication Date: 2013-01-02
中关村科技租赁股份有限公司 (Zhongguancun Science-Tech Leasing Co., Ltd.)
Cites: 4; cited by: 44

AI Technical Summary

Problems solved by the technology

Methods that rely on a metadata server, whether metadata management is centralized or distributed, suffer from a series of related problems such as performance bottlenecks, single points of failure, and data inconsistency, which directly affect system performance and scalability.
Data layouts mainly use striping (Stripe), mirroring (Mirror), hashing (Hash), consistent hashing (DHT), and similar methods. Their common problem is limited scalability: when the cluster expands, a large amount of data must be migrated and redistributed.
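As a rough illustration of the migration problem, the Python sketch below assumes the simplest form of hash placement (node = hash mod N) and counts how many keys change nodes when one node is added. It models plain modulo hashing only, not consistent hashing or the method of this invention; integer keys stand in for uniformly distributed hash values. Under these assumptions only 1/(N+1) of the keys keep their node.

```python
def home_node(key: int, num_nodes: int) -> int:
    """Plain modulo hash placement: map a hash value to a node index."""
    return key % num_nodes


def fraction_moved(num_nodes: int, keys: range) -> float:
    """Fraction of keys whose node changes when the cluster grows by one node."""
    moved = sum(
        1 for k in keys if home_node(k, num_nodes) != home_node(k, num_nodes + 1)
    )
    return moved / len(keys)


# Integer keys stand in for uniformly distributed hash values.
n = 4
keys = range(n * (n + 1) * 1000)
print(f"{fraction_moved(n, keys):.2f}")  # 0.80
```

Growing from 4 to 5 nodes relocates 80% of the keys, which is the kind of large-scale redistribution the paragraph above describes.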

Examples

Embodiment 1

[0042] This embodiment of the present invention discloses an intelligent hash data layout method for laying out the storage nodes in a data volume. Data is hash-distributed with the directory as the basic unit: each file's parent directory records the storage-node mapping information in an extended attribute, and child files are distributed among the storage nodes to which the parent directory belongs.
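A minimal Python sketch of the directory-as-unit idea follows. It assumes that a directory is hashed onto a fixed number (`width`) of consecutive nodes, that the chosen nodes are stored as a comma-separated string under a hypothetical extended-attribute key, and that a plain dict stands in for real file-system xattrs. The names `Directory`, `place_directory`, `place_file`, and `XATTR_KEY` are illustrative, not taken from the patent.

```python
import hashlib
from dataclasses import dataclass, field


def stable_hash(name: str) -> int:
    """Deterministic hash of a path or file name (MD5 chosen only for stability)."""
    return int.from_bytes(hashlib.md5(name.encode("utf-8")).digest()[:8], "big")


# Hypothetical attribute name; a real system would use setxattr/getxattr.
XATTR_KEY = "user.smarthash.nodes"


@dataclass
class Directory:
    path: str
    xattrs: dict = field(default_factory=dict)  # stand-in for extended attributes


def place_directory(d: Directory, nodes: list[str], width: int) -> list[str]:
    """Hash the directory onto `width` nodes and record the mapping on the directory."""
    if not 0 < width <= len(nodes):
        raise ValueError("width must be between 1 and the number of nodes")
    start = stable_hash(d.path) % len(nodes)
    chosen = [nodes[(start + i) % len(nodes)] for i in range(width)]
    d.xattrs[XATTR_KEY] = ",".join(chosen)
    return chosen


def place_file(parent: Directory, filename: str) -> str:
    """Hash a child file onto one of the nodes recorded on its parent directory."""
    recorded = parent.xattrs[XATTR_KEY].split(",")
    return recorded[stable_hash(filename) % len(recorded)]


nodes = ["n1", "n2", "n3", "n4", "n5", "n6"]
d = Directory("/projects/a")
chosen = place_directory(d, nodes, width=3)
print(chosen, place_file(d, "report.txt"))
```

Because a file's location depends only on its name and the mapping stored on its parent, a client can locate it without consulting a metadata server.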

[0043] The intelligent hash data layout method of this embodiment uses hash distribution as its basic distribution algorithm and further optimizes it for scalability, data migration and redistribution, and unbalanced distribution. Because the distribution information of data is recorded in directory extended attributes, newly added nodes do not affect the distribution of existing file data and only participate in the data distribution under the...
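The property claimed in [0043], that adding nodes leaves existing data in place, can be illustrated with the toy model below. It is a sketch under stated assumptions, not the patented implementation: each directory's node set is fixed when the directory is created and kept in a dict standing in for an extended attribute, and a newly added node only joins the pool used for directories created afterwards. The `Volume` class and its methods are hypothetical.

```python
import hashlib


def stable_hash(name: str) -> int:
    return int.from_bytes(hashlib.md5(name.encode("utf-8")).digest()[:8], "big")


class Volume:
    """Toy volume: a directory's node set is recorded once, at mkdir time (assumed model)."""

    def __init__(self, nodes, width=2):
        self.nodes = list(nodes)
        self.width = width
        self.dir_xattr = {}  # directory path -> recorded node list (xattr stand-in)

    def add_node(self, node):
        # Only the pool for future directories changes; recorded mappings are untouched.
        self.nodes.append(node)

    def mkdir(self, path):
        start = stable_hash(path) % len(self.nodes)
        self.dir_xattr[path] = [
            self.nodes[(start + i) % len(self.nodes)] for i in range(self.width)
        ]

    def locate(self, dir_path, filename):
        recorded = self.dir_xattr[dir_path]
        return recorded[stable_hash(filename) % len(recorded)]


vol = Volume(["n1", "n2", "n3"])
vol.mkdir("/old")
before = {f: vol.locate("/old", f) for f in ("a", "b", "c", "d")}
vol.add_node("n4")
after = {f: vol.locate("/old", f) for f in ("a", "b", "c", "d")}
assert before == after  # files under the existing directory did not move
vol.mkdir("/new")  # directories created after expansion may be placed on n4
```

The trade-off of this design is that a new node receives load only as new directories are created, which is why the source also lists unbalanced distribution among the problems it optimizes.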

Embodiment 2

[0060] This embodiment of the present invention provides a cluster storage system. Figure 8 is a schematic structural diagram of the cluster storage system described in this embodiment. As shown in Figure 8, the system includes a storage client and a storage server cluster.

[0061] Each storage server cluster corresponds to a data volume, and each data volume uses a data layout method to lay out the storage nodes in that volume. Each data volume has a data layout configuration file containing the set of storage nodes associated with the volume, the volume's data layout mode, and the volume's allocation strategy. The data layout modes include the intelligent hash data layout method described in Embodiment 1; replicated distribution of data with the directory as the basic unit; distribution of data with the stripe as the basic unit and ...
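The patent does not give the format of the data layout configuration file, so the Python sketch below assumes a simple `key = value` text format with `#` comments. The three fields mirror those named in [0061] (node set, layout mode, allocation strategy); the key names, the `LayoutMode` values, and the `least_used` strategy name are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class LayoutMode(Enum):
    SMART_HASH = "smart_hash"              # intelligent hash layout (Embodiment 1)
    DIR_REPLICATE = "dir_replicate"        # replication, directory as basic unit
    STRIPE_REPLICATE = "stripe_replicate"  # striping + replication ([0068])


@dataclass(frozen=True)
class LayoutConfig:
    volume: str
    nodes: tuple[str, ...]
    mode: LayoutMode
    allocation_strategy: str


def parse_layout_config(text: str) -> LayoutConfig:
    """Parse a hypothetical 'key = value' layout file for one data volume."""
    fields = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"malformed line: {raw!r}")
        fields[key.strip()] = value.strip()
    nodes = tuple(n.strip() for n in fields["nodes"].split(",") if n.strip())
    if not nodes:
        raise ValueError("a data volume needs at least one storage node")
    return LayoutConfig(
        volume=fields["volume"],
        nodes=nodes,
        mode=LayoutMode(fields["layout"]),
        allocation_strategy=fields.get("allocation", "default"),
    )


example = """
volume = vol0
nodes = n1, n2, n3, n4
layout = smart_hash
allocation = least_used   # hypothetical strategy name
"""
cfg = parse_layout_config(example)
print(cfg.mode, cfg.nodes)
```

Keeping one such file per volume lets different volumes in the same deployment use different layout modes.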

Embodiment 3

[0067] This embodiment of the present invention provides a cluster storage method implemented on a system comprising a storage client and a storage server cluster. Each storage server cluster corresponds to a data volume, and each data volume uses a data layout method to lay out its storage nodes. The method includes a data storage method and a data layout method. The data layout method includes the intelligent hash data layout method described in Embodiment 1; replicated distribution of data with the directory as the basic unit; or first-level striped distribution of data with the block as the basic unit, followed by second-level replicated distribution.

[0068] Here, the "striping + replication" mode works as follows: with the directory as the basic unit, data is first distributed across stripes (first level) and then replicated (second level).
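A minimal Python sketch of the two-level "striping + replication" idea follows. It assumes the directory's node list has already been chosen, that a file is split into chunks numbered by `chunk_index`, and that nodes are grouped into consecutive, disjoint replica sets; the function name and the grouping rule are hypothetical, not taken from the patent.

```python
def stripe_replicate_targets(nodes, stripe_width, replica_count, chunk_index):
    """Return the replica set that stores one chunk under a two-level layout.

    Level 1 (striping): the chunk goes to stripe group chunk_index % stripe_width.
    Level 2 (replication): each stripe group is `replica_count` distinct nodes.
    """
    needed = stripe_width * replica_count
    if len(nodes) < needed:
        raise ValueError(f"need at least {needed} nodes, got {len(nodes)}")
    groups = [
        nodes[g * replica_count:(g + 1) * replica_count] for g in range(stripe_width)
    ]
    return groups[chunk_index % stripe_width]


nodes = ["n1", "n2", "n3", "n4", "n5", "n6"]
for i in range(4):
    print(i, stripe_replicate_targets(nodes, stripe_width=3, replica_count=2, chunk_index=i))
```

Consecutive chunks land on different stripe groups, so large files are read in parallel, while each chunk still survives the loss of any single node in its group.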

[0069] Wherein, the dat...

Abstract

The invention discloses an intelligent hash data layout method, a cluster storage system, and a cluster storage method. The intelligent hash data layout method is used to lay out the storage nodes in a data volume. Data is hash-distributed with the directory as the basic unit. A file's parent directory records the storage-node mapping information in an extended attribute, and child files are distributed among the storage nodes to which the parent directory belongs in one of the following ways:
  • by hash distribution;
  • by two-level stripe distribution;
  • by two-level replica distribution;
  • by two-level stripe distribution followed by third-level replica distribution.
The system and methods provided by the invention can significantly improve the scalability, performance, availability, and applicability of a cluster storage system and greatly reduce the load on storage servers.
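The abstract lists four ways to place child files on the nodes recorded by the parent directory. The Python sketch below maps each one to a hypothetical placement rule; the enum names, the SHA-1-based hash, the stripe `width`, and the `replicas` count are assumptions, since the abstract does not specify them. `parent_nodes` stands for the node list read from the parent directory's extended attribute.

```python
import hashlib
from enum import Enum


class SubFileMode(Enum):
    HASH = 1            # hash distribution within the parent's nodes
    STRIPE = 2          # two-level: parent's nodes, then stripe members
    REPLICA = 3         # two-level: parent's nodes, then a replica set
    STRIPE_REPLICA = 4  # stripe first, then third-level replication


def h(s: str) -> int:
    return int.from_bytes(hashlib.sha1(s.encode("utf-8")).digest()[:8], "big")


def subfile_targets(mode, parent_nodes, filename, chunk_index=0, width=2, replicas=2):
    """Nodes that store one chunk of `filename` (hypothetical rules per mode)."""
    n = len(parent_nodes)
    start = h(filename) % n
    if mode is SubFileMode.HASH:
        return [parent_nodes[start]]
    if mode is SubFileMode.STRIPE:
        # Chunks rotate over `width` consecutive nodes starting at the file's hash.
        return [parent_nodes[(start + (chunk_index % width)) % n]]
    if mode is SubFileMode.REPLICA:
        return [parent_nodes[(start + r) % n] for r in range(replicas)]
    if mode is SubFileMode.STRIPE_REPLICA:
        # Each stripe member owns a block of `replicas` consecutive nodes.
        first = (start + (chunk_index % width) * replicas) % n
        return [parent_nodes[(first + r) % n] for r in range(replicas)]
    raise ValueError(f"unknown mode: {mode}")


parent = ["n1", "n2", "n3", "n4", "n5", "n6"]
for mode in SubFileMode:
    print(mode.name, subfile_targets(mode, parent, "data.bin", chunk_index=1))
```

In every mode the targets are drawn only from `parent_nodes`, which reflects the abstract's rule that child files stay on the storage nodes to which the parent directory belongs.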

Description

Technical field
[0001] The invention relates to the technical field of data storage, in particular to an intelligent hash data layout method, a cluster storage system, and a method thereof.
Background art
[0002] Against the background of cloud storage and big data, data is growing explosively. According to research, the digital universe will reach 35.2 ZB in 2020, a 44-fold increase from 0.8 ZB in 2009, and more than 80% of it will be unstructured data. Many data-intensive applications, such as high-performance computing, medical imaging, oil and gas exploration, digital media, and the social web, are driving this explosion of data and continually pose new and severe challenges for storage. Cluster storage is a scale-out storage architecture whose capacity and performance scale linearly, and it has been widely accepted in the global market. Cluster storage technology involves two key issues, namely metadata management and data la...

Application Information

Patent type & authority: Application (China)
IPC (8th edition): G06F17/30
Inventor: 刘爱贵 (Liu Aigui)
Owner: 中关村科技租赁股份有限公司 (Zhongguancun Science-Tech Leasing Co., Ltd.)